[-] Thordros@hexbear.net 35 points 7 months ago

Q: "So, what data did you use to train your model?"

A: "I am sorry, my capability to answer this question is limited, as I am an AI language model. I am not privy to the inner workings of private organizations, and can only answer general questions."

[-] junebug2@hexbear.net 29 points 7 months ago* (last edited 7 months ago)

I feel like it’s not that she doesn’t know the answers, it’s that the answers are not politically convenient. My understanding of the situation is that in the course of training the various GPTs, OpenAI and Microsoft have realistically scanned every piece of text and imagery that’s available on the internet. It didn’t matter how good or bad it was or who made it, the models needed every available data point. That was all well and good until covid led to a tightening of interest rates, which meant the VC overlords of Silicon Valley finally had to pay a bill. All the vaporware companies that have never turned a profit are scrambling now, and we see the mass layoffs of the last three years. Microsoft, however, got to be King Shit of VC Mountain because one of their startups invented “AI”. Say what you will about it (and I will), the public interest in and corporate adoption of AI has meant that there is positive revenue for a tech company. Now, regardless of rationality, all tech executives must find a way to cash in on the Golden Calf. Some companies are designing new applications or creating new services. The majority are realizing that they somehow, sort of, kind of, are the source of the original data the models were made from, and they’re trying to extract rents from it. For now, that’s really only for content in the future. If the CTO here publicly claims that their product relies on YouTube or anything, Alphabet or whatever parent would be stupid not to come and sue for whatever they might get.

[-] SSJ2Marx@hexbear.net 22 points 7 months ago

The way she keeps repeating "publicly available and licensed data" makes me one hundred percent positive that this is a lawyer-written phrase and she knows that she's in a potential legal minefield and is sticking as closely to it as she can.

[-] hello_hello@hexbear.net 15 points 7 months ago

Please give answer I have twitter blocked on my DNS.

[-] henfredemars@infosec.pub 19 points 7 months ago* (last edited 7 months ago)

My quick lazy manual transcription:

What data was used to train Sora?
We used publicly available data and licensed data.

So, videos on YouTube?
I'm actually not sure about that.

OK, videos from Facebook? Instagram?
You know if they were publicly available, um yeah, publicly available to use there might be the data but I'm not sure. I'm not confident about it.

What about Shutterstock? I know you guys have a deal with them.
I'm just not gonna go into the details of the data that was used but it was publicly available or licensed data.

EDIT: Please help, can't figure out how to preserve line breaks. Edit: Improved it a bit.

[-] InevitableSwing@hexbear.net 10 points 7 months ago

Two spaces on the end.

---

Yada yada verse
Yada yada verse
Yada yada verse

Yada yada chorus
Yada yada chorus
Yada yada chorus

[-] henfredemars@infosec.pub 5 points 7 months ago

Thank you! I'm not sure how or why that works.

[-] davel@hexbear.net 7 points 7 months ago

Lemmy’s markup language is based on the CommonMark spec.

6.7 Hard line breaks

A line ending (not in a code span or HTML tag) that is preceded by two or more spaces and does not occur at the end of a block is parsed as a hard line break.
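
To make the quoted rule concrete, here is a minimal Python sketch of just that one rule. It is not a CommonMark parser: `render_paragraph` is a hypothetical helper name, it handles a single paragraph only, and it ignores code spans, HTML tags, and the backslash form of hard breaks.

```python
import re


def render_paragraph(paragraph: str) -> str:
    """Render one paragraph, applying only the two-trailing-spaces rule."""
    lines = paragraph.split("\n")
    parts = []
    for i, line in enumerate(lines):
        is_last = i == len(lines) - 1
        # Hard break: two or more spaces before the line ending,
        # and the line is not the last one in the block.
        is_hard = (not is_last) and re.search(r" {2,}$", line) is not None
        # Leading and trailing spaces are dropped from the output either way.
        parts.append(line.strip(" "))
        if not is_last:
            # A soft break stays a plain newline, which a browser shows as a space.
            parts.append("<br />\n" if is_hard else "\n")
    return "<p>" + "".join(parts) + "</p>"


with_spaces = "Yada yada verse  \nYada yada verse  \nYada yada verse"
without_spaces = "Yada yada verse\nYada yada verse\nYada yada verse"

print(render_paragraph(with_spaces))
print(render_paragraph(without_spaces))
```

The first call emits `<br />` between the lines; the second leaves plain newlines, which is why the lines run together when the trailing spaces are missing.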

[-] InevitableSwing@hexbear.net 6 points 7 months ago

They copied what reddit uses. As for why reddit does it that way - I have no idea.

[-] malijaffri@lemmy.dbzer0.com 9 points 7 months ago

It's the standard Markdown implementation

[-] SSJ2Marx@hexbear.net 7 points 7 months ago

Would it be possible for the text in the box you type in to just... appear in the post exactly as you typed it?

[-] malijaffri@lemmy.dbzer0.com 3 points 7 months ago* (last edited 7 months ago)

You could wrap it in backticks:

text
      exactly
   as       typed

Without the backticks, it becomes:

text exactly as typed

Edit: backticks:

```
text
      exactly
   as       typed
```
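
A rough Python sketch of why the two versions differ, assuming the usual combination of Markdown paragraph handling and HTML whitespace collapsing in the browser. `as_displayed_paragraph` and `as_displayed_code` are hypothetical helpers that only model what the reader ends up seeing, not how any particular renderer is implemented.

```python
def as_displayed_paragraph(text: str) -> str:
    # In normal flow, runs of spaces and newlines end up shown as one space.
    return " ".join(text.split())


def as_displayed_code(text: str) -> str:
    # Inside a fenced code block the text is shown verbatim.
    return text


typed = "text\n      exactly\n   as       typed"

print(as_displayed_paragraph(typed))  # text exactly as typed
print(as_displayed_code(typed))
```

The paragraph version loses the indentation and the line breaks, while the code version keeps every space and newline as typed.
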
[-] flan@hexbear.net 11 points 7 months ago
[-] half_giraffe@hexbear.net 13 points 7 months ago

I thought it was like a meme but it's literally a still from the interview. That's the CTO of OpenAI saying she's not sure what data was used to train the models lmao.

[-] flan@hexbear.net 11 points 7 months ago

OpenAI are data goblins so the lawyers probably don't allow anyone in the C suite to know where the data comes from

[-] dat_math@hexbear.net 9 points 7 months ago

OpenAI are data goblins

openAI: I'm gobblin' data heeah

[-] blobjim@hexbear.net 5 points 7 months ago

ChatGPT is going GOBLIN MODE

this post was submitted on 15 Mar 2024
70 points (100.0% liked)

chapotraphouse
