this post was submitted on 29 Jan 2025
974 points (98.5% liked)


The narrative that OpenAI, Microsoft, and freshly minted White House “AI czar” David Sacks are now pushing to explain why DeepSeek was able to create a large language model that outpaces OpenAI’s while spending orders of magnitude less money and using older chips is that DeepSeek used OpenAI’s data unfairly and without compensation. Sound familiar?

Both Bloomberg and the Financial Times are reporting that Microsoft and OpenAI have been probing whether DeepSeek improperly trained the R1 model that is taking the AI world by storm on the outputs of OpenAI models.

It is, as many have already pointed out, incredibly ironic that OpenAI, a company that has been obtaining large amounts of data from all of humankind largely in an “unauthorized manner,” and, in some cases, in violation of the terms of service of those from whom it has been taking, is now complaining about the very practices by which it has built its company.

OpenAI is currently being sued by the New York Times for training on its articles, and its argument is that this is perfectly fine under copyright law fair use protections.

“Training AI models using publicly available internet materials is fair use, as supported by long-standing and widely accepted precedents. We view this principle as fair to creators, necessary for innovators, and critical for US competitiveness,” OpenAI wrote in a blog post. In its motion to dismiss in court, OpenAI wrote “it has long been clear that the non-consumptive use of copyrighted material (like large language model training) is protected by fair use.”

If OpenAI argues that it is legal for the company to train on whatever it wants for whatever reason it wants, then it stands to reason that it doesn’t have much of a leg to stand on when competitors use strategies common in the world of machine learning to make their own models.

(page 2) 50 comments
[–] mechoman444@lemmy.world 0 points 6 days ago (8 children)

I can't believe we're still on this nonsense about AI stealing data for training.

I've had this argument so many times before. Y'all need to figure out which data you want free and which data you want to pay for, because you can't have it both ways.

Either the data is free or it's paid for. For everyone including individuals and corporations.

You can't have data be free for some people and paid for by others. It doesn't work that way; we don't have the infrastructure to support this kind of thing.

For example, Wikipedia can't make its data available for AI training for a price and free for everyone else. You can just go to wikipedia.com and read all the data that you want. It's available for free: there's no paywall, no subscription, no account to make, no password to put in, no username to think of.

Either all data is free or it's all paid for.

[–] LengAwaits@lemmy.world 2 points 6 days ago* (last edited 6 days ago)

I tend to think that information should be free, generally, so I would probably be fine with "OpenAI the non-profit" taking copyrighted data under fair-use, but I don't extend that thinking to "OpenAI the for-profit company".

[–] nightwatch_admin@feddit.nl 272 points 1 week ago (20 children)

It is effing hilarious. First, OpenAI & friends steal creative works to “train” their LLMs. Then they are insanely hyped for what amounts to glorified statistics, get “valued” at insane amounts while burning money faster than a Californian forest fire. Then, a competitor appears that has the same evil energy but slightly better statistics.. bam. A trillion of “value” just evaporates as if it never existed.
And then suddenly people are complaining that DeepSuck is “not privacy friendly” and stealing from OpenAI. Hahaha. Fuck this timeline.

[–] Sanctus@lemmy.world 85 points 1 week ago (1 children)

It never did exist. This is the problem with the stock market.

[–] Ulrich@feddit.org 45 points 1 week ago (20 children)

That's why "value" is in quotes. It's not that it didn't exist, it's just that it's purely speculative.

Hell, Nvidia's stock plummeted as well, which makes no sense at all, considering Deepseek needs the same hardware as ChatGPT.

Stock investing is just gambling on whatever is public opinion, which is notoriously difficult because people are largely dumb and irrational.

[–] Alph4d0g@discuss.tchncs.de 1 points 6 days ago

"Valuation", I suppose. The "value" that we project onto something, whether or not that something has truly earned it.

[–] teft@lemmy.world 20 points 1 week ago (3 children)

I hear tulip bulbs are a good investment...

[–] criss_cross@lemmy.world 0 points 6 days ago* (last edited 6 days ago)

Nah bitcoin is the future

Edit: /s I was trying to say bitcoin = tulips

[–] boredtortoise@lemm.ee 19 points 1 week ago

Capitalism basics, competition of exploitation

[–] Sgt_choke_n_stroke@lemmy.world 208 points 1 week ago
[–] owenfromcanada@lemmy.world 100 points 1 week ago (4 children)
[–] x00z@lemmy.world 2 points 6 days ago

Tamaleeeeeeeeesssssss

hot hot hot hot tamaleeeeeeeees

[–] MysticKetchup@lemmy.world 41 points 1 week ago (2 children)

I feel like I didn't appreciate this movie enough when I first watched it but it only gets better as I get older

[–] ouRKaoS@lemmy.today 13 points 1 week ago

"Now" is always a good time to rewatch it & get more out of it!

[–] just_another_person@lemmy.world 12 points 1 week ago (3 children)

It's a true comedy that still holds up. I honestly thought for years that Mel Brooks had something to do with it, but he didn't. It's so well crafted that there are many layers to it that you can't even grasp when watching as a child. Seeing it as an adult just opens your eyes to how amazingly well done it was.

I could do without the whole Billy Crystalizing of large portions of it though.

[–] BertramDitore@lemm.ee 68 points 1 week ago (3 children)

Corporate media take note. This is how you do reality-based reporting. None of the both-sides bullshit trying to justify or make excuses, just laughing in the face of absurd hypocrisy. This is a well-respected journalist confronting a truth we can all plainly see. See? The truth doesn’t need to be boring or bland or “balanced” by disingenuous attempts to see the other side.

I will explain what this means in a moment, but first: Hahahahahahahahahahahahahahahaha hahahhahahahahahahahahahahaha. It is, as many have already pointed out, incredibly ironic that OpenAI, a company that has been obtaining large amounts of data from all of humankind largely in an “unauthorized manner,” and, in some cases, in violation of the terms of service of those from whom they have been taking from, is now complaining about the very practices by which it has built its company.

[–] humble_pete_digger@lemm.ee 30 points 1 week ago (2 children)

Thank you China.
No, for real - it's either the EU or frigging China that helps us with these oligarch overlords

[–] dogslayeggs@lemmy.world 21 points 1 week ago

Regardless of how OpenAI procured their data, I'm absolutely shocked that a company from China would obtain data unauthorized from a company in another country.

[–] NeoNachtwaechter@lemmy.world 13 points 1 week ago

explain why DeepSeek was able to create

Surely they also received tons of plutonium donations from Iran!

/s
