this post was submitted on 22 Jan 2026
503 points (98.8% liked)

Technology


This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related news or articles.
  3. Be excellent to each other!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
  9. Check for duplicates before posting; duplicates may be removed.
  10. Accounts 7 days and younger will have their posts automatically removed.

Approved Bots


founded 2 years ago
[–] SparrowHawk@feddit.it 2 points 56 minutes ago

I don't know why, but this is all so funny and ridiculous to me.

Infuriating too, but so ridiculous. Like, capitalism is proving how much it sucks by needing to break its own rules. It always did this, but now it's so pathetically clear.

[–] SabinStargem@lemmy.today 4 points 1 hour ago (1 children)

I support the destruction of copyright. Humanity should have free access to media, be it for enhancing their commercial products or for individuals to develop their personhood.

[–] neuromorph@lemmy.world 2 points 49 minutes ago

We need to remove any copyright from whatever is developed by the AI companies.

If the AI can use copyrighted material without compensating the owners, then it should be free for everyone to use/own the content AI creates.

[–] sureshot0@discuss.online 11 points 6 hours ago

It would be so funny if this ended with Nvidia getting robbed.

[–] PierceTheBubble@lemmy.ml 6 points 14 hours ago* (last edited 11 hours ago) (1 children)

So the amended complaint alleges that Nvidia used/stored/copied/obtained/distributed copyrighted works (including the plaintiffs'), either through datasets available on Hugging Face ('Books3', featured in both 'The Pile' and 'SlimPajama') or by pirating from shadow libraries (like Anna's Archive), to train multiple LLMs (primarily their 'NeMo Megatron' series), and that it distributed the copyrighted data through the 'NeMo Megatron Framework'; data which was ultimately sourced from shadow libraries.

It's quite an interesting read actually, especially the link to this Anna's Archive blog post, which the complaint grossly pulls out of context, as the plaintiffs clearly despise the shadow libraries too, since those libraries ultimately provided access to their copyrighted material.

Especially the part "Most (but not all!) US-based companies reconsidered once they realized the illegal nature of our work. By contrast, Chinese firms have enthusiastically embraced our collection, apparently untroubled by its legality." makes me wonder if that's the reason why models like Deepseek initially blew Western models out of the water.

[–] Knock_Knock_Lemmy_In@lemmy.world 2 points 5 hours ago (1 children)

You can ask deepseek detailed questions about Harry Potter books and it responds intelligently with (almost) quotes from the book.

Ask ChatGPT and it will respond to questions but denies it has read any book.

[–] Corkyskog@sh.itjust.works 2 points 1 hour ago

Interesting, I was using Deepseek for book recommendations and it was exceptionally good at recommending books that are similar to one I just read compared to other models.

[–] theunknownmuncher@lemmy.world 194 points 1 day ago (2 children)

Allegedly most valuable company on the planet in all of history (can't afford books). Allegedly not a bubble or fraud.

[–] MrScottyTay@sh.itjust.works 36 points 1 day ago* (last edited 1 day ago) (3 children)

Sadly, I think it's more that there isn't really a standard way to buy books and other media in bulk at the scale AI training usually requires. So the companies realise they can save both time and money by just pirating, after calculating the risk of fines. It's just a bonus that they usually get away with it and that the fines would likely be cheaper than a legit transaction. But I do think it's the bulk data packaging that makes piracy look more attractive to them from the get-go.

Heck, even video game publishers often source the ROMs for their official re-releases from pirated copies, because pirates are better at preserving data and keeping it in a nice, friendly format. It's easier to search for it on the web and download it than it is to go into their own archives and rip it themselves, if they even still have original copies, 'cause they sure as hell didn't keep their source code.

[–] amzd@lemmy.world 16 points 1 day ago

There is also no standard way of buying a DRM free epub for personal use so I’m fine downloading them from Anna too :)

[–] theunknownmuncher@lemmy.world 6 points 1 day ago (1 children)

Yeah, no, this genuinely doesn't make sense: there are legitimate repositories for these books, and companies can do business-to-business negotiations for access to them. Even libraries have access to ebooks at bulk scale.

[–] MrScottyTay@sh.itjust.works -1 points 7 hours ago (1 children)

If those kinds of negotiations haven't been done by other companies before, there won't already be a process in place for them. There'd be lots of friction for the first such deal, both in legal work and in software development to make sure they only get the access relevant to the deal made.

It's not something where they can just say "hey, here's the FTP URI". These legitimate repositories you speak of, like Amazon I guess, will already have existing deals with publishers. As they currently stand, those deals may not allow Amazon to share that IP with other companies. So they will either have to redo those deals or restrict the likes of Nvidia's access to specific titles.

[–] theunknownmuncher@lemmy.world 1 points 2 hours ago* (last edited 1 hour ago) (1 children)

Ah yes, of course, the legal challenges of selling a copy of a book that is literally for sale 🙄🙄🙄

Yeah, the existing deal with publishers is "sell my book", dummy. And no, there is no real software development work, because you genuinely have no idea what you're talking about if you think it's not already just sitting in an S3 bucket with a database mapping it by those different publishers and deals. Again, even libraries have a database system that could handle this.

How do you think it works when an individual buys one book? A lawyer and software developer sit down to figure out the terms and conditions and how to technically find that book in their computer system?

[–] MrScottyTay@sh.itjust.works 1 points 1 hour ago (1 children)

The development work I mentioned, if you actually read it, was about ensuring that the specific access is granted at the scale they need.

Plus, the legal challenge is not about single copies of books but about getting them into a state suitable for data ingestion, which would likely mean giving them DRM-free versions specifically, which I imagine some book publishers would scowl at.

[–] theunknownmuncher@lemmy.world 1 points 1 hour ago

Move those goalposts! Yeah, I guess their only option is to pirate the books then; it's not like NVIDIA has access to OCR or anything 🙄

[–] Waphles@lemmy.world 2 points 1 day ago (1 children)

Well, I suppose they could buy access to Amazon’s kindle servers

Hmmm. I wonder what Amazon's LLMs are trained on.

Are you suggesting that there is a use case for piracy that has less to do with saving money than it does with convenience and easy access to media in one place?

[–] null@piefed.nullspace.lol 17 points 1 day ago (1 children)
[–] 0x0@lemmy.zip 10 points 21 hours ago

Not if it's the rich guys doing it.

[–] rafoix@lemmy.zip 70 points 1 day ago (8 children)

Will they be sued per book?

[–] UnspecificGravity@piefed.social 28 points 1 day ago (2 children)

It's not stealing when corpos do it.

Meta torrented their training data from the pirate bay. Hell, Spotify initially built their catalog from pirated music. They all do this shit. Corporations are built to steal our shit and sell it back to us. This isn't any different from pumping oil out of pubic lands and selling it back to us.

[–] demonsword@lemmy.world 2 points 22 hours ago

pumping oil out of pubic lands

this sounds really painful lol

[–] ICastFist@programming.dev 2 points 1 day ago

Wish Meta had torrented all the viruses too; it would be fun to read the news of "Facebook and Instagram are offline as Meta suffers from cyberattack".

[–] Goodlucksil@lemmy.dbzer0.com 11 points 1 day ago

No, because the lawyer cohort will destroy them.

[–] Appoxo@lemmy.dbzer0.com 18 points 1 day ago (2 children)

But...why?
Just torrent it?

[–] brokenwing@discuss.tchncs.de 14 points 1 day ago (2 children)

AA might be digging its own grave. Over time the knowledge gets accumulated in the hands of a select few, and then they're gonna block people from accessing piracy sites like AA, or even worse, AA gets shut down due to lack of traffic.

[–] Dadifer@lemmy.world 6 points 21 hours ago

It has torrent backup. How would it do either of those things?

[–] Cherry@piefed.social 4 points 1 day ago

It's a really good thought. IMO what they will be producing with AI won't be knowledge, it will be slop.

There is always gonna be an indie writer, a local at the pub singing. They can't stop people creating. Download or buy analog copies of the stuff you like and store them. We don’t have to be slaves to the mainstream dream... I will say though, it's hard changing habits... but for me, it starts with me.

[–] FaceDeer@fedia.io 40 points 1 day ago (1 children)

Seems strange. Anna's Archive makes their collection available for bulk download as torrent files, they shouldn't need to "cut a deal" for access to that. Just download the torrent and now you've got the whole collection available locally.

[–] nialv7@lemmy.world 30 points 1 day ago* (last edited 1 day ago) (2 children)

They do provide direct access to their books for businesses that are willing to pay.

https://annas-archive.li/llm

[–] dukemirage@lemmy.world 23 points 1 day ago

chaotic neutral

[–] FaceDeer@fedia.io 15 points 1 day ago (1 children)

Which, as I said, seems strange. Why don't those businesses just download the torrents?

[–] imecth@fedia.io 28 points 1 day ago (1 children)
[–] FaceDeer@fedia.io 24 points 1 day ago (1 children)

Ah, low numbers of seeds. Must've just not wanted to wait.

[–] psx_crab@lemmy.zip 12 points 1 day ago

Fucking hit and run lmao

[–] scytale@piefed.zip 42 points 1 day ago

Holy shit the greed knows no bounds.

[–] flowers_galore2@lemmynsfw.com 14 points 1 day ago

Hmm, so Nvidia is training LLMs as well. Are they going to compete with their customers now too? Like Anthropic and Cursor?

Good. Can’t wait for the bubble to pop.

[–] DandomRude@lemmy.world 12 points 1 day ago (2 children)

So we can assume that in the future, only slop written by LLMs will be available. I mean, who would be willing to spend hundreds of hours writing a book when even huge corporations that earn billions from it won't pay the author a single dime?

[–] Cherry@piefed.social 5 points 1 day ago* (last edited 1 day ago)

The trick is not to pay a dime to read it. Even producing AI slop has a cost. If no one pays for it, it has to end up at a loss.

Stop buying. Or, if you have to, buy old stuff secondhand. There’s already a surplus.

Alternatively, piracy is clearly condoned here, so again, don’t buy.

[–] dukemirage@lemmy.world 4 points 1 day ago (1 children)

Why should this development stop at books? There are already generated books available, mostly children’s books (no one’s thinking about them now).


Allegedly, but holy shit if true. Hard to explain yourself out of that one.
