this post was submitted on 24 Jun 2025
605 points (98.9% liked)

Technology

[–] Fizz@lemmy.nz 12 points 9 hours ago

Judge, I'm pirating them to train AI, not to consume for my own personal use.

[–] Randomgal@lemmy.ca 27 points 13 hours ago

You're poor? Fuck you, you have to pay to breathe.

Millionaire? Whatever you want daddy uwu

[–] DFX4509B_2@lemmy.org 8 points 10 hours ago* (last edited 10 hours ago) (2 children)

Good luck breaking down people's doors for scanning their own physical books for their personal use when analog media has no DRM and can't phone home, and paper books are an analog medium.

That would be like kicking down people's doors for needle-dropping their LPs to FLAC for their own use and to preserve the physical records as vinyl wears down every time it's played back.

It sounds like transferring an owned print book to digital and using it to train AI was deemed permissible. But downloading a book from the Internet and using it as training data is not allowed, even if you later purchase the pirated book. So, no one will be knocking down your door for scanning your books.

This does raise an interesting case where libraries could end up training and distributing public domain AI models.

[–] booly@sh.itjust.works 1 points 7 hours ago (1 children)

The ruling explicitly says that scanning books and keeping/using those digital copies is legal.

The piracy found to be illegal was downloading unauthorized copies of books from the internet for free.

[–] deltapi@lemmy.world 1 points 5 hours ago

I wonder if the archive.org cases had any bearing on the decision.

[–] MTK@lemmy.world 17 points 13 hours ago (1 children)

Check out my new site, TheAIBay: you search for content, and an LLM that was trained to reproduce it gives it to you. A small hash check is used to validate accuracy. It is now legal.

[–] nodiratime@lemmy.world 4 points 13 hours ago* (last edited 13 hours ago) (2 children)

Does it "generate" a 1:1 copy?

[–] MTK@lemmy.world 1 points 4 hours ago

You can train an LLM to generate 1:1 copies

[–] y0kai@lemmy.dbzer0.com 12 points 15 hours ago (1 children)

Sure, if you purchase your training material, it's not a copyright infringement to read it.

We needed a judge for this?

[–] excral@feddit.org 12 points 14 hours ago

Yes, because buying a book doesn't mean you own its content. You're not allowed to print and/or sell additional copies or publicly post the entire text. Generally, it's difficult to say where the limit of what's allowed lies. Citing a single sentence in a public post is most likely fine, and citing an entire paragraph is probably fine too, but an entire chapter would probably be pushing it too far. When in doubt, a judge must decide how far you can go before infringing copyright. There are good arguments to be made that just buying a book doesn't grant the right to train commercial AI models with it.

[–] SaharaMaleikuhm@feddit.org 33 points 21 hours ago (3 children)

But I thought they admitted to torrenting terabytes of ebooks?

[–] FaceDeer@fedia.io 14 points 16 hours ago

That part is not what this preliminary judgment is about. The torrenting part is going to go to an actual trial. This part was about the Authors' claim that the act of training AI itself violated copyright, and this is what the judge has found to be incorrect.

[–] antonim@lemmy.dbzer0.com 12 points 20 hours ago

Facebook (Meta) torrented TBs from Libgen, and their internal chats leaked so we know about that, and IIRC they've been sued. Maybe you're thinking of that case?

[–] yournamehere@lemm.ee 8 points 17 hours ago (5 children)

I will train my jailbroken Kindle too... display and storage training... I'll just libgen them... no worries... it is not piracy

[–] minorkeys@lemmy.world 4 points 15 hours ago* (last edited 15 hours ago)

Of course we have to have a way to manually check the training data, in detail, as well. I'm not reading the book, I'm just verifying training data.

[–] isVeryLoud@lemmy.ca 37 points 23 hours ago* (last edited 20 hours ago) (24 children)

Gist:

What’s new: The Northern District of California has granted a summary judgment for Anthropic that the training use of the copyrighted books and the print-to-digital format change were both “fair use”. However, the court also found that the pirated library copies that Anthropic collected could not be deemed training copies, and therefore, the use of this material was not “fair”. The court also announced that it will have a trial on the pirated copies and any resulting damages, adding:

“That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages.”

[–] vane@lemmy.world 21 points 23 hours ago* (last edited 23 hours ago) (11 children)

OK, so you can buy books and scan them, or buy ebooks, and use them for AI training, but you can't just download pirated books from the internet to train AI. Did I understand that correctly?
