this post was submitted on 06 Feb 2025
396 points (100.0% liked)

[–] njordomir@lemmy.world 7 points 1 hour ago

If someone were to acquire a few hundred gigs of books and feed them to something like paperless-ngx, would it work as a sort of Google for books? Are there any software projects better suited to doing this that understand synonyms and perhaps some context? I guess AI search, but guided, for the intermediate user.

Google is so bad lately. Basically every result is official sponsored corporate biased BS. It would be nice to be able to instantly query a bunch of ebooks.

[–] SpikesOtherDog@ani.social 24 points 4 hours ago

https://phys.org/news/2010-11-million-dollar-verdict-music-piracy-case.html

In all fairness, Meta should be assessed a fee of $250k for EACH pirated work.

This would amount to forfeiting all assets to doge.

[–] Grimy@lemmy.world 24 points 7 hours ago (4 children)

Meta has open sourced every single one of their LLMs. They essentially gave birth to the whole open LLM scene.

If they start losing all these lawsuits, the whole scene dies and all those nifty models and their fine-tunes get removed from Hugging Face, to be repackaged and sold to us with a subscription fee. All the other domestic open source players will close down.

The copyright crew aren't the good guys here, even if it's spearheaded by Sarah Silverman and Meta has traditionally played the part of the villain.

[–] LodeMike@lemmy.today 1 points 3 minutes ago

Where is the source content, then?

[–] antonim@lemmy.dbzer0.com 4 points 1 hour ago

If the existence of open source LLMs hinges on the benevolence of one of the few most cancerous tech companies in the world, maybe they're not really worth it?

This isn't about "heroes" and "villains". Facebook has been and has stayed the "villain". They've done something colossally illegal that any mere mortal would be sued to death for (by another "villainous" instance, the media system that made piracy a necessity in the first place), and they're hoping to get away with it simply on technicalities and by having more money for better lawyers. Rules are rules. If Facebook doesn't like them, maybe it should try to change them (and not just for itself, but for the rest of us too)?

[–] Telodzrum@lemmy.world 4 points 3 hours ago

Nope. Get fucked

[–] misk@sopuli.xyz 23 points 7 hours ago (3 children)

Meta stole from everyone, including those that struggle to make ends meet, so it doesn’t matter that they gave you back some of it. Any moral qualms should evaporate when you consider that they did it to create shareholder value and the rest is philanthropy (aka pretend tax). As a socialist I believe that man is owed for his work and you can’t take from him even though technology makes it so easy.

[–] General_Effort@lemmy.world 11 points 5 hours ago

Calling property labor doesn't make you a socialist.

[–] LainTrain@lemmy.dbzer0.com 26 points 6 hours ago* (last edited 6 hours ago) (1 children)

As a socialist I believe intellectual property is a falsehood and technological advancement should be for the public good. Open source LLMs are for the public good.

Given the options between having open source LLMs and the US Govt banning non-corpo non-proprietary LLMs and giving a free pass to people like Musk and Altman and Zucc to monopolize, I happily pick the former.

You're delusional if you think they will pay anyone, the only way zucc will pay is with a guillotine.

Corpos will make inter-platform deals that'll simply make all online data licensable for the right price and enrich each other so you can't avoid it while still actually being a career creative, but price out academic researchers and the public sector so that all fruits of it stay behind closed R&D doors and be free of ethics etc.

Continuing in your role as a useful idiot, you'll most likely also foot the bill for it via subsidies from your taxes to "develop the AI sector" in some anti-China dick measuring contest by the US.

[–] foenkyfjutschah@programming.dev -1 points 6 hours ago

Dear comrade, the hype around Affirming Incompetence (AI) is the highest expression in our time of people's alienation from themselves, a testimony to the desire for, and thus a precondition of, the further fetishization of their grasp on the world. But as Bernard Stiegler so nicely remarked: no savoir-vivre without savoir-faire! These are the indispensable conditions for freeing humanity from its self-imposed chains and establishing a siblingly order!

(now have fun w/ an LLM's attempt of "advancement"!)

[–] LEVI@feddit.org 64 points 8 hours ago (1 children)

Anna's Archive: Mirror our database, help us preserve Humanity's knowledge

Facebook: I'll just torrent what I need, see yaa

These big tech monopolies are a curse to humanity..

[–] mox@lemmy.sdf.org 8 points 4 hours ago

Facebook: I’ll just ~~torrent what I need~~ burden your underfunded project and volunteers with over 81 TB of bandwidth costs without contributing anything in return, see yaa

FTFY

[–] jaybone@lemmy.world 17 points 7 hours ago (1 children)
[–] misk@sopuli.xyz 27 points 6 hours ago (1 children)

It’s a popular search engine that works with shadow libraries like Sci-Hub or Library Genesis. Shadow libraries are hosts to copies of works of literature and science. Their legal status is murky at best but it’s incredibly impractical to persecute those accessing them.

[–] jaybone@lemmy.world 5 points 6 hours ago (3 children)

So it’s like thepiratebay or 1337x.to but for books?

Also I think you mean prosecuting, not persecuting.

[–] PM_Your_Nudes_Please@lemmy.world 8 points 4 hours ago* (last edited 4 hours ago)

TPB and 1337x are torrents, whereas Anna’s Archive typically uses direct downloads. So it’s more akin to the old CoolROMs back before the massive takedown purges.

Anna’s Archive does offer torrents, but it’s not for individual files. Their torrents are more like database backups, with thousands of books each. In fact, people will download and seed them to help increase AA’s resilience. Since they aren’t super useful for individual files, very few people use them as such. But clearly, Meta just used them to feed into an LLM, because they didn’t care about the content of the files as long as they were properly written. It was less “looking for your favorite fantasy book” and more “looking to grab every fantasy book ever written.”

[–] SharkAttak@kbin.melroy.org 7 points 5 hours ago

Also I think you mean prosecuting, not persecuting.

Nowadays, I'm not so sure anymore.

[–] Corkyskog@sh.itjust.works 7 points 6 hours ago (1 children)

Those are torrents; Anna's Archive is typically used for direct downloads.

[–] jaybone@lemmy.world 2 points 6 hours ago (1 children)

Thanks. It’s confusing because everyone is talking about torrents. It’s in the title even, but I didn’t read the article.

[–] Corkyskog@sh.itjust.works 5 points 6 hours ago

Well, I think you can also torrent from there. There are massive backup files on their home page that they are basically begging people to download and seed... So maybe it's that?

[–] shittydwarf@lemmy.dbzer0.com 104 points 9 hours ago
[–] Telorand@reddthat.com 187 points 11 hours ago (3 children)

Do it, Judge. Protect the wealthy and say it's not piracy. Do it.

[–] Lexam@lemmy.world 87 points 10 hours ago (1 children)

It's not piracy. For corporations. For you and me believe it or not, straight to jail!

[–] curbstickle@lemmy.dbzer0.com 13 points 9 hours ago (1 children)

Just make an LLC, now it's legal again.

[–] grue@lemmy.world 6 points 8 hours ago* (last edited 8 hours ago) (1 children)

I'd almost like to think an LLC would be enough, but I suspect that only works if you also have a billion in VC funding and political connections.

[–] curbstickle@lemmy.dbzer0.com 6 points 7 hours ago

Oh for sure, since the law is basically toilet paper for billionaires at this point.

[–] Damage@slrpnk.net 6 points 6 hours ago (1 children)
[–] Telorand@reddthat.com 5 points 6 hours ago

And they'll ham up how punished and sorry they are, and how thankful they are for the judge handing down "fair and impartial" justice.

[–] abobla@lemm.ee 33 points 10 hours ago

Please! Think of the shareholders, we must protect them!

[–] akilou@sh.itjust.works 104 points 10 hours ago (3 children)

But did they keep a good ratio though?

[–] empireOfLove2@lemmy.dbzer0.com 88 points 10 hours ago (1 children)

1000% guarantee those mf's had their upload choked to 20kbps

[–] guaraguaito@lemmy.blahaj.zone 36 points 10 hours ago* (last edited 10 hours ago) (1 children)

Nah they used a leeching client. No upload at all.

[–] empireOfLove2@lemmy.dbzer0.com 10 points 10 hours ago (1 children)

Gotta have some upload just for the protocol traffic tho.

[–] bamboo@lemmy.blahaj.zone 20 points 10 hours ago

I would assume that the requests a torrent client sends in order to download data are not counted toward the upload amount for the torrent. When they say no upload, they mean that none of the data in the files they downloaded was shared with anyone else, making them a piece of shit leecher.
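
To make the distinction concrete, here is a minimal Python sketch of how a share ratio is usually understood: file data (payload) uploaded divided by file data downloaded, with protocol chatter such as requests and handshakes left out. The function name and the byte counts are assumptions for illustration only, the 81 TB figure is just the number quoted elsewhere in this thread, and this is not how any particular client implements it.

```python
def share_ratio(payload_uploaded: int, payload_downloaded: int) -> float:
    """Return uploaded/downloaded payload bytes (hypothetical helper).

    Only file data counts here; protocol overhead (requests,
    handshakes, keep-alives) is assumed to be tracked separately.
    """
    if payload_downloaded == 0:
        # Nothing downloaded yet: treat any upload as an "infinite" ratio.
        return float("inf") if payload_uploaded > 0 else 0.0
    return payload_uploaded / payload_downloaded


# Illustrative figures only: ~81 TB downloaded, no file data shared back.
downloaded = 81 * 10**12
leecher_ratio = share_ratio(0, downloaded)

# A peer that has given back half of what it took, for comparison.
half_ratio = share_ratio(50, 100)
```

Under these assumptions a pure leecher ends up at a ratio of exactly zero no matter how much request traffic it sent, which is the point being made above.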

[–] rottingleaf@lemmy.world 8 points 8 hours ago (1 children)

In copyright protection terms the ratio shouldn't matter. They should pay for all the lost profits from pirating everything they've downloaded. Every time someone pirated it should be counted. And every time someone uses the AI trained on the data.

They can become the corporate Jesus of the interwebs, having paid for our sins.

[–] grue@lemmy.world 6 points 7 hours ago (1 children)

Technically, copyright infringement is committed by the entity making and sending the copy, not the entity receiving it. Leeching could indeed remove liability.

I'm not sure if the courts have cared about that nuance when persecuting the 'small fish,' but I bet they would in this 'big fish' case.

[–] MangoCats@feddit.it 4 points 7 hours ago (1 children)

If the receiving entity then ingests all that copyrighted material into its AI, and the AI sends it a piece at a time to other receiving entities, then the AI should be considered to be infringing on everything it copies to make its answers.

[–] grue@lemmy.world 4 points 7 hours ago (1 children)

Yes, yes it should. But that's a different act than the one being discussed here.

[–] MangoCats@feddit.it 0 points 5 hours ago

I agree. Still doesn't hurt to bring it up on appropriate tangents.

[–] SnotFlickerman@lemmy.blahaj.zone 16 points 10 hours ago

Asking the real questions.

[–] SnotFlickerman@lemmy.blahaj.zone 91 points 10 hours ago (2 children)

“Meta downloaded millions of pirated books from LibGen through the bit torrent protocol using a platform called LibTorrent. Internally, Meta acknowledged that using this protocol was legally problematic,” the third amended complaint noted.

Just want to make clear that Libtorrent is just the torrent application they were using, while the Libgen torrents are easily accessible on the libgen site, not through a separate "platform" called Libtorrent.

I wish people like us could help with these complaints, because then they might actually get the details more accurate to reality.

https://libgen.is/repository_torrent/

https://www.libtorrent.org/

The amended complaint makes it sound like Libtorrent is a private tracker website when it's just the application they were using on the publicly available torrents.

[–] corsicanguppy@lemmy.ca 11 points 8 hours ago (1 children)

People are putting an S on the end of words like 'traffic' and 'email'. They will never understand the semantics of that correction.

[–] paraphrand@lemmy.world 1 points 7 hours ago

Meta Horizons

[–] db2@lemmy.world 9 points 10 hours ago

Totes yeet, yo.

[–] daggermoon@lemmy.world 18 points 8 hours ago

Damn leeches

[–] SinningStromgald@lemmy.world 29 points 10 hours ago (2 children)

Given the extent, it should be considered criminal, so $250k per offense, and the higher-ups who authorized the torrenting should get conspiracy charges at a minimum.

But this is America, so they'll probably pay an amount that is small for Meta and get a light slap on the wrist with a finger wagging.

[–] Pika@sh.itjust.works 14 points 9 hours ago

You are being optimistic. It's likely going to be considered "fair use" and then be business as usual. Meta themselves have claimed that they aren't filing to dismiss because they believe they are on the legal side, since they aren't distributing the pirated content, only using it for training, which is currently a massive grey area that hasn't been ruled non-fair use.

[–] artificialfish@programming.dev 4 points 9 hours ago (2 children)

$250k per offence is literally nothing to Meta.

[–] grue@lemmy.world 7 points 7 hours ago

$250k * [every book in existence] is literally nothing?

Remember, "offense" doesn't mean "per torrent," it means "per copyrighted work infringed."

[–] SnotFlickerman@lemmy.blahaj.zone 8 points 8 hours ago* (last edited 8 hours ago)

Each time someone uses their LLM it should be considered a violation.

People are using these things millions of times a day in aggregate. That adds up fast. $250k multiplied by millions suddenly isn't so cheap.
