this post was submitted on 08 Aug 2025
226 points (99.6% liked)

Privacy

Dropsitenews published a list of websites that Facebook uses to train its AI. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.

Hexbear is on there too. Also Facebook is very interested in people uploading their massive dongs to lemmynsfw.

Full article here.

Link to the full leaked list download: Meta leaked list pdf

[–] horse@feddit.org 19 points 5 days ago* (last edited 5 days ago) (4 children)

horseanimalsex.pro

lmao wtf is that list. Literally training their AI on bestiality.

Edit in case it's not obvious: That domain is very much NSFW and it's exactly what you'd expect (I checked and wish I hadn't).

[–] FaceDeer@fedia.io 17 points 4 days ago (2 children)

I think a lot of people in this thread are overlooking that when you train an LLM it's good to have negative examples too. As long as the data is properly tagged and contextualized when being used as training material, you want to be able to show the LLM what bad writing or offensive topics are so that it understands those things.

For example, you could be using an LLM as an automated moderator for a forum, having it look for objectionable content to filter. How would it know what objectionable content was if it had never seen anything like that in its training data?

Even those people attempting to "poison" AI by posting gibberish comments or replacing "th" with þ characters are probably just helping the AI understand how text can be obfuscated in various ways.
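To give a rough idea of how little the "th" to þ trick hides: undoing it is a one-line substitution. This is only a minimal sketch under my own assumptions. The `SUBSTITUTIONS` table and the `normalize` function are made up for illustration and aren't taken from any real training or moderation pipeline.

```python
# Hypothetical substitution table, for illustration only.
SUBSTITUTIONS = {"þ": "th", "Þ": "Th"}


def normalize(text: str) -> str:
    """Replace each obfuscated character with its plain-text equivalent."""
    for obfuscated, plain in SUBSTITUTIONS.items():
        text = text.replace(obfuscated, plain)
    return text


print(normalize("Þe quick brown fox jumps over þe lazy dog"))
```

A model that has seen enough of both spellings could plausibly learn the same mapping on its own, with no hand-written table at all.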

Especially since we've marked it by downvoting them to hell

[–] LiveLM@lemmy.zip 3 points 4 days ago* (last edited 4 days ago)

So there's a guy at Facebook whose job is exclusively looking at horse porn and tagging it? Amazing.

Also, I think the guy doing the "th" thing isn't doing it to poison AI, he just wants to revive the letter or whatever

[–] ConstantPain@lemmy.world 4 points 4 days ago (1 children)

Shit I clicked expecting some furry porn. Oh boy...

[–] 0xD@infosec.pub 2 points 4 days ago

You weren't wrong!

Yikes, isn't that illegal?