this post was submitted on 08 Aug 2025
431 points (99.5% liked)


Dropsitenews published a list of websites that Facebook uses to train its AI. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.

Hexbear is on there too. Also Facebook is very interested in people uploading their massive dongs to lemmynsfw.

Full article here.

Link to the full leaked list download: Meta leaked list pdf

[–] Sal@mander.xyz 2 points 4 days ago (2 children)

That's interesting. I still don't fully understand the implications from a user-experience perspective. It looks as if the proof-of-work would go unnoticed by someone using a regular client but would present a more significant challenge for an automated scraping bot. So, it does look promising. I still don't understand what it would do to a bot such as a 'PlantID bot' and other good bots. Do they have a heavy soul? I'll look into it.

For now, I have modified https://mander.xyz/robots.txt, copying the file that Dave from lemmy.nz found to work to prevent at least some scraping and bot load.
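For anyone who wants to check how a compliant crawler would read a robots.txt file, here is a small sketch using Python's standard `urllib.robotparser`. The rules below are hypothetical and are not the actual contents of the mander.xyz file; `ExampleScraperBot`, `SomeBrowser`, and the `example.invalid` host are placeholder names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT the real mander.xyz/robots.txt.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleScraperBot
Disallow: /

User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(EXAMPLE_ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if the parsed rules let `agent` fetch `path`."""
    return parser.can_fetch(agent, "https://example.invalid" + path)


print(allowed("ExampleScraperBot", "/post/1"))  # blocked by its own rule group
print(allowed("SomeBrowser", "/post/1"))        # falls through to the * group
print(allowed("SomeBrowser", "/search"))        # * group disallows /search
```

Keep in mind that robots.txt is advisory: it only affects bots that choose to read and obey it, which is why it prevents "at least some" scraping rather than all of it.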

[–] fossilesque@mander.xyz 1 points 4 days ago* (last edited 4 days ago)
[–] Sal@mander.xyz 1 points 4 days ago

I also don't know what it would do to HTTP requests from federated instances.