this post was submitted on 08 Aug 2025
431 points (99.5% liked)

Fediverse

A community dedicated to fediverse news and discussion.

Fediverse is a portmanteau of "federation" and "universe".

Dropsitenews published a list of websites that Facebook uses to train its AI. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.

Hexbear is on there too. Also Facebook is very interested in people uploading their massive dongs to lemmynsfw.

Full article here.

Link to the full leaked list download: Meta leaked list pdf

[–] Sal@mander.xyz 7 points 2 weeks ago (2 children)

Ahh, really?! Thanks for letting me know. I will see if there is something I can do to throttle that after the holidays. Curious to see what solutions others come up with.

[–] fossilesque@mander.xyz 7 points 2 weeks ago (1 children)
[–] Sal@mander.xyz 2 points 1 day ago (2 children)

That's interesting. I still don't fully understand the implications from a user-experience perspective. It looks as if the proof-of-work would go unnoticed when using a user client but would present a more significant challenge for an automated scraping bot. So, it does look promising. I still don't understand what it would do to a bot such as a 'PlantID bot' and other good bots. Do they have a heavy soul? I'll look into it.
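To make the proof-of-work idea concrete, here is a minimal hashcash-style sketch. It is a generic illustration, not the implementation of the tool linked above: the `challenge:nonce` string format, the function names, and the difficulty value are all assumptions. The point is the asymmetry: the client has to try many hashes to find a nonce, while the server checks the answer with a single hash.

```python
import hashlib
from itertools import count


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: one SHA-256 hash decides whether the answer is valid."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    # Valid if the top `difficulty_bits` bits of the 256-bit digest are all zero.
    return int.from_bytes(digest, "big") >> (256 - difficulty_bits) == 0


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force nonces from 0 upward until one verifies."""
    for nonce in count():
        if verify(challenge, nonce, difficulty_bits):
            return nonce


if __name__ == "__main__":
    # Hypothetical challenge string and difficulty, chosen only for the demo.
    nonce = solve("example-challenge", 16)
    print("nonce:", nonce, "valid:", verify("example-challenge", nonce, 16))
```

On average the solver needs about 2^difficulty_bits attempts. A person loading one page pays that once, while a scraper fetching many pages pays it over and over. A bot that does not run the challenge code at all would simply not get through, which is why the question about good bots matters.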

For now, I have modified https://mander.xyz/robots.txt, copying the file that Dave from lemmy.nz found to work to prevent at least some scraping and bot load.
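For anyone who wants to check how a given set of robots.txt rules treats a particular bot, here is a small sketch using Python's standard `urllib.robotparser`. The rules, the `ExampleScraperBot` and `FriendlyClient` user agents, and the `example.org` URLs are hypothetical and are not the actual contents of the file on mander.xyz.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT the real mander.xyz/robots.txt.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleScraperBot
Disallow: /

User-agent: *
Disallow: /search
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleScraperBot", "FriendlyClient"):
        for url in ("https://example.org/post/1", "https://example.org/search?q=x"):
            print(agent, url, is_allowed(EXAMPLE_ROBOTS_TXT, agent, url))
```

Keep in mind that robots.txt is only advisory. It reduces load from crawlers that choose to honor it and does nothing against those that ignore it.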

[–] fossilesque@mander.xyz 1 points 1 day ago* (last edited 1 day ago)
[–] Sal@mander.xyz 1 points 1 day ago

I also don't know what it would do to HTTP requests from federated instances.

[–] fossilesque@mander.xyz 5 points 2 weeks ago

I think Science Memes may make it hallucinate more, tbf.