this post was submitted on 08 Aug 2025
229 points (99.6% liked)

Privacy


Dropsitenews published a list of websites Facebook uses to train its AI. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.

Hexbear is on there too. Also Facebook is very interested in people uploading their massive dongs to lemmynsfw.

Full article here.

Link to the full leaked list download: Meta leaked list pdf

[–] TachyonTele@piefed.social 8 points 3 weeks ago (6 children)

What are ways to stop them?

[–] Carnelian@lemmy.world 15 points 3 weeks ago

I mean, everything we do on here is totally public, so, I would guess there is nothing to be done?

[–] usernameusername@sh.itjust.works 9 points 3 weeks ago (1 children)

Maybe Anubis, although idk if it works for Lemmy instances

[–] LodeMike@lemmy.today 6 points 3 weeks ago

Anubis is simply a reverse proxy, so it should work with pretty much anything, Lemmy instances included.
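
For anyone wondering what that looks like in practice: as far as I understand it, Anubis sits in front of the app and makes each client solve a small proof-of-work puzzle before the request is passed through. Here's a toy sketch of that idea in Python. It is not Anubis's actual code; the function names, the nonce format, and the difficulty value are all made up for illustration.

```python
import hashlib
import secrets


def make_challenge() -> str:
    """Server side: hand the client a random string to work on (hypothetical format)."""
    return secrets.token_hex(16)


def verify(challenge: str, nonce: int, difficulty: int = 3) -> bool:
    """Server side: one cheap hash to check that the client's answer is valid.

    An answer is valid when SHA-256(challenge + nonce) starts with
    `difficulty` zero hex digits. The difficulty here is an assumed value.
    """
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


def solve(challenge: str, difficulty: int = 3) -> int:
    """Client side: brute-force nonces until one passes verification."""
    nonce = 0
    while not verify(challenge, nonce, difficulty):
        nonce += 1
    return nonce


if __name__ == "__main__":
    challenge = make_challenge()
    nonce = solve(challenge)
    print("solved with nonce", nonce, "valid:", verify(challenge, nonce))
```

The point is the asymmetry: checking an answer costs the server one hash, while finding one costs the client thousands on average. That's negligible for a single human visitor but adds up for a scraper hitting millions of pages. It only raises the cost of crawling, though, and does nothing about content that has already federated to other instances.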

[–] FaceDeer@fedia.io 7 points 3 weeks ago (2 children)

Switch to a non-open protocol or walled garden, preferably controlled by a large and litigious organization that guards its content jealously. They'll probably still sell access to their data to LLM trainers but not necessarily Facebook.

Reddit, for example, may fit the bill. IIRC they sell their data to OpenAI for training, so there might be exclusivity deals intended to keep Facebook out.

[–] TachyonTele@piefed.social 1 points 3 weeks ago (1 children)

I was thinking more about what instances themselves could do. Is it something that can be mitigated, like with bot accounts?

[–] FaceDeer@fedia.io 1 points 3 weeks ago

I don't see any way to "mitigate" this while still using the ActivityPub protocol. This isn't about a bot posting on the Fediverse, it's about reading the Fediverse. If you want to prevent that then you're probably talking about some form of DRM or a walled garden.

[–] Gobbel2000@programming.dev 1 points 2 weeks ago

reddit.com is in fact not on the list.

[–] s@piefed.world 6 points 3 weeks ago

Post and repeatedly endorse generally inoffensive content that for some reason violates Facebook’s ToS, such as the comic book cover of Captain America punching Hitler or the Led Zeppelin album “Houses of the Holy”

[–] Kissaki@programming.dev 2 points 3 weeks ago

GDPR complaints to data protection offices may lead to significant fines?