this post was submitted on 08 Aug 2025
419 points (99.5% liked)

Fediverse

A community dedicated to fediverse news and discussion.

Fediverse is a portmanteau of "federation" and "universe".


Dropsitenews published a list of websites Facebook uses to train its AI. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.

Hexbear is on there too. Also Facebook is very interested in people uploading their massive dongs to lemmynsfw.

Full article here.

Link to the full leaked list download: Meta leaked list pdf

(page 3) 50 comments
[–] heyWhatsay@slrpnk.net 9 points 2 days ago

Just make sure to add banana truck to the critical dialogue, and most importantly clown penis.

[–] socsa@piefed.social 10 points 2 days ago (9 children)

Definitely called this. Can we have private voting now? These people are scraping the fediverse and the current state of things is a privacy nightmare.

[–] expatriado@lemmy.world 11 points 2 days ago (1 children)
[–] Photuris@lemmy.ml 9 points 2 days ago

I hate the internet now

[–] flamingos@feddit.uk 9 points 2 days ago (2 children)

There's like half a dozen feddits and somehow feddit.uk is the only one to make it onto this?

Here's a list of feddit.uk's linked instances that appear in the leaked list:

List of instances

beehaw.org
furry.engineer
ibe.social
fediworld.de
framatube.org
trailers.ddigest.com
nrw.social
lemmynsfw.com
video.hardlimit.com
digitalcourage.social
xn--baw-joa.social
tube.kockatoo.org
equestria.social
wisskomm.social
social.anoxinon.de
freiburg.social
toobnix.org
toot.bike
mstdn.lalafell.org
peertube.linuxrocks.online
social.rebellion.global
mastodon.cipherbliss.com
social.sdf.org
corteximplant.com
typo.social
www.404media.co
mastodon.ml
video.liberta.vip
tilvids.com
todon.eu
hessen.social
digipres.club
shigusegubu.club
mastodon.me.uk
zdf.social
mastodon.sdf.org
spore.social
kolektiva.media
gruene.social
share.tube
nso.group
mastouille.fr
masto.es
vivaldi.com
literatur.social
mstdn.mx
kirche.social
mastodon.hams.social
federation.network
lile.cl
todon.nl
betweenthelions.link
ipv6.social
linuxrocks.online
peertube.otakufarms.com
pawb.social
mastodon-belgium.be
jasette.facil.services
machteburch.social
mastodont.cat
mastodon.eus
eupolicy.social
social.bau-ha.us
toot.berlin
amicale.net
hexbear.net
mastodon.bida.im
reddthat.com
shelter.moe
mastodon.nl
dju.social
bonn.social
mstdn.chrisalemany.ca
social.sciences.re
tldr.nettime.org
lemy.lol
climatejustice.social
rollenspiel.social
mastodon.org.uk
social.kyiv.dcomm.net.ua
pouet.chapril.org
ecoevo.social
social.politicaconciencia.org
darmstadt.social
peertube.tv
lemmus.org
libretooth.gr
hackers.town
tooter.social
anarchism.space
diode.zone
video.infosec.exchange
mastodon.thirring.org
aussie.zone
social.bund.de
apobangpo.space
shitpost.cloud
berlin.social
toot.aquilenet.fr
social.beachcom.org
lemmygrad.ml
mastodon.radio
nerdculture.de
programming.dev
decayable.ink
kafeneio.social
functional.cafe
things.uk
fuzzies.wtf
diaspodon.fr
dalek.zone
sunbeam.city
tooting.ch
fediscience.org
mastodon.tetaneutral.net
social.librem.one
im-in.space
lemmy.sdf.org
legal.social
post.lurk.org
mastodon.uy
noc.social
tube.pol.social
lemmy.ml
don.linxx.net
infosec.pub
kolektiva.social
masto.bike
furries.club
zhub.link
lemmy.world
openbiblio.social
mastodon.zaclys.com
mamot.fr
clacks.link
discuss.tchncs.de
cyberplace.social
graz.social
pl.kitsunemimi.club
mastodonczech.cz
masto.nobigtech.es
hostux.social
pawb.fun
mastodon.trueten.de
norden.social
systemli.social
mander.xyz
ciberlandia.pt
woem.men
sopuli.xyz
lemmy.ca
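
For anyone wanting to run the same kind of cross-check on their own instance, here is a minimal sketch in Python. It is not necessarily how the list above was produced. It assumes you already have two plain lists of domains (your instance's linked instances and the domains from the leaked list). The function names and the sample data are hypothetical and for illustration only.

```python
def normalize(domain: str) -> str:
    """Lowercase a domain and strip surrounding whitespace and a trailing dot."""
    return domain.strip().lower().rstrip(".")


def instances_on_list(linked, leaked):
    """Return the linked instances that also appear in the leaked list, sorted."""
    leaked_set = {normalize(d) for d in leaked if d.strip()}
    linked_set = {normalize(d) for d in linked if d.strip()}
    return sorted(linked_set & leaked_set)


# Hypothetical sample data; real inputs would be much longer.
linked = ["lemmy.world", "Beehaw.org", "slrpnk.net", "lemmy.dbzer0.com"]
leaked = ["beehaw.org", "lemmy.world", "hexbear.net"]

print(instances_on_list(linked, leaked))  # ['beehaw.org', 'lemmy.world']
```

Normalizing both sides first avoids missing a match because of letter case or stray whitespace in either source.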

[–] poVoq@slrpnk.net 7 points 2 days ago

Given that we used to see lots of Meta scraping a while back on our instance and had to implement Anubis as a result, it is interesting to see that slrpnk.net doesn't seem to be on this list (anymore).

[–] scintilla@crust.piefed.social 8 points 2 days ago (5 children)

Can someone explain why they would need to scrape multiple instances? Are they intentionally going after the fediverse, or is it just a byproduct of Meta trying to collect all of human communication?

[–] wuphysics87@lemmy.ml 7 points 2 days ago

The second one

[–] BlueEther@no.lastname.nz 4 points 2 days ago

probably the latter

[–] frongt@lemmy.zip 3 points 2 days ago

It's a lot easier for them to use the same scraper they use on other sites than to build something custom.

[–] codexarcanum@lemmy.dbzer0.com 8 points 2 days ago (6 children)

Hmmm... I don't see dbzer0 in the list, I wonder how we escaped? I think we're like the 3rd or 4th biggest instance, and positive leaning on AI. Maybe @db0@lemmy.dbzer0.com just has amazing sys admin skills?

[–] poVoq@slrpnk.net 8 points 2 days ago

Maybe they don't want to ingest AI generated content to prevent model decay and thus remove sites that promote AI use?

[–] nickwitha_k@lemmy.sdf.org 8 points 2 days ago

This explains our instance having perf issues.

[–] PhilipTheBucket@quokk.au 7 points 2 days ago

This isn't really a Lemmy badge of approval or anything, although it is a little interesting. They suck up literally every single thing they can get their grubby little mitts on.

[–] HubertManne@piefed.social 7 points 2 days ago

Thanks, but I'm sure it's average at best.

We’re on the list? Lol.

[–] Aeri@lemmy.world 3 points 2 days ago

Ew gross can't wait to have to answer captchas

[–] avidamoeba@lemmy.ca 4 points 2 days ago

We made it!
