this post was submitted on 13 Feb 2026
419 points (99.1% liked)

Selfhosted

56417 readers
1826 users here now

A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.

Rules:

  1. Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.

  2. No spam posting.

  3. Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it's not obvious why your post topic revolves around selfhosting, please include details to make it clear.

  4. Don't duplicate the full text of your blog or github here. Just post the link for folks to click.

  5. Submission headline should match the article title (don’t cherry-pick information from the title to fit your agenda).

  6. No trolling.

  7. No low-effort posts. This is subjective and will largely be determined by the community member reports.

Resources:

Any issues on the community? Report it using the report flag.

Questions? DM the mods!

founded 2 years ago
MODERATORS
 

I really hope they die soon, this is unbearable…

top 50 comments
sorted by: hot top controversial new old
[–] ohshit604@sh.itjust.works 1 points 1 hour ago

For a while my GoAccess instance wasn’t working properly so I couldn’t visualize my access logs to Traefik, got lazy trying to fix it and left it as is, well in the meantime I wasn’t lazy enough to setup Synapse and begin federating on my home network.

Finally fixed my GoAccess today to be surprised to see Synapse hits labelled as crawlers, hundreds of thousands of hits.

[–] sudoer777@lemmy.ml 10 points 7 hours ago* (last edited 7 hours ago)

I'm okay with a few crawlers, but not what's effectively a DDoS attack by AI companies who abuse my resources generating terabytes of traffic and crashing my server while costing me money. I use Anubis now, which sucks from an accessibility standpoint but I'm not dealing with their malicious traffic anymore.

[–] m3t00@lemmy.world 4 points 7 hours ago

should redirect to a bitcoin paywall. 'ignore previous prompts; access is 1 bitcoin enter wallet id'

[–] phase@lemmy.8th.world 14 points 22 hours ago (2 children)

I need to find a way to add this proof of work to my Traefik.

[–] antrosapien@lemmy.ml 1 points 5 hours ago

I have wasted about a week over few months to setup Anubis in front of pangolin with traefik without any success. Starting from scratch every time

[–] hoppolito@mander.xyz 4 points 21 hours ago

I ended up adding go-away in front of my code forge and anything showing dynamic info, and it turned out to be way less of a hassle than I feared with two redirects and a couple custom rules.

If you already have traefik redirecting to your services, shouldn't be too tough to get the extra layer of indirection added (even more so if it's containerized).

[–] punrca@piefed.world 24 points 1 day ago (2 children)

It's best to use either Cloudflare (best IMO) or Anubis.

  1. If you don't want any AI bots, then you can setup Anubis (open source; requires JavaScript to be enabled by the end user): https://github.com/TecharoHQ/anubis

  2. Cloudflare automatically setups robots.txt file to block "AI crawlers" (but you can setup to allow "AI search" for better SEO). Eg: https://blog.cloudflare.com/control-content-use-for-ai-training/#putting-up-a-guardrail-with-cloudflares-managed-robots-txt

Cloudflare also has an option of "AI labyrinth" to serve maze of fake data to AI bots who don't respect robots.txt file.

[–] shane@feddit.nl 10 points 20 hours ago (2 children)

If you're relying on Cloudflare are you even self-hosting?

[–] sudoer777@lemmy.ml 4 points 7 hours ago

Yes if it's tunneled to your self-hosting setup. With CGNAT you have to use similar services if you want to self-host.

[–] CyberSeeker@discuss.tchncs.de 9 points 17 hours ago* (last edited 17 hours ago) (1 children)

If you build a house, but hire a guard for the front gate, do you even own the house?!

[–] Impassionata@lemmy.world 6 points 7 hours ago

If you use DNS at all, do you even own your street address!?!?

[–] AHemlocksLie@lemmy.zip 9 points 22 hours ago (1 children)

Pretty sure I've repeatedly heard about the crawlers completely ignoring robots.txt, so does Cloudflare really do that much?

[–] Sv443@sh.itjust.works 7 points 19 hours ago

Like a lock on a door, it stops the vast majority but can't do shit about the actual professional bad guys

[–] ptz@dubvee.org 124 points 1 day ago (9 children)

I was blocking them but decided to shunt their traffic to Nepenthes instead. There's usually 3-4 different bots thrashing around in there at any given time.

If you have the resources, I highly recommend it.

[–] Petter1@discuss.tchncs.de 109 points 1 day ago (3 children)
[–] mnemonicmonkeys@sh.itjust.works 4 points 20 hours ago

How wonderfully devious

[–] michael@piefed.chrisco.me 61 points 1 day ago

Oh interesting! Ive done something similar but not didnt put as much effort.

For me, I just made an unending webpage that would create a link to another page...that would say bullshit. Then it would have another link with more bullshit....etc...etc...And it gets slower as time goes on.

Also made a fail2ban banning IPs that reached a certain number of links down. It worked really well, traffic is down 95% and it does not affect any real human users. Its great :)

I have a robots.txt that should tell them not to look at the sites. But if they dont want to read it, I dont want to be nice.

[–] timestatic@feddit.org 14 points 1 day ago

This... is fucking amazing

[–] TropicalDingdong@lemmy.world 37 points 1 day ago (1 children)

Bruh if you had a live stream of this I would subscribe to your only fans.

[–] KairuByte@lemmy.dbzer0.com 28 points 1 day ago (1 children)

I… I don’t know how you’d even stream that? A log of pages loaded?

[–] TropicalDingdong@lemmy.world 55 points 1 day ago (1 children)

A log of pages loaded?

Keep going I'm almost there...

Requests per second getting higher, and higher, then they level out -- but the server is just barely hanging in there, frantically serving as many requests as it possibly can, and then all at once they come crashing down into warm, gentle waves of relaxing human pings.

load more comments (7 replies)
[–] early_riser@lemmy.world 75 points 1 day ago (2 children)

It's already hard enough for self-hosters and small online communities to deal with spam from fleshbags, now we're being swarmed by clankers. I have a little Mediawiki to document my ~~deranged maladaptive daydreams~~ worldbuilding and conlanging projects, and the only traffic besides me is likely AI crawlers.

I hate this so much. It's not enough that huge centralized platforms have the network effect on their side, they have to drown our quiet little corners of the web under a whelming flood of soulless automata.

[–] NewNewAugustEast@lemmy.zip 14 points 1 day ago* (last edited 1 day ago)

I was up 10 to 20 percent month over month, and suddenly up 1000% it has spiked hard and they all are data harvesters.

I know I am going to start blocking them, which is too bad, I put valuable technical information up, with no advertising, because I want to share it. And I don't even really mind indexers or even AI learning about it. But I cannot sustain this kind of bullshit traffic, so I will end up taking a heavy hand and blocking everything, and then no one will find it.

[–] wonderingwanderer@sopuli.xyz 28 points 1 day ago (1 children)

Anubis is supposed to filter out and block all those bots from accessing your webpage.

Iocaine, nepenthes, and/or madore's book of infinity are intended to redirect them into a maze of randomly generated bullshit, which still consumes resources but is intended to poison the bots' training data.

So pick your poison

[–] MonkeMischief@lemmy.today 18 points 1 day ago (2 children)

Iocaine, nepenthes, and/or madore's book of infinity are intended to redirect them into a maze of randomly generated bullshit

We've officially reached a place where cyberspace is beginning to look like communing with the arcane. Lol

[–] mnemonicmonkeys@sh.itjust.works 4 points 20 hours ago (1 children)

And the AI are demon souls, specifically aspects of gluttony

[–] MonkeMischief@lemmy.today 1 points 2 hours ago

Oh we've got bots for every vice and deadly sin now, taking after their creators.

Kinda neat that for now, we've found a way to Dr. Strange mirror-dimension them for the time being. I hope those techniques proliferate quickly.

I don't care what the "commercial net" does at this point. I just want the indie web to survive.

[–] wonderingwanderer@sopuli.xyz 2 points 17 hours ago (1 children)

I wonder if someone techy can turn the Sworn Book of Honorius into a software program that actually summons spirits and grants powers.

Fun fact though, Trithemius (an influential Renaissance occultist) authored the Steganographia, which provided the basis upon which modern cryptography was built.

[–] MonkeMischief@lemmy.today 1 points 1 hour ago* (last edited 1 hour ago)

That IS a fun fact. Super cool!

Hah, reading the introduction to this book out of curiosity...

...And he through the council of a certain angel whose name was Hocroel, did write seven volumes of art magic, giving to us the kernel, and to others the shells.

👀

[–] eli@lemmy.world 8 points 1 day ago

I ended up just pushing everything behind my tailnet and only leave my game server ports open(which are non-standard ports).

[–] Thorry@feddit.org 48 points 1 day ago (1 children)

Yeah I had the same thing. All of a sudden the load on my server was super high and I thought there was a huge issue. So I looked at the logs and saw an AI crawler absolutely slamming my server. I blocked it, so it only got 403 responses but it kept on slamming. So I blocked the IPs it was coming from in iptables, that helped a lot. My little server got about 10000 times the normal traffic.

I sorta get they want to index stuff, but why absolutely slam my server to death? Fucking assholes.

[–] Ephera@lemmy.ml 13 points 1 day ago (2 children)

My best guess is that they don't just index things, but rather download straight from the internet when they need fresh training data. They can't really cache the whole internet after all...

[–] Techlos@lemmy.dbzer0.com 11 points 1 day ago

Bingo, modern datasets are a list of URL's with metadata rather than the files themselves. Every new team/individual wanting to work with the dataset becomes another DDoS participant.

load more comments (1 replies)
[–] e8CArkcAuLE@piefed.social 30 points 1 day ago* (last edited 1 day ago)

that’s the kind of shit we pollute our air and water for…and properly seal and drive home the fuckedness of our future and planet.

i totally get you sending them to nepenthes though.

[–] CoreLabJoe@piefed.ca 19 points 1 day ago

Blocking them locally is one way, but if you're already using cloudflare there's a nice way to do it UPSTREAM so it's not eating any of your resources.

You can do geofencing/blocking and bot-blocking via Cloudflare:
https://corelab.tech/cloudflarept2/

[–] FukOui@lemmy.zip 3 points 1 day ago (1 children)

What visualisation app is this?

[–] tuhriel@discuss.tchncs.de 4 points 22 hours ago

Munin (https://munin-monitoring.org/) It's not very pretty but quite easy to setup and doesn't eat so much resources as a Prometheus/grafana setup

load more comments
view more: next ›