this post was submitted on 08 Sep 2025
Selfhosted
In what way? Anything on the public internet is likely being used for AI training. I guess by using free GitHub you can't object to training.
Then again, anywhere you host, you sort of run into the same problem. You can use robots.txt, but crawlers don't have to honor it.
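To make the "don't have to honor it" part concrete, here's a rough sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `allowed` helper, and the example.com URL are all made up for illustration. `GPTBot` and `CCBot` are the user-agent tokens I understand OpenAI and Common Crawl to publish, but check each vendor's current docs before relying on them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask two AI crawlers to stay out, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return what a well-behaved crawler would conclude from ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# This check runs on the crawler's side. Nothing on your server enforces it,
# so a crawler that skips the check simply fetches the page anyway.
print(allowed("GPTBot", "https://example.com/post"))
print(allowed("Mozilla/5.0", "https://example.com/post"))
```

The point is that the check lives in the crawler's code, so robots.txt only works as a polite request. Actual blocking has to happen on the server or at a proxy in front of it.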
If you self-host, there are some ways to fight back. Or, depending on your opinion of Cloudflare, it seems they're fairly effective at blocking AI crawlers.
Yep, on top of simply blocking, if you're self-hosting or using Cloudflare, you can enable AI tarpits.
How do I do this? I don't mind hosting somewhere other than at home (and may prefer it). My main concern with GH is that you become an AI snack whether you like it or not.
Which part? If you want to use Cloudflare Pages, it's relatively straightforward. You can follow this and get up and running pretty quickly: https://www.hongkiat.com/blog/host-static-website-cloudflare-pages/
If you're asking about the tarpits, there are generally two ways to accomplish that. Even if you don't use Cloudflare Pages to host your site directly (if you run nginx on your own server, for example), you can still enable AI tarpits for your entire domain, as long as you use Cloudflare as your DNS provider with the records proxied through it. If you use Pages, the setup is mostly the same: https://blog.cloudflare.com/ai-labyrinth/#how-to-use-ai-labyrinth-to-stop-ai-crawlers
If you want to do it all locally, you could instead set up iocaine or Nepenthes, which are both self-hosted and can integrate with various web server software. Obviously, Cloudflare's tarpits are stupid simple to set up compared to these, but these give you greater control over exactly how you're poisoning the well and trapping crawlers.
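If you're wondering what a tarpit actually does, here's a toy sketch of the general idea using only the Python standard library. This is not how iocaine or Nepenthes are implemented internally. The names `fake_page`, `TarpitHandler`, and `serve`, the `/maze/` prefix, port 8080, and the delay value are all invented for illustration. It serves junk pages that link to more junk pages and drips each response out slowly.

```python
import hashlib
import random
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

WORDS = ["alpha", "bravo", "delta", "echo", "kilo", "lima", "oscar", "tango"]


def fake_page(path: str, n_links: int = 5) -> str:
    """Build a deterministic junk page whose links lead to more junk pages."""
    # Seed from the path so the same URL always returns the same page.
    seed = int.from_bytes(hashlib.sha256(path.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    text = " ".join(rng.choice(WORDS) for _ in range(40))
    links = "".join(
        f'<a href="/maze/{rng.getrandbits(32):08x}">{rng.choice(WORDS)}</a> '
        for _ in range(n_links)
    )
    return f"<html><body><p>{text}</p>{links}</body></html>"


class TarpitHandler(BaseHTTPRequestHandler):
    DELAY_SECONDS = 0.5  # arbitrary pause between chunks (assumption)

    def do_GET(self):
        body = fake_page(self.path).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        try:
            # Drip the body out in small chunks to waste the crawler's time.
            for i in range(0, len(body), 64):
                self.wfile.write(body[i:i + 64])
                self.wfile.flush()
                time.sleep(self.DELAY_SECONDS)
        except (BrokenPipeError, ConnectionResetError):
            pass  # the client gave up, which is fine


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Run the toy tarpit until interrupted."""
    ThreadingHTTPServer((host, port), TarpitHandler).serve_forever()
```

Calling `serve()` starts it on localhost. In a real setup you'd put it behind your web server and route only suspected crawlers to it, which is the kind of integration the self-hosted tools handle for you.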
GitHub, acquired by Microsoft, is now forcing AI on its user base.
That's one of my main reasons for staying away from GH.