this post was submitted on 07 Jul 2025
600 points (98.4% liked)

Open Source

[–] unexposedhazard@discuss.tchncs.de 126 points 1 week ago* (last edited 1 week ago) (3 children)

Non-paywalled link: https://archive.is/VcoE1

It basically boils down to making the browser do some CPU-heavy calculations before allowing access. That's no problem for a single user, but for a bot farm it would increase the compute power they need by 100x or more.
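
For a rough sense of what that work looks like, here is a minimal proof-of-work sketch in Python. It is not Anubis's actual code; the challenge string, difficulty scheme, and hashing details are assumptions for illustration only.

```python
import hashlib
import secrets

def solve_challenge(challenge: str, difficulty: int) -> tuple[int, str]:
    """Find a nonce so that sha256(challenge + nonce) starts with
    `difficulty` zero hex digits. This is the CPU-heavy work the
    browser performs before it is let through."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1

# Hypothetical server-issued challenge. The server verifies the answer
# with a single hash, which is why this stays cheap on the server side
# while costing every scraping client real CPU time.
challenge = secrets.token_hex(16)
nonce, digest = solve_challenge(challenge, difficulty=4)
print(nonce, digest)
```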

[–] Mubelotix@jlai.lu 77 points 1 week ago (7 children)

Exactly. It's called proof-of-work and was originally invented (as Hashcash) to reduce email spam, but was later used by Bitcoin to control how quickly new blocks are produced.

[–] exu@feditown.com 12 points 1 week ago

It inherently blocks a lot of the simpler bots by requiring JavaScript as well.

[–] fuzzy_tinker@lemmy.world 91 points 1 week ago (7 children)

This is fantastic and I appreciate that it scales well on the server side.

AI scraping is a scourge, and I would love to know the collective amount of power wasted due to the necessity of countermeasures like this, added to the total already wasted by AI.

[–] grysbok@lemmy.sdf.org 52 points 1 week ago

My archive's server uses Anubis and after initial configuration it's been pain-free. Also, I'm no longer getting multiple automated emails a day about how the server's timing out. It's great.

We went from about 3000 unique "pinky swear I'm not a bot" visitors per (iirc) half a day to 20 such visitors. Twenty is much more in line with expectations.

[–] Jankatarch@lemmy.world 43 points 1 week ago

Every time I see Anubis I get happy, because I know the website has some quality information.

[–] Panda@lemmy.today 32 points 1 week ago (2 children)

I've seen this pop up on websites a lot lately. Usually it takes a few seconds to load the website, but there have been occasions where it seemed to hang: it was stuck on that screen for minutes, and I ended up closing my browser tab because the website just wouldn't load.

Is this a (known) issue or is it intended to be like this?

[–] lime@feddit.nu 27 points 1 week ago (1 children)

Anubis is basically a Bitcoin miner with the difficulty turned way down (and obviously not producing any coins), so the time it takes is inherently random. If it takes minutes, something does seem wrong, though. Maybe a network error?

[–] isolatedscotch@discuss.tchncs.de 19 points 1 week ago* (last edited 1 week ago) (5 children)

Adding to this, some sites set the difficulty way higher than others: nerdvpn's Invidious and Redlib instances take about 5 seconds and some ~20k hashes, while privacyredirect's instances are almost instant with less than 50 hashes each time.
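
If the difficulty is the number of leading zero hex digits the hash must have (as in the sketch earlier in the thread; an assumption, not a statement about those instances' actual settings), each extra digit multiplies the expected work by 16, which is the kind of gap being described:

```python
# Each hex digit has a 1-in-16 chance of being zero, so the expected
# number of attempts for n leading zeros is 16 ** n.
for n in range(1, 6):
    print(f"difficulty {n}: ~{16 ** n:,} hashes on average")
# difficulty 1: ~16    difficulty 2: ~256    difficulty 3: ~4,096
# difficulty 4: ~65,536    difficulty 5: ~1,048,576
```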

[–] bdonvr@thelemmy.club 30 points 1 week ago (9 children)

Ooh can this work with Lemmy without affecting federation?

[–] beyond@linkage.ds8.zone 32 points 1 week ago (1 children)

Yes.

Source: I use it on my instance and federation works fine

[–] bdonvr@thelemmy.club 16 points 1 week ago (1 children)

Thanks. Anything special about configuring it?

[–] beyond@linkage.ds8.zone 20 points 1 week ago* (last edited 1 week ago)

I keep my server config in a public git repo, but I don't think you have to do anything really special to make it work with Lemmy. Since I use Traefik, I followed the guide for setting up Anubis with Traefik.

I don't expect to run into issues, as Anubis specifically looks for user-agent strings that appear to be human users (i.e. they contain the word "Mozilla", as most graphical web browsers' do). Any request clearly coming from a bot that identifies itself is left alone, and Lemmy identifies itself as "Lemmy/{version} +{hostname}" in requests.
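
A minimal sketch of that decision logic (illustrative only; the user-agent strings are made up, and this is not Anubis's actual policy code or default configuration):

```python
def needs_challenge(user_agent: str) -> bool:
    # Graphical browsers almost universally send "Mozilla/5.0 ..."
    # user agents, so only those get the proof-of-work interstitial.
    # Clients that identify themselves honestly are passed straight
    # through to the backend.
    return "Mozilla" in user_agent

assert needs_challenge("Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0")
assert not needs_challenge("Lemmy/0.19 +https://lemmy.example.org")
```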

[–] deadcade@lemmy.deadca.de 11 points 1 week ago (1 children)

"Yes", for any bits the user sees. The frontend UI can be behind Anubis without issues. The API, including both user and federation endpoints, cannot. We expect "bots" to use an API, so you can't put human verification in front of it. These "bots" also include applications that aren't aware of Anubis, or unable to pass it, like all third-party Lemmy apps.

That does stop almost all generic AI scraping, though it does not prevent targeted abuse.
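
A rough sketch of how that split could look in front of a Lemmy instance (the path prefixes and header check are assumptions about a typical setup, not a verified config):

```python
def route_through_anubis(path: str, accept_header: str) -> bool:
    """True if the request should be sent through the Anubis challenge,
    False if it should go straight to the backend."""
    # API and ActivityPub/federation traffic is machine-to-machine,
    # so it cannot be asked to solve a browser challenge.
    if path.startswith("/api/") or path.startswith("/inbox"):
        return False
    if "application/activity+json" in accept_header:
        return False
    # Everything else is assumed to be the human-facing UI.
    return True
```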

[–] medem@lemmy.wtf 24 points 1 week ago (7 children)

What advantage does this software provide over simply banning bots via robots.txt?

[–] kcweller@feddit.nl 88 points 1 week ago

Robots.txt expects the client to respect the rules, for instance by marking that it is a scraper.

AI scrapers don't respect this trust, and thus robots.txt is meaningless.
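
That honor system is visible in how a polite client uses robots.txt, e.g. with Python's standard library; the crawler itself decides whether to check at all (the URL and bot name here are just examples):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.org/robots.txt")
rp.read()

# A well-behaved crawler asks first; nothing stops a scraper from
# skipping this check and fetching the page anyway.
print(rp.can_fetch("ExampleBot", "https://example.org/private/page"))
```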

[–] medem@lemmy.wtf 47 points 1 week ago

Well, now that y'all put it that way, I think it was pretty naive of me to think that these companies, whose business model is basically theft, would honour a lousy robots.txt file...

The scrapers ignore robots.txt. It doesn't really ban them - it just asks them not to access things, but they are programmed by assholes.

[–] irotsoma@lemmy.blahaj.zone 29 points 1 week ago

TL;DR: You should have both due to the explicit breaking of the robots.txt contract by AI companies.

AI generally doesn't obey robots.txt. That file just tells scrapers what they shouldn't scrape, and it relies on the scrapers' good faith. Many AI companies have explicitly chosen not to comply with robots.txt, breaking that contract, so this is a system that makes scrapers unwilling to comply get stuck in a black hole of junk and waste their time. It's a countermeasure, not a solution. It's just way less complex than other options that simply block these connections but then get you pounded with retries. This way the scraper bot gets stuck for a while, and you don't waste as many resources blocking it over and over again.

[–] thingsiplay@beehaw.org 14 points 1 week ago

The difference is:

  • robots.txt is a promise without a door
  • Anubis is an actual closed door that opens up after some time
[–] refalo@programming.dev 21 points 1 week ago* (last edited 1 week ago) (2 children)

I don't understand how/why this got so popular out of nowhere... the same solution has already existed for years in the form of haproxy-protection and a couple of others... but nobody seems to care about those.

[–] Flipper@feddit.org 47 points 1 week ago (1 children)

Probably because the creator had a blog post that got shared around at a point in time when this exact problem was resonating with users.

It's not always about being first but about marketing.

[–] JohnEdwa@sopuli.xyz 27 points 1 week ago* (last edited 1 week ago) (1 children)

> It’s not always about being first but about marketing.

And one has a cute catgirl mascot, the other a website that looks like a blockchain techbro startup.
I'm even willing to bet the number of people who set up Anubis just to get the cute splash screen isn't insignificant.

[–] JackbyDev@programming.dev 19 points 1 week ago

Compare and contrast.

> High-performance traffic management and next-gen security with multi-cloud management and observability. Built for the enterprise — open source at heart.

Sounds like some overpriced, vacuous, do-everything solution. Looks and sounds like every other tech website. Looks like it is meant to appeal to the people who still say "cyber". Looks and sounds like fauxpen source.

> Weigh the soul of incoming HTTP requests to protect your website!

Cute. Adorable. Baby girl. Protect my website. Looks fun. Has one clear goal.

[–] Kazumara@discuss.tchncs.de 13 points 1 week ago (1 children)

Just recently there was a guy on the NANOG list ranting that Anubis is the wrong approach and that people should just cache properly; then their servers would handle thousands of users and the bots wouldn't matter. Anyone who puts git online has no one to blame but themselves, e-commerce should just be made cacheable, etc. Seemed a bit idealistic, a bit detached from the current reality.

Ah found it, here

[–] deadcade@lemmy.deadca.de 14 points 1 week ago (3 children)

Someone making an argument like that clearly does not understand the situation. Just 4 years ago, a robots.txt was enough to keep most bots away, and hosting personal git on the web required very few resources. With AI companies actively profiting off stealing everything, a robots.txt doesn't mean anything. Now even a relatively small git web host takes an insane amount of resources. I'd know - I host a Forgejo instance. Caching doesn't matter, because diffs between two random commits are likely unique. Ratelimiting doesn't matter, because they will use different IPs (or ranges) and user agents. It would also heavily impact actual users "because the site is busy".
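
For a sense of why caching can't keep up, a quick back-of-the-envelope count of the distinct diff pages a scraper can request (the commit count is made up for illustration):

```python
# A git web UI exposes a diff page for (almost) every pair of commits.
commits = 10_000
distinct_diffs = commits * (commits - 1) // 2
print(f"{distinct_diffs:,} possible commit-pair diffs")  # 49,995,000
# Nearly every scraper hit is a cache miss, and each miss costs the
# server a fresh diff computation.
```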

A proof-of-work solution like Anubis is the best we have currently: the least possible impact on end users, while keeping most (if not all) AI scrapers off the site.
