this post was submitted on 02 Feb 2026
Fediverse
That's the Discord model.
The Fediverse needs a layer which traps AI scrapers in a never-ending maze.
A never-ending maze would mean the scrapers just hammer our servers forever. Better to lead them into a honeypot and automatically ban their IP. Like PieFed does.
What about a maze that adds a few hundred ms to the response time with each request, so the load drops the longer the scraper stays trapped?
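The escalating-delay idea above can be sketched in a few lines of Python. This is a minimal illustration, not anyone's actual implementation: the delay step, cap, and function names are all made up for the example.

```python
import time
from collections import defaultdict

# Hypothetical tarpit sketch: each request from the same address waits a
# little longer than the previous one, so a scraper that hammers the
# server slows itself down over time.

DELAY_STEP = 0.2   # seconds added per request (assumed value)
DELAY_CAP = 10.0   # upper bound so we don't hold connections forever

request_counts = defaultdict(int)

def tarpit_delay(ip: str) -> float:
    """Return how long to sleep before answering this request."""
    request_counts[ip] += 1
    return min(request_counts[ip] * DELAY_STEP, DELAY_CAP)

def handle_request(ip: str) -> str:
    # The "trap": response time grows with every request from this IP.
    time.sleep(tarpit_delay(ip))
    return "200 OK"
```

Note the trade-off raised in the reply below: while the scraper waits, the server is still holding an open connection, so this costs our side resources too.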
I haven't tried to make something like that. I think it'd be hard to do without also exhausting our own resources.
Ah, that makes sense
Is that how tarpitting works? I didn't know.
There are a lot of strategies. AFAIK a tarpit tries to waste the attacker's resources by delaying our responses to their traffic. A honeypot tries to funnel bot traffic towards a place only a bot would go. Once they go there, you know they're a bot and can ban them.
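The honeypot half of that explanation can be sketched like this. It is not PieFed's actual code; the path and function names are invented for illustration. The trap URL would be hidden from humans (e.g. disallowed in robots.txt and linked only via an invisible anchor), so only a scraper that ignores those signals ever requests it.

```python
# Hypothetical honeypot sketch: any IP that requests the hidden trap
# URL is assumed to be a bot and gets banned.

banned_ips: set[str] = set()

HONEYPOT_PATH = "/bot-trap"  # assumed path, disallowed in robots.txt

def handle(ip: str, path: str) -> str:
    if ip in banned_ips:
        return "403 Forbidden"
    if path == HONEYPOT_PATH:
        banned_ips.add(ip)  # only a bot would ever reach this URL
        return "403 Forbidden"
    return "200 OK"
```

Unlike the tarpit, this costs the server almost nothing: one set lookup per request, and the offending IP is dropped immediately instead of being strung along.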
Sadly that only works for scrapers; content-engaging bots are immune to it.
So just find scrapers and bot farm owners IRL and burn down their houses, easy
Fortunately AI is taking care of that on its own https://doi.org/10.1038/s41586-024-07566-y
That's the job of the web server, not of the application that runs on it.
There is already software you can get that feeds a never-ending maze of text to AI scrapers, some of which is AI generated and/or designed to poison LLM training. The problem is that these still use up a ton of bandwidth.
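The core trick behind that kind of maze software can be sketched briefly: every URL deterministically generates a junk page full of links to further maze URLs, so a crawler can follow links forever without repeating or escaping. Everything here (word list, link count, page layout) is a made-up minimal example, not any real tool's output.

```python
import hashlib
import random

# Minimal sketch of an endless "maze" page generator. Each path seeds a
# PRNG, so the same URL always yields the same gibberish and the same
# onward links, while every new link leads deeper into the maze.

WORDS = ["lorem", "ipsum", "dolor", "sit", "amet", "fedi", "verse"]

def maze_page(path: str, n_links: int = 5) -> str:
    seed = int.from_bytes(hashlib.sha256(path.encode()).digest()[:8], "big")
    rng = random.Random(seed)  # deterministic per-URL
    text = " ".join(rng.choice(WORDS) for _ in range(50))
    links = "".join(
        f'<a href="{path.rstrip("/")}/{rng.randrange(10**9)}">more</a>'
        for _ in range(n_links)
    )
    return f"<html><body><p>{text}</p>{links}</body></html>"
```

The bandwidth problem mentioned above is visible even here: every crawler request still gets a full response body, so the maze wastes the scraper's time but not the server's egress.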
How would that layer distinguish AI from non-AI?