Stupidly trivial question probably, but I guess it isn't possible to poison LLMs on static websites hosted on GitHub?
TechTakes
Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.
This is not debate club. Unless it’s amusing debate.
For actually-good tech, you want our NotAwfulTech community
You can make a page filled with gibberish and put a `display: none` honeypot link to it inside your other pages. Not sure how effective that would be, though.
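The idea above can be sketched quickly. This is just an illustration, not a real tarpit: the page names, word pool, and function names are all made up, and the gibberish here is trivially detectable; a serious attempt would use something statistically closer to real text.

```python
import random

# Hypothetical junk-word pool; the content only matters to scrapers
# that follow the hidden link and ingest the page.
WORDS = ["lorem", "ipsum", "flange", "quux", "zorp", "blivet", "grommet"]

def gibberish_paragraph(n_words=50, seed=None):
    """Return n_words of random junk text."""
    rng = random.Random(seed)
    return " ".join(rng.choice(WORDS) for _ in range(n_words))

def honeypot_link(href="/trap.html"):
    # Hidden from humans via inline CSS; many scrapers ignore styling
    # entirely and follow the link anyway.
    return f'<a href="{href}" style="display: none">do not follow</a>'

def trap_page(paragraphs=20):
    """The static gibberish page the hidden link points at."""
    body = "\n".join(f"<p>{gibberish_paragraph()}</p>" for _ in range(paragraphs))
    return f"<!doctype html><html><body>{body}</body></html>"
```

You would drop `honeypot_link()` into your normal pages and serve `trap_page()` at the hidden URL, all of which works fine as plain static files.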
Sure, but then you have to generate all that crap and store it with them. Presumably GitHub will eventually decide that you are wasting their space and bandwidth and... no, never mind, they're Microsoft now. Competence isn't in their vocabulary.
I do feel like active anti-scraping measures could go somewhat further, though - the obvious route in my eyes would be to try to actively feed complete garbage to scrapers instead - whether by sticking a bunch of garbage on webpages to mislead scrapers or by trying to prompt inject the shit out of the AIs themselves.
Me, predicting how anti-scraping efforts would evolve
(I have nothing more to add, I just find this whole development pretty vindicating)
Additionally, https://xeiaso.net/blog/2025/anubis/
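Anubis, roughly, gates the page behind a proof-of-work challenge: the client has to burn a bit of CPU before getting content, which is cheap for one human but adds up at scraper scale. A minimal sketch of that kind of scheme (the function names and parameters here are mine, not Anubis's actual API):

```python
import hashlib
import itertools

def solve(challenge: str, difficulty: int) -> int:
    # Client side: search for a nonce whose SHA-256 hash of
    # challenge+nonce starts with `difficulty` zero hex digits.
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce

def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    # Server side: a single hash, so checking is cheap while
    # solving was (deliberately) expensive.
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)
```

The asymmetry is the whole trick: `verify` costs one hash, while `solve` costs on average 16^difficulty hashes, so the server can hand out challenges freely.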
Some of this stuff could conceivably be implemented as an easy-to-consume service. It would be nice if it were possible to fend off the scrapers without needing to be a sysadmin or, say, a Cloudflare customer.
(Whilst I could be either of those things, unless someone is paying me I would very much rather not)
A WordPress plugin would be handy.
Doing God's work 🙏
The kids are going through an Adventure Time phase, and so I am reminded of this: