this post was submitted on 23 Mar 2025
302 points (98.4% liked)

top 28 comments
[–] ToadOfHypnosis@lemm.ee 58 points 2 days ago (3 children)

So AI already taxes power, water for cooling, and other natural resources, which keep getting ramped up and consumed. Now this creates a second wasteful AI to do the same, in an endless loop, so the first AI just keeps spinning its wheels and wasting resources until it's discovered. The idea makes sense from a pure “stop unauthorized crawling” perspective, but damn, we just have no solutions that don't accelerate climate impact. This planet is just going to turn into an oven to cook us.

[–] floofloof@lemmy.ca 16 points 2 days ago* (last edited 2 days ago) (2 children)

"No real human would go four links deep into a maze of AI-generated nonsense," Cloudflare explains. "Any visitor that does is very likely to be a bot, so this gives us a brand-new tool to identify and fingerprint bad bots."

It sounds like there may be a plan to block known bots once they have used this tool to identify them. Over time this would reduce the amount of AI slop they need to generate for the AI trap, since bots already fingerprinted would not be served it. Since AI generators are expensive to run, it would be in Cloudflare's interests to do this. So while your concern is well placed, in this particular case there may be a surge of energy and water usage at first that tails off once more bots are fingerprinted.
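
To make that concrete, here's a minimal sketch of how depth-based fingerprinting could work. The threshold, function names, and in-memory set are hypothetical illustration, not Cloudflare's actual implementation:

```python
# Hypothetical sketch, not Cloudflare's code.
MAX_HUMAN_DEPTH = 4            # "no real human would go four links deep"
known_bots: set[str] = set()   # fingerprints of clients already caught

def serve_block_page() -> str:
    # Cheap static response; zero generation cost once a bot is fingerprinted.
    return "403 Forbidden"

def serve_decoy_page(depth: int) -> str:
    # Stand-in for a pre-generated decoy page whose links lead one level deeper.
    return f"<html><a href='/maze?depth={depth}'>keep reading</a></html>"

def handle_request(client_fp: str, maze_depth: int) -> str:
    """Decide what a suspected crawler gets, based on how deep it has wandered."""
    if client_fp in known_bots or maze_depth >= MAX_HUMAN_DEPTH:
        known_bots.add(client_fp)   # record (or re-confirm) the fingerprint
        return serve_block_page()   # stop feeding it generated content
    return serve_decoy_page(depth=maze_depth + 1)
```

The first branch is where the savings would come from: a client that's already been fingerprinted gets a cheap static response instead of freshly generated decoy pages.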

[–] rottingleaf@lemmy.world 7 points 2 days ago (1 children)

“No real human would go four links deep into a maze of AI-generated nonsense,”

Me, looking for porn, red-eyed, swearing at the screen.

[–] singletona@lemmy.world 2 points 1 day ago

...real.

'Four links deep'

HEY NOW! Sometimes stuff just gets interesting!

'Into a maze of AI-Generated Nonsense.'

And sometimes that interesting is porn related!

[–] turmacar@lemmy.world 5 points 2 days ago* (last edited 1 day ago)

The problem is they're now attempting anti-fingerprinting tactics. A lot of the AI crawlers used to identify themselves as Amazon/OpenAI/etc., but don't anymore because they were being blocked. Now they're coming from random IPs with random or obfuscated user-agent strings.

This is a legal problem not a technological one.

[–] piecat@lemmy.world 3 points 2 days ago

It's definitely an arms race. One other outcome is that it gets too expensive to be cost effective and slows down that way.

[–] rottingleaf@lemmy.world 3 points 2 days ago* (last edited 2 days ago) (1 children)

There are solutions. I've just skimmed a paper on attacks on Kademlia. The solutions would be similar to what's recommended there. The problems look different on the surface, but they stem from the same thing: no admission control for the network.

All this tomfoolery about “oh horror, how do we solve this” exists because bot farms, recommendation systems and ad networks have proven very convenient and profitable; nobody wants to scrap that ecosystem in favor of f2f services. So they want to remove one side of the coin but keep the other.

[–] SL3wvmnas@discuss.tchncs.de 0 points 13 hours ago (1 children)

Oooh, that sounds like an interesting read. Do you happen to have the DOI?

[–] rottingleaf@lemmy.world 2 points 13 hours ago (1 children)
[–] SL3wvmnas@discuss.tchncs.de 0 points 3 hours ago* (last edited 3 hours ago)

Thank you for taking the time!

[–] RejZoR@lemmy.ml 61 points 2 days ago (5 children)

This is AI poisoning. Blocking it just keeps it from learning; feeding it bullshit poisons its knowledge and makes it hallucinate.

I also wonder how AI crawlers know what wasn't already generated by AI, potentially "inbreeding" knowledge, as I call it, with the AI hallucinations of the past.

When the whole AI craze began, basically everything online was human-made. Not anymore. It'll just get worse if you ask me.

[–] CheeseNoodle@lemmy.world 29 points 2 days ago (1 children)

The scary part is that even humans don't really have a proper escape mechanism for this kind of misinformation. Sure, we can spot AI a lot of the time, but there are also situations where we can't, and that kind of leaves us only trusting people we already knew before AI, and growing more and more distrustful of information in general.

[–] theangryseal@lemmy.world 11 points 2 days ago (1 children)

Holy shit, this.

I’m constantly worried that what I’m seeing/hearing is fake. It’s going to get harder and harder to find older information on the internet too.

Shit, it’s crept outside of the internet actually. Family buys my kids books for Christmas and birthdays and I’m checking to make sure they aren’t AI garbage before I ever let them look at it because someone bought them an AI book already without realizing it.

I don’t really understand what we hope to get from all of this. I mean, not really. Maybe if it gets to a point where it can truly be trusted, but I just don’t see how it gets there.

[–] Flagstaff@programming.dev 2 points 2 days ago

I don’t really understand what we hope to get from all of this.

Well, even among the most moral devs, the garbage output wasn't intended, and no one could have predicted the pace at which it's been developing. So all this is driving a real need for in-person communities and regular contact—which is at least one great result, I think.

[–] JustARegularNerd@lemmy.dbzer0.com 18 points 2 days ago (1 children)

Kind of. They're actually trying to avoid this according to the article:

"The company says the content served to bots is deliberately irrelevant to the website being crawled, but it is carefully sourced or generated using real scientific facts—such as neutral information about biology, physics, or mathematics—to avoid spreading misinformation (whether this approach effectively prevents misinformation, however, remains unproven)."

[–] Muaddib@sopuli.xyz 5 points 2 days ago

That sucks! What's the point of putting an AI in a maze if you're not going to poison it?

[–] count_dongulus@lemmy.world 4 points 2 days ago

Whoa I never considered AI inbreeding as a death for AI 🤔

[–] floofloof@lemmy.ca 2 points 2 days ago

Some of these LLMs introduce very subtle statistical patterns into their output so it can be recognized as such. So it is possible in principle (not sure how computationally feasible when crawling) to avoid ingesting whatever has these patterns. But there will also be plenty of AI content that is not deliberately marked in this way, which would be harder to filter out.
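
For anyone curious what "subtle statistical patterns" means in practice, here's a toy sketch of a green-list style watermark detector, in the spirit of published LLM watermarking schemes. The whitespace tokenizer, seeding, and threshold are simplified assumptions; a real detector would also need the same keying the generating model used:

```python
# Toy sketch of statistical watermark detection; not any vendor's real scheme.
import hashlib

def is_green(prev_token: str, token: str, green_fraction: float = 0.5) -> bool:
    """Deterministically assign a token to the 'green' half of the vocabulary,
    keyed on the previous token, the way a watermarking generator would."""
    digest = hashlib.sha256((prev_token + "|" + token).encode()).digest()
    return digest[0] / 255 < green_fraction

def green_ratio(tokens: list[str]) -> float:
    """Fraction of tokens on the green list; unwatermarked text should sit
    near 0.5, watermarked text noticeably above it."""
    if len(tokens) < 2:
        return 0.0
    hits = sum(is_green(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    return hits / (len(tokens) - 1)

def looks_watermarked(text: str, threshold: float = 0.6) -> bool:
    # Crude whitespace "tokenization" for illustration only.
    return green_ratio(text.split()) > threshold
```

Only text whose green-token ratio sits well above the roughly 0.5 expected by chance gets flagged, which is also why AI output that carries no such deliberate pattern would still slip through.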

[–] Flisty@mstdn.social 2 points 2 days ago

@RejZoR @floofloof yeah AI will get worse and worse the more it trains on its own output. I can only see "walled-garden" AIs trained on specific datasets for specific industries being useful in future. These enormous "we can do everything (we can't do anything)" LLMs will die a death.

[–] sundrei@lemmy.sdf.org 20 points 2 days ago (1 children)

endless maze of irrelevant facts

oh no I've been turned into an AI :(

[–] Ilovethebomb@lemm.ee 27 points 2 days ago

Feeding AI crawlers the excrement of their forebears is a perfect way to deal with them.

[–] lol_idk@lemmy.ml 17 points 2 days ago

Throwing more power and resources at a resource-hungry process seems like a no-win.

[–] lath@lemmy.world 11 points 2 days ago

So they grasped the inevitable and dove right into it.

[–] kokesh@lemmy.world 8 points 2 days ago

I really want to see what the bullshit looks like. Shame the article doesn't actually show a sample; guess I'd have to make my browser look like an AI crawler.
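
If anyone wants to try that, a minimal sketch of spoofing a crawler-style User-Agent from the Python standard library. The UA string is just a guess at what a crawler might send, and a site like Cloudflare likely keys its bot detection off far more than this one header:

```python
# Hypothetical sketch: fetch a page while pretending to be an AI crawler.
import urllib.request

req = urllib.request.Request(
    "https://example.com/",
    headers={"User-Agent": "GPTBot/1.0 (+https://openai.com/gptbot)"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read()[:500])  # peek at whatever the site decides to serve you
```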

This is so cyberpunk.

Heh, sounds like what one of my exes used to do when she wanted some alone time: she'd throw me an informational rabbit hole and let me dive right into it for a couple of hours =)))