this post was submitted on 30 Dec 2025

842 points (98.6% liked)

Microblog Memes

10341 readers

2669 users here now

A place to share screenshots of Microblog posts, whether from Mastodon, tumblr, ~~Twitter~~ X, KBin, Threads or elsewhere.

Created as an evolution of White People Twitter and other tweet-capture subreddits.

RULES:

Your post must be a screen capture of a microblog-type post that includes the UI of the site it came from, preferably also including the avatar and username of the original poster. Including relevant comments made to the original post is encouraged.
Your post, included comments, or your title/comment should include some kind of commentary or remark on the subject of the screen capture. Your title must include at least one word relevant to your post.
You are encouraged to provide a link back to the source of your screen capture in the body of your post.
Current politics and news are allowed, but discouraged. There MUST be some kind of human commentary/reaction included (either by the original poster or you). Just news articles or headlines will be deleted.
Doctored posts/images and AI are allowed, but discouraged. You MUST indicate this in your post (even if you didn't originally know). If a post is found to be fabricated or edited in any way and it is not properly labeled, it will be deleted.
Be nice. Take political debates to the appropriate communities. Take personal disagreements to private messages.
No advertising, brand promotion, or guerrilla marketing.

Related communities:

founded 2 years ago

MODERATORS

ReadyUser31@lemmy.world

aeronmelon@lemmy.world

needanke@feddit.org

842

La_Brea_V2.0.exe (slrpnk.net)

submitted 1 month ago by Track_Shovel@slrpnk.net to c/microblogmemes@lemmy.world

64 comments fedilink hide all child comments

top 50 comments

sorted by: hot top controversial new old

[–] sharkfucker420@lemmy.ml 120 points 1 month ago (1 children)

Hell yeah, that rules

[–] Klear@quokk.au 44 points 1 month ago

Cyberpunk as fuck.

[–] undefined@lemmy.hogru.ch 109 points 1 month ago* (last edited 1 month ago) (2 children)

Here’s the article and here’s the software.

[–] A_norny_mousse@feddit.org 25 points 1 month ago

You can take a look at what this looks like, here. (Note: VERY slow page loads!)

[–] FishFace@piefed.social 10 points 1 month ago (1 children)

Nowhere in the article or start of the readme did I find how this works. How does it differentiate between a human visitor and a scraper?

[–] StarvingMartist@sh.itjust.works 6 points 1 month ago

Probably invisible links A human would stop clicking too once they see garbage

[–] Demdaru@lemmy.world 89 points 1 month ago (2 children)

I am so confused by the low link lol.

"AI haters build tarpits to trap and trick AI (!)" - Ohmy god poor AI :<
"...that ignore robots.txt!" - ...oh, so illegal AI...?
"Attackers explain-" - YEAH! THE EVIL AGRESSIVE
"how anti-spamdefense became an AI weapon" - ...folk trying to defend from spam...?

FFS they try to paint people protecting themselves as evil but are keeping facts too much and it becomes an absolute confusing mess xD

[–] skulblaka@sh.itjust.works 35 points 1 month ago (4 children)

It's not really that confusing.

The software equivalent of armed masked men are illegally breaking in to your personal property, stealing everything that isn't nailed down and ripping all the nails out of everything that is, and then leaving with it in order to reuse it for personal profit. It is, in all ways, similar to a home invasion. These invaders are then telling you that you're a bad person because you don't want them invading your property and stealing all your shit.

Its highly illegal, everyone involved with it knows for a fact that it's highly illegal, so they best they can do is try and spin propaganda around it because nobody has the balls to try and arrest Sam Altman, et al about it.

If you pick the lock on my front door and enter my home without permission I am going to put a 12 gauge slug through your solar plexus. If I could do the same to an AI crawler I would.

[–] Pieisawesome@lemmy.dbzer0.com 18 points 1 month ago (3 children)

This is a terrible analogy.

First off, robots.txt has no force of law. It’s just a curtesy. You are free to ignore it (except where prohibited by EULA or contract).

Secondly, this is more similar to a supermarket hanging a sign that you can only access 3 of their 11 aisles.

What this is doing is if you try to access the 7 aisles they requested you not to use, you have to solve a math problem or two.

Ai scrapers are obnoxious loud drunk people who take way more than their fair share.

If you truly have something private (like your house) you should not expose it publically on the internet.

[–] skulblaka@sh.itjust.works 21 points 1 month ago (1 children)

Well, let's turn this situation around then and see how it changes.

I hammer Meta's backend services with 6.8m requests per second, ignoring all posted guidelines, absorbing all the data I can get my hands on from them and feeding it to my machine which is busy trying to build BaseFook based on Meta's data that I've harvested from them.

Criminal DDOS? What's that?

Unauthorized access to backend systems? Nah, we'll be fine, that's definitely legal.

....

It is currently true that robots.txt doesn't have legal teeth and relies on voluntary compliance, but there have been court cases involving it in the past, and in my opinion they should have resulted in an established legal precedent. Check these out (courtesy of Wikipedia:)

The robots.txt played a role in the 1999 legal case of eBay v. Bidder's Edge,[12] where eBay attempted to block a bot that did not comply with robots.txt, and in May 2000 a court ordered the company operating the bot to stop crawling eBay's servers using any automatic means, by legal injunction on the basis of trespassing.[13][14][12] Bidder's Edge appealed the ruling, but agreed in March 2001 to drop the appeal, pay an undisclosed amount to eBay, and stop accessing eBay's auction information.[15][16]

In 2007 Healthcare Advocates v. Harding, a company was sued for accessing protected web pages archived via The Wayback Machine, despite robots.txt rules denying those pages from the archive. A Pennsylvania court ruled "in this situation, the robots.txt file qualifies as a technological measure" under the DMCA. Due to a malfunction at Internet Archive, Harding could temporarly access these pages from the archive and thus the court found "the Harding firm did not circumvent the protective measure".[17][18][19]

In 2013 Associated Press v. Meltwater U.S. Holdings, Inc. the Associated Press sued Meltwater for copyright infringement and misappropriation over copying of AP news items. Meltwater claimed that they did not require a license and that it was fair use, because the content was freely available and not protected by robots.txt. The court decided in March 2013 that "Meltwater’s copying is not protected by the fair use doctrine", mentioning among several factors that "failure […] to employ the robots.txt protocol did not give Meltwater […] license to copy and publish AP content".[20]

[–] SkyezOpen@lemmy.world 13 points 1 month ago

The critical difference that determines whether or not it's illegal is how many lawyers the site owner has.

[–] bless@lemmy.ml 5 points 1 month ago

Beware of dog signs also have no enforcement of law.

But apart from that, if your crawler ends up stuck in an endless loop, that's poor coding on your part. Human beings won't browse a static website endlessly, neither should a crawler

load more comments (1 replies)

[–] MonkderVierte@lemmy.zip 6 points 1 month ago* (last edited 1 month ago)

More like clogging the entry to your exhibition for making copies of your licensed produce, no?

[–] Amberskin@europe.pub 5 points 4 weeks ago

There is a way to stop the IA Crawlers, but it involves using dynamite and a hard risk of landing in prison.

load more comments (1 replies)

[–] gressen@lemmy.zip 55 points 1 month ago (1 children)

Here is a demo for anyone interested. It's deliberately slow to load.

https://zadzmo.org/nepenthes-demo/

[–] SlurpingPus@lemmy.world 9 points 1 month ago* (last edited 1 month ago) (1 children)

It's deliberately slow to load

That kinda defeats the goal of feeding AI as much garbage as possible. They will just fetch a page from a different site in that time, instead of spending cycles on this page. It's not like the crawler works strictly serially.

[–] gressen@lemmy.zip 38 points 1 month ago* (last edited 1 month ago) (1 children)

The idea is to protect own server from unnecessary loads. You're welcome to provide a faster AI tar pit, just mind that ultimately this is a waste of resources.

[–] SlurpingPus@lemmy.world 8 points 1 month ago* (last edited 1 month ago)

I'm guessing that Markov chains are pretty efficient computationally compared to AI training. Don't have a site currently, but I'd love to see a bot rip through hundreds of pages a minute.

[–] pigup@lemmy.world 40 points 1 month ago (1 children)

Please, someone make us super easy to implement version of this.

[–] mushroommunk@lemmy.today 43 points 1 month ago

It exists: https://zadzmo.org/code/nepenthes/

[–] Formfiller@lemmy.world 38 points 1 month ago (1 children)

How can we help

[–] Saprophyte@lemmy.world 16 points 1 month ago

Kyle Hill has an amazing video on how...

https://youtu.be/vC2mlCtuJiU

[–] _thebrain_@sh.itjust.works 37 points 1 month ago (2 children)

I wonder how effective they are. When I first heard about ssh targets (like endlessh) I thought it was an awesome idea. But as I started to look at some analyzed logged data it turns out they are either slightly effective to not at all effective. If simple logic can be written so a dumb ssh bot programed to find vulnerable ssh servers can easily avoid a tar pit, I would think it is pretty trivial for an AI crawler to do the same thing. I am interested to see some analyzed data on something like this after several months on the open internet.

[–] tempest@lemmy.ca 20 points 1 month ago

The reality is that depending on the crawling architecture someone is watching.

As aggressive as the LLM crawlers are there still have limits so a competently written one will have a budget for each host/site as well as a heuristic for the quality of results. It may dig for a bit and periodically return but if you're site is not one that is known to generate high quality data it may only get crawled when there isn't something better in the queue.

[–] Saprophyte@lemmy.world 7 points 1 month ago* (last edited 1 month ago)

Super effective, I tried both of these on a couple of domains I have and the amount of hits they get vs how long crawlers stay in them is insane. I use the AI robot.txt file and if they ignore it will spend hours scraping randomized nonsense text from unlimited internal links. I'm sure large legit ai companies have protection, but I get a lot of traffic from Africa and Asia in particular. Not sure if it's the source or a VPN, but I just look at the geos and tend not to dig deep.

[–] GeneralEmergency@lemmy.world 35 points 1 month ago (5 children)

Büt whāt æbœùt typīñg lîke thìß?

[–] whelk@retrolemmy.com 29 points 1 month ago (1 children)

You didn't use the thorn!

load more comments (1 replies)

[–] PearOfJudes@lemmy.ml 21 points 1 month ago

I read this like static over a radio

[–] boonhet@sopuli.xyz 11 points 4 weeks ago (2 children)

Appreciate you using the ß correctly instead of using it as a replacement for "B"

load more comments (2 replies)

[–] VoterFrog@lemmy.world 4 points 4 weeks ago (1 children)

Doesn't work either

The text you provided translates to:
"But what about typing like this?". This style of writing involves replacing standard Latin letters with similar-looking characters from other alphabets or adding diacritical marks (accents, tildes, umlauts) available in the Unicode standard.

load more comments (1 replies)

[–] shalafi@lemmy.world 27 points 1 month ago (2 children)

Seems like these traps would be trivially easy to defeat. I should get off my ass and run one, see how it goes.

[–] Krudler@lemmy.world 7 points 1 month ago (1 children)

Agree. This is another revenge fantasy from people that think the idea is great, without understanding that the implementation part is where it's gonna break down.

[–] VoterFrog@lemmy.world 6 points 4 weeks ago (1 children)

Yeah, much like the thorn, LLMs are more than capable of recognizing when they're being fed Markov gibberish. Try it yourself. I asked one to summarize a bunch of keyboard auto complete junk.

The provided text appears to be incoherent, resembling a string of predictive text auto-complete suggestions or a corrupted speech-to-text transcription. Because it lacks a logical grammatical structure or a clear narrative, it cannot be summarized in the traditional sense.

I've tried the same with posts with the thorn in it and it'll explain that the person writing the post is being cheeky - and still successfully summarizes the information. These aren't real techniques for LLM poisoning.

[–] trashgirlfriend@lemmy.world 15 points 4 weeks ago (1 children)

this is for poisoning the training data, not the input into the generative model

load more comments (1 replies)

[–] scrubbles@poptalk.scrubbles.tech 27 points 1 month ago (3 children)

Now, is this a docker image I can run?

[–] undefined@lemmy.hogru.ch 29 points 1 month ago

Yes

[–] zbyte64@awful.systems 10 points 1 month ago

https://zadzmo.org/downloads/nepenthes/docker

load more comments (1 replies)

[–] laranis@lemmy.zip 22 points 4 weeks ago (1 children)

Soon: US Republicans introduce law to prohibit the use of AI tar pits; cite copyright law and freedom of speech.

[–] bigfondue@lemmy.world 14 points 4 weeks ago (1 children)

They will cite the Magna Carta and Articles of Confederation.

load more comments (1 replies)

[–] FosterMolasses@leminal.space 20 points 1 month ago (2 children)

It's weird how this is written to make them sound more like animals or insects than a computer algorithm... "thrash around" lol

[–] Lauchmelder@feddit.org 21 points 1 month ago

I understood it as "thrashing" in a computer science sense. When RAM fills up and the kernel has to swap aggressively, that's called thrashing

[–] wolframhydroxide@sh.itjust.works 4 points 4 weeks ago

It's because the tool is named Nepenthes, after the pitcher plants, into which victims fall, cannot escape, and thrash around until they die and are digested.

[–] Microtonal_Banana@lemmy.zip 19 points 1 month ago

Rage Against the Machine.

[–] ShyFae@piefed.blahaj.zone 17 points 1 month ago

Finally! Some good fucking news!

[–] desmosthenes@lemmy.world 15 points 1 month ago

not a bad idea, might have to get on this bandwagon

[–] Track_Shovel@slrpnk.net 13 points 1 month ago (1 children)

@Five@slrpnk.net @ProdigalFrog@slrpnk.net @poVoq@slrpnk.net

[–] poVoq@slrpnk.net 9 points 1 month ago

I started experimenting with it a while ago, but I am currenty busy with other things.

[–] lmmarsano@lemmynsfw.com 8 points 4 weeks ago

Post needs accessibility.

Previous posts on same topic

That's excluding all the posts like OP's lacking searchable text due to inaccessibility.

[–] supersquirrel@sopuli.xyz 6 points 4 weeks ago

Hell yes! This is exactly what I have wanted to happen.

[–] sexy_peach@feddit.org 5 points 1 month ago

That's so cool

load more comments