this post was submitted on 25 Feb 2026

161 points (96.5% liked)



Large-scale online deanonymization with LLMs (arxiv.org)

submitted 3 weeks ago by Beep@lemmus.org to c/technology@lemmy.world

54 comments

PDF.

We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline that uses LLMs to: (1) extract identity-relevant features, (2) search for candidate matches via semantic embeddings, and (3) reason over top candidates to verify matches and reduce false positives. Compared to prior deanonymization work (e.g., on the Netflix prize) that required structured data or manual feature engineering, our approach works directly on raw user content across arbitrary platforms. We construct three datasets with known ground-truth data to evaluate our attacks. The first links Hacker News to LinkedIn profiles, using cross-platform references that appear in the profiles. Our second dataset matches users across Reddit movie discussion communities; and the third splits a single user's Reddit history in time to create two pseudonymous profiles to be matched. In each setting, LLM-based methods substantially outperform classical baselines, achieving up to 68% recall at 90% precision compared to near 0% for the best non-LLM method. Our results show that the practical obscurity protecting pseudonymous users online no longer holds and that threat models for online privacy need to be reconsidered.
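Step (2) of the pipeline described above, searching for candidate matches via semantic embeddings, is essentially a nearest-neighbour lookup. The sketch below is only an illustration of that step, under stated assumptions: each profile is assumed to have already been turned into an embedding vector by some model, the toy vectors are made up, and the names `cosine` and `top_k_candidates` are hypothetical, not taken from the paper.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_candidates(
    query: list[float], database: dict[str, list[float]], k: int = 3
) -> list[tuple[str, float]]:
    """Hypothetical helper: score every profile in `database` against `query`
    and return the k most similar (name, score) pairs, best first."""
    scored = [(name, cosine(query, vec)) for name, vec in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (assumption: a real system would use a text-embedding model).
profiles_b = {
    "user_x": [0.9, 0.1, 0.0],
    "user_y": [0.1, 0.9, 0.2],
    "user_z": [0.7, 0.3, 0.1],
}
query_a = [1.0, 0.0, 0.0]  # embedding of one pseudonymous profile from database A

# Ranks user_x first, then user_z.
print(top_k_candidates(query_a, profiles_b, k=2))
```

In the abstract's description, the shortlisted candidates would then go to step (3), where an LLM reasons over them to verify matches and cut false positives; that step is not sketched here.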

LessWrong;
Hacker News.


[–] doug@lemmy.today 77 points 3 weeks ago* (last edited 3 weeks ago) (8 children)

I think it was a Reddit scraper years ago that taught me that I should probably lie more often on the internet about my work, friends, family details, etc.

Just like, little lies that don’t really matter in the comment, but would misdirect an AI or investigator into things that aren’t true.

It’s just so much woooooork to think about this shit. And to come up with different screen names everywhere? And to like, sub to a city I don’t live in and comment there about shit I know nothing about? Exhausting.

Thankfully my brothers and three uncles are here to support me. And my alligator.

[–] frongt@lemmy.zip 24 points 3 weeks ago (1 children)

Aha! By posting this comment, I know you don't have an alligator!

[–] P1nkman@lemmy.world 12 points 3 weeks ago (2 children)

But I do! I know they're illegal in Denmark, but they seem to love the snow!

[–] DrunkenPirate@feddit.org 5 points 3 weeks ago

That’s funny, I do as well. Unfortunately, I flushed my alligator down the toilet into the harbor where I live. Now I've bought a green parrot. My three sisters love it.

[–] FoxyFerengi@startrek.website 1 points 3 weeks ago

I've heard they can at least survive a fall onto snow lol

[–] deacon@lemmy.world 12 points 3 weeks ago (1 children)

I call it salting and I do it religiously.

Or do I?

[–] Jakeroxs@sh.itjust.works 2 points 3 weeks ago (1 children)

Haha perfect username too

[–] deacon@lemmy.world 1 points 3 weeks ago

Ah my namesake and fellow gandy dancer.

[–] surewhynotlem@lemmy.world 7 points 3 weeks ago (2 children)

The trick is to pick someone else's identity and use that. I'm Dale from Ohio.

[–] MrQuallzin@lemmy.world 2 points 3 weeks ago

Mom said it's my turn to be Dale!

[–] papertowels@mander.xyz 0 points 3 weeks ago

Rusty Shackleford, checking in

[–] Insekticus@aussie.zone 5 points 3 weeks ago (1 children)

Yeah, exactly. Like, if you're 25, say you're 27. Then in another post, 24. You're still around that age, but the exact age is muddied.

You can also use Americanized spelling in some sentences, or, if you're American, use British English and become Unamericanised. Say you're a half-Brit, half-American dual citizen even though you're from South Africa or something.

[–] MountingSuspicion@reddthat.com 2 points 3 weeks ago

I feel like that may be worse. Kind of like how, if you use certain security measures while browsing the web, it's almost easier to fingerprint you. It'll still get a good idea of your age, and that'll be enough; sticking to a specific lie is better. Just always be 3 years older, with one additional sibling or a sibling of the opposite sex. If the sex of your sibling is relevant, just describe them as a close family friend or close cousin in that instance. I can't say for sure, but if I had to guess, a static lie is maybe more obfuscation than a variable one. Though even posting in this thread is bad opsec.

[–] SuspciousCarrot78@lemmy.world 4 points 3 weeks ago

Oh, you mean Gustav, Bernhardt, Daffid and Chompy? How are things in Ulaanbaatar anyway?

(you're welcome)

[–] Anarki_@lemmy.blahaj.zone 3 points 3 weeks ago

Oh hey, my dearest friend. Say, did you end up moving to Perth, or was that just a thought out loud? Well, if you're ever in the area, let me know and we can meet up at that restaurant we enjoyed so much!

xoxo

[–] stickly@lemmy.world 3 points 3 weeks ago

The solution is simple: just launder each comment through an LLM to fudge the style and details a bit.

[–] couldhavebeenyou@lemmy.zip 1 points 3 weeks ago

Maybe get an AI agent to post misdirections