science

27063 readers

332 users here now

A community to post scientific articles, news, and civil discussion.

dart board;; science bs

rule #1: be kind

lemmy.world rules

founded 2 years ago

MODERATORS

m3t00@lemmy.world

Joleee@lemmy.world

laverabe@lemmy.world

DeadPand@midwest.social

laverabe@lemmy.zip

Large-scale online deanonymization with LLMs (arxiv.org)

submitted 2 months ago* (last edited 2 months ago) by FactChecker@lemmy.world to c/science@lemmy.world

12 comments fedilink hide all child comments

We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline that uses LLMs to: (1) extract identity-relevant features, (2) search for candidate matches via semantic embeddings, and (3) reason over top candidates to verify matches and reduce false positives. Compared to classical deanonymization work (e.g., on the Netflix prize) that required structured data, our approach works directly on raw user content across arbitrary platforms. We construct three datasets with known ground-truth data to evaluate our attacks. The first links Hacker News to LinkedIn profiles, using cross-platform references that appear in the profiles. Our second dataset matches users across Reddit movie discussion communities; and the third splits a single user's Reddit history in time to create two pseudonymous profiles to be matched. In each setting, LLM-based methods substantially outperform classical baselines, achieving up to 68% recall at 90% precision compared to near 0% for the best non-LLM method. Our results show that the practical obscurity protecting pseudonymous users online no longer holds and that threat models for online privacy need to be reconsidered.

top 12 comments

sorted by: hot top controversial new old

[–] Wigners_friend@piefed.social 20 points 2 months ago* (last edited 2 months ago) (1 children)

So they found something LLMs can actually do, and predictably it's incredibly evil.

[–] 4am@lemmy.zip 9 points 2 months ago (1 children)

This is why they are building huge data centers for something nobody actually wants.

[–] snooggums@piefed.world 0 points 1 month ago (1 children)

The authoritarians want it.

[–] 4am@lemmy.zip 1 points 1 month ago

I wish we could consider them nobodies

[–] theunknownmuncher@lemmy.world 11 points 2 months ago (1 children)

That’s why I always run my text through an LLM for refinement before posting it. 😉

[–] WesternInfidels@feddit.online 7 points 2 months ago (1 children)

"ChatGPT, please rewrite this inflammatory post full of libelous accusations in the style of my worst enemy."

[–] snooggums@piefed.world 1 points 1 month ago

spits out the text with no changes

[–] eleijeep@piefed.social 10 points 2 months ago (1 children)

The false positives are going to be tremendously dangerous to the falsely accused, as there is no way to either prove or disprove the charge. It's just a case of "computer says you're 99% likely to be the same person."

[–] WesternInfidels@feddit.online 3 points 2 months ago (1 children)

In a sane world, the computer's calculation would be used to guide a real investigation, which would find real evidence. The statistics wouldn't be used as proof of anything.

[–] snooggums@piefed.world 1 points 1 month ago

In a sane world we would not be deanonymizing anyone without a court order.

[–] lexiw@lemmy.world 6 points 2 months ago

None of what this does is new, the issue is that’s being commodified: in a short amount of time these techniques will be available to non experts in the field.

[–] WesternInfidels@feddit.online 2 points 2 months ago

I disguise my internet writings by doing them left handed.