this post was submitted on 26 Apr 2026
199 points (97.2% liked)

Technology

[–] partofthevoice@lemmy.zip 4 points 1 week ago* (last edited 1 week ago)

They aren’t using LLMs to do the spying.

LLMs function because we now have a technology that can operate in a space of extremely high mathematical abstraction. Consider for a moment what you already know about LLMs: they’re trained on massive amounts of text, yet fundamentally they operate by predicting the next token (roughly, the next word) in a sequence.
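
To make “predicting the next token” concrete, here’s a toy sketch. This is not how real LLMs work internally (they use neural networks, not frequency tables), but the training objective is the same idea: learn from text which token tends to follow which, then emit the most plausible continuation.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """For each token, count which token follows it and how often."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for cur, nxt in zip(tokens, tokens[1:]):
            counts[cur][nxt] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if unseen."""
    followers = counts.get(token)
    return followers.most_common(1)[0][0] if followers else None

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the mouse",
]
model = train_bigram(corpus)
print(predict_next(model, "sat"))  # prints "on"
```

A real LLM replaces the lookup table with a network that generalizes to sequences it has never seen, but the output is still “the most plausible next token,” not “the truth.”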

An LLM is what you get when you use this method of information processing on natural language.

What if you instead train it to fingerprint user identities based on web behavior? In that case it doesn’t even output language; it becomes a different tool built on the same fundamental information-processing methodology.
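
As a miniature illustration of what behavioral fingerprinting could look like (a toy sketch with made-up event logs; a real system would use learned embeddings over far richer signals, not hand-counted pairs): represent each user as a frequency vector over consecutive browsing events, then match an anonymous session to the closest known profile.

```python
import math
from collections import Counter

def fingerprint(events):
    """Frequency vector over consecutive event pairs (bigrams)."""
    return Counter(zip(events, events[1:]))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical event logs: page categories visited, in order.
known_users = {
    "user_a": ["news", "forum", "news", "video", "forum", "news"],
    "user_b": ["shop", "shop", "video", "shop", "video", "shop"],
}
anonymous_session = ["news", "forum", "news", "forum", "news"]

profiles = {u: fingerprint(ev) for u, ev in known_users.items()}
target = fingerprint(anonymous_session)
best = max(profiles, key=lambda u: cosine(profiles[u], target))
print(best)  # prints "user_a"
```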

What if you train a system to automate semantic analysis, which is much simpler than an LLM? Give it categories like “leftist activist” and see what kind of lists it can compile after processing the likes, shares, replies, and views of every Reddit user that has ever existed. What if you then cross-associate users by writing style, so it can roughly match your old Reddit account with your new Lemmy one, or maybe even your really old Facebook with your old Reddit? What if they further augment that with ISP data that helps really drive these points home?
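
A minimal sketch of the list-building idea, with hypothetical categories and a crude keyword tally standing in for real semantic analysis (a production system would classify embeddings of whole posting histories, not count words):

```python
from collections import defaultdict

# Hypothetical category keywords; a real system would use a trained
# classifier, not a keyword list.
CATEGORIES = {
    "union_organizer": {"strike", "union", "picket"},
    "crypto_trader": {"bitcoin", "hodl", "altcoin"},
}

def build_lists(comments_by_user):
    """Tally category keyword hits per user; return membership lists."""
    lists = defaultdict(list)
    for user, comments in comments_by_user.items():
        words = " ".join(comments).lower().split()
        for category, keywords in CATEGORIES.items():
            hits = sum(1 for w in words if w in keywords)
            if hits >= 2:  # arbitrary threshold for this sketch
                lists[category].append(user)
    return dict(lists)

comments = {
    "alice": ["the strike starts monday", "join the union"],
    "bob": ["bitcoin is down", "still gonna hodl"],
    "carol": ["nice weather today"],
}
print(build_lists(comments))
# prints {'union_organizer': ['alice'], 'crypto_trader': ['bob']}
```

The unsettling part isn’t the sophistication of any one step; it’s that each step is cheap enough to run over everyone at once.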

What if they no longer need tens of thousands of analysts to do this kind of thing for every single American citizen? Something previously dismissed as intractable, not worth considering outside conspiracy circles, might now require only a large enough data center. Surely it doesn’t require a data center with a ballroom on top, but that’s more architectural than anything else.

Edit: let me be clearer about something. LLMs don’t predict the truth; LLMs predict the next token. That said, they do a really damn good job. Hallucinations are a problem of aligning that good job with our expectation of truth, which is a different issue. So when you consider the effectiveness of this “spying technology,” do so by comparing it to an LLM’s ability to “sound right,” not to “be right.”