this post was submitted on 11 Mar 2026
Evaluating 35 open-weight models across three context lengths (32K, 128K, 200K), four temperatures, and three hardware platforms—consuming 172 billion tokens across more than 4,000 runs—we find that the answer is “substantially, and unavoidably.” Even under optimal conditions—best model, with the temperature chosen specifically to minimize fabrication—the floor is non-zero and rises steeply with context length. At 32K, the best model (GLM 4.5) fabricates 1.19% of answers, top-tier models fabricate 5–7%, and the median model fabricates roughly 25%.

[–] snooggums@piefed.world 6 points 11 hours ago (2 children)

Aka being wrong, but with a fancy name!

When Cletus is wrong because he mixed up a dog and a cat when describing their behavior, do we call it hallucinating? No.

[–] Scipitie@lemmy.dbzer0.com 11 points 11 hours ago (1 children)

Accepting concepts like "right" and "wrong" gives those tools way too much credit, and basically follows the AI narrative of the corporations behind them. Those words can apply to the output, but not to the tool itself.

To be precise:

LLMs can't be right or wrong, because the way they work has no link to any reality - it's stochastics, not evaluation. I also don't like the term hallucination for the same reason. It's simply a too-high temperature setting making the sampler jump to a nearby but unrelated vector set.
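
(To illustrate the temperature point: a minimal Python sketch of temperature-scaled softmax sampling, with made-up logits. Higher temperature flattens the distribution, so weaker, possibly unrelated tokens get a much bigger share of the probability mass.)

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature before the softmax.
    Low temperature sharpens the distribution around the top token;
    high temperature flattens it toward the weaker candidates."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# One strong candidate token and two weak ones (illustrative numbers).
logits = [5.0, 2.0, 0.5]

cold = softmax_with_temperature(logits, 0.5)  # top token dominates
hot = softmax_with_temperature(logits, 2.0)   # mass spreads to weak tokens

print(cold)
print(hot)
```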

Why this is an important distinction: arguing that an LLM is wrong is arguing on the ground of ChatGPT and the like. It becomes "oh, but we'll make them better!", and their marketing departments are overjoyed.

To take your calculator analogy: just as calculators have floating-point errors that are inherent to those tools, wrong outputs are a core part of LLMs.

We can minimize that, but then they automatically lose part of their function. This limitation is way stronger on LLMs than limiting a calculator to 16 digits after the decimal point, though...
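
(The floating-point analogy in Python: 0.1 and 0.2 have no exact binary representation, so their sum is not exactly 0.3. The error is inherent to the representation, not a bug in any one calculator.)

```python
import math

a = 0.1 + 0.2
print(a == 0.3)        # exact comparison fails
print(abs(a - 0.3))    # tiny but nonzero error

# The usual workaround is a tolerance comparison rather than
# "fixing" the representation itself.
print(math.isclose(a, 0.3))
```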

[–] CubitOom@infosec.pub 3 points 11 hours ago* (last edited 11 hours ago) (2 children)

What word would you propose to use instead?

Fabrication?

[–] leftzero@lemmy.dbzer0.com 1 points 6 hours ago

Scam. We're being sold an autocomplete tool as a search engine.

Or fraud, since some of the same companies destroyed the functionality of their search engines in order to make the autocomplete look better in comparison.

[–] Scipitie@lemmy.dbzer0.com 5 points 11 hours ago (1 children)

That's my problem: any single word humanizes the tool, in my opinion. Perhaps something like "stochastic debris" comes close, but there's no chance to counter the combined force of pop culture, corp speak, and humanity's talent for seeing humanoid behavior everywhere but in each other. :(

[–] Telorand@reddthat.com 2 points 10 hours ago

We do enjoy pareidolia, don't we?

[–] bad1080@piefed.social 2 points 11 hours ago

if you have a lobby you get special names - look at the pharma industry, which coined the term "discontinuation syndrome" for simple "withdrawal"