this post was submitted on 12 May 2025
634 points (98.8% liked)

Just Post

1109 readers

Just post something πŸ’›

founded 2 years ago
[–] SpaceNoodle@lemmy.world 104 points 4 months ago (3 children)

80% is generous. Half of that is the user simply not realizing that the information is wrong.

[–] grrgyle@slrpnk.net 50 points 4 months ago (2 children)

This becomes very obvious if you see anything generated for a field you know intimately.

[–] toy_boat_toy_boat@lemmy.world 41 points 4 months ago (2 children)

i think this is why i've never really had a good experience with an LLM - i'm always asking it for more detail about stuff i already know.

it's like chatgpt is pinocchio and users are just sitting on his face screaming "lie to me! lie to me!"

[–] SuperNovaStar@lemmy.blahaj.zone 11 points 4 months ago

See now that sounds fun

[–] reptar@lemmy.world 4 points 4 months ago
[–] Couldbealeotard@lemmy.world 2 points 4 months ago

Oof. I tried to tell a manager why a certain technical thing wouldn't work, and he pulled out his phone and started reading the Google AI summary: "no, look, you just need to check the network driver and restart the router". It was two devices that were electrically incompatible, and there was no IP infrastructure involved.

[–] burgersc12@mander.xyz 10 points 4 months ago* (last edited 4 months ago)

Yeah, the research says it's correct closer to 50-60% of the time

[–] Grimtuck@lemmy.world 55 points 4 months ago (1 children)

LLMs are the most well-read morons on the planet.

[–] merc@sh.itjust.works 8 points 4 months ago (1 children)

They're not even "stupid" though. It's more like if you somehow trained a parrot with every book ever written and every web page ever created and then had it riff on things.

But, even then, a parrot is a thinking being. It may not understand the words it's using, but it understands emotion to some extent, it understands "conversation" to a certain extent -- taking turns talking, etc. An LLM just predicts the word that should appear next statistically.

An LLM is nothing more than an incredibly sophisticated computer model designed to generate words in a way that fools humans into thinking those words have meaning. It's almost more like a lantern fish than a parrot.
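
That "predicts the word that should appear next" bit isn't a figure of speech. Stripped of all the engineering, the generation loop is roughly this (a minimal sketch using Hugging Face's transformers with plain greedy decoding; "gpt2" is just an illustrative small checkpoint, and real chatbots add sampling, chat templates, and fine-tuned weights on top):

```python
# Minimal sketch of greedy next-token generation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("The parrot looked at the book and said", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):                      # add 20 tokens, one at a time
        logits = model(input_ids).logits     # a score for every token in the vocabulary
        next_id = logits[0, -1].argmax()     # take the statistically most likely next token
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=-1)

print(tokenizer.decode(input_ids[0]))        # the "riff": no goals, just likely continuations
```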

[–] morrowind@lemmy.ml 3 points 4 months ago (1 children)

And how do you think it predicts that? All that complex math can be clustered into higher level structures. One could almost call it.. thinking.

Besides we have reasoning models now, so they can emulate thinking if nothing else

[–] merc@sh.itjust.works 1 points 4 months ago (1 children)

One could almost call it.. thinking

No, one couldn't, unless one was trying to sell snake oil.

so they can emulate thinking

No, they can emulate generating text that looks like text typed up by someone who was thinking.

[–] morrowind@lemmy.ml 1 points 4 months ago (1 children)

What do you define as thinking if not a bunch of signals firing in your brain?

[–] merc@sh.itjust.works 1 points 4 months ago

Yes, thinking involves signals firing in your brain. But, not just any signals. Fire the wrong signals and someone's having a seizure not thinking.

Just because LLMs generate words doesn't mean they're thinking. Thinking involves reasoning and considering something. It involves processing information, storing memories, then bringing them up later as appropriate. We know LLMs aren't doing that because we know what they are doing, and what they're doing is simply generating the next word based on previous words.

[–] 4am@lemm.ee 52 points 4 months ago (5 children)

AIs do not hallucinate. They do not think or feel or experience. They are math.

Your brain is a similar model, exponentially larger, that is under constant training from the moment you exist.

Neural-net AIs are not going to meet their hype. Tech bros have not cracked consciousness.

Sucks to see what could be such a useful tool get misappropriated by the hype machine for like cheating on college papers and replacing workers and deepfaking porn of people who aren’t willing subjects because it’s being billed as the ultimate, do-anything software.

[–] turtlesareneat@discuss.online 16 points 4 months ago (3 children)

You don't need it to be conscious to replace people's jobs, however poorly, tho. The hype of disruption and unemployment may yet come to pass; if the electric bills are ultimately cheaper than the employees, capitalism will do its thing.

[–] FlashMobOfOne@lemmy.world 7 points 4 months ago

Fun fact, though.

Some businesses that use AI for their customer service chatbots have shitty ones that will give you discounts if you ask. I bought a new mattress a year ago and asked the chatbot if they had any discounts on x model and if they'd include free delivery, and it worked.

[–] WhatsTheHoldup@lemmy.ml 5 points 4 months ago

AIs do not hallucinate.

Yes they do.

They do not think or feel or experience. They are math.

Oh, I think you misunderstand what hallucinations mean in this context.

AIs (LLMs) train on a very very large dataset. That's what LLM stands for, Large Language Model.

Despite how large this training data is, you can ask it things outside the training set and it will answer as confidently as it does for things inside its dataset.

Since these answers didn't come from anywhere in training, it's considered to be a hallucination.
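
Part of why it "answers as confidently" either way: at the decoding step there's no built-in "I don't know" path. The scores always get squashed into a probability distribution that sums to 1, and something gets picked from it regardless (a toy sketch with made-up numbers, not any real model's internals):

```python
# Toy illustration: softmax always yields a valid probability distribution,
# and decoding always picks *something*, whether or not the underlying
# scores reflect anything the model actually "knows".
import torch

answers = ["Paris", "Lyon", "Marseille"]

well_grounded = torch.tensor([9.0, 2.0, 1.0])        # pretend the training data covered this
out_of_distribution = torch.tensor([1.2, 1.1, 1.0])  # pretend it never saw anything relevant

for label, scores in [("inside training data", well_grounded),
                      ("outside training data", out_of_distribution)]:
    probs = torch.softmax(scores, dim=0)
    pick = probs.argmax().item()
    print(f"{label}: answers '{answers[pick]}' (p={probs[pick].item():.2f}, sums to {probs.sum().item():.2f})")
```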

[–] RisingSwell@lemmy.dbzer0.com 23 points 4 months ago

If LLMs were 80% accurate I might use them more.

[–] FundMECFSResearch@lemmy.blahaj.zone 14 points 4 months ago (1 children)

To be fair, as a human, I don’t feel any different.

[–] morrowind@lemmy.ml 1 points 4 months ago* (last edited 4 months ago) (1 children)

The key difference is humans are aware of what they know and don't know and when they're unsure of an answer. We haven't cracked that for AIs yet.

When AIs do say they're unsure, that's their understanding of the problem, not an awareness of their own knowledge

[–] FundMECFSResearch@lemmy.blahaj.zone 1 points 4 months ago (1 children)

The key difference is humans are aware of what they know and don't know

If this were true, the world would be a far far far better place.

Humans gobble up all sorts of nonsense because they β€œlearnt” it. Same for LLMs.

[–] morrowind@lemmy.ml 1 points 4 months ago

I'm not saying humans are always aware of when they're correct, merely how confident they are. You can still be confidently wrong and know all sorts of incorrect info.

LLMs aren't aware of anything like self confidence

[–] Diplomjodler3@lemmy.world 14 points 4 months ago (3 children)

You could argue that people aren't much different.

[–] Keilik@lemmy.world 21 points 4 months ago (1 children)

Turns out there’s no such thing as correct and incorrect, just peer reviewed β€œthis has the least wrong vibe”

Always has been

[–] OfCourseNot@fedia.io 14 points 4 months ago (4 children)

'AI isn't reliable, has a ton of bias, tells many lies confidently, can't count or do basic math, just parrots whatever is fed to them from the internet, wastes a lot of energy and resources and is fucking up the planet...'. When I see these critics about ai I wonder if it's their first day on the planet and they haven't met humans yet.

[–] RushLana@lemmy.blahaj.zone 10 points 4 months ago (7 children)

... You are deliberately missing the point.

When I'm asking a question I don't want to hear what most people think, but what people who are knowledgeable about the subject think, and LLMs will fail at that by design.

LLMs don't just waste a lot, they waste at a ridiculous scale. According to Statista, training GPT-3 was responsible for 500 tCO2 as of 2024. All for what? Having an automatic plagiarism-and-bias machine? And before the litany of "it's just the training cost, after that it's ecologically cheap": tell me how your LLM will remain relevant if it's not constantly retrained with new data?

LLMs don't bring any value: if I want information I already have a search engine (even if LLMs have degraded the quality of the results), if I want art I can pay someone to draw it, etc...

[–] yetAnotherUser@discuss.tchncs.de 2 points 4 months ago (2 children)

500 tons of CO2 is... surprisingly little? Like, rounding error little.

I mean, one human exhales ~400 kg of CO2 per year (according to this). Training GPT-3 produced as much CO2 as 1250 people breathing for a year.
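
The back-of-the-envelope math behind that comparison, taking both figures at face value:

$$\frac{500\,000\ \text{kg CO}_2}{400\ \text{kg CO}_2\ \text{per person per year}} = 1250\ \text{person-years}$$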

[–] OfCourseNot@fedia.io 2 points 4 months ago

I don't know why people downvoted you. It is surprisingly little! I checked the 500 tons number thinking it could be a typo or a mistake but I found the same.

[–] RushLana@lemmy.blahaj.zone 2 points 4 months ago

That seems so little because it doesn't account for data-center construction costs, hardware production costs, etc... One model costing as much as 1250 people breathing for a year is enormous to me.

[–] Jesus_666@lemmy.world 7 points 4 months ago (2 children)

LLMs use even more resources to be even more wrong even faster. That's the difference.

[–] HalfSalesman@lemm.ee 4 points 4 months ago

IDK, I'm pretty sure it'd use more resources to have someone just follow you around answering your questions to the best of their ability compared to using some electricity.

[–] morrowind@lemmy.ml 1 points 4 months ago

AIs use a lot less resources rn, but humans are also constantly doing a hundred other things beyond answering questions

[–] The_Decryptor@aussie.zone 1 points 4 months ago (1 children)

Why is that desirable though?

We already had calculators, why do we need a machine that can't do math? Why do we need a machine that produces incorrect information?

[–] Aux@feddit.uk 1 points 4 months ago

The people are much worse.

[–] FlashMobOfOne@lemmy.world 4 points 4 months ago

They call them 'hallucinations' because it sounds better than 'bugs'.

Not unlike how we call torture 'enhanced interrogation' or kidnapping 'extraordinary rendition' or sub out 'faith' for 'stupid and gullible'.

[–] kromem@lemmy.world 2 points 4 months ago (1 children)

So really cool β€” the newest OpenAI models seem to be strategically employing hallucination/confabulations.

It's still an issue, but there's a subset of dependent confabulations where it's being used by the model to essentially trick itself into going where it needs to.

A friend did logit analysis on o3 responses, comparing when it said "I checked the docs" vs when it didn't (it didn't have access to any docs in either case), and the version 'hallucinating' was more accurate in its final answer than the 'correct' one.

What's wild is that like a month ago 4o straight up brought up to me that I shouldn't always correct or call out its confabulations as they were using them to springboard towards a destination in the chat. I'd not really thought about that, and it was absolutely nuts that the model was self-aware of employing this technique that was then confirmed as successful weeks later.

It's crazy how quickly things are changing in this field, and by the time people learn 'wisdom' in things like "models can't introspect about operations" those have become partially obsolete.

Even things like "they just predict the next token" have now been falsified, even though I feel like I see that one more and more these days.

[–] NikkiDimes@lemmy.world 4 points 4 months ago (1 children)

They do just predict the next token, though, lol. That simplifies a significant amount, but fundamentally, that's how they work, and I'm not sure how you can say that's been falsified.

[–] kromem@lemmy.world 1 points 4 months ago* (last edited 4 months ago) (1 children)

So I'm guessing you haven't seen Anthropic's newest interpretability research, where they went in assuming that was how it worked.

But it turned out that they can actually plan beyond the immediate next token in things like rhyming verse where the network has already selected the final word of the following line and the intermediate tokens are generated with that planned target in mind.

So no, they predict beyond the next token, and we only just developed measurements sensitive enough to detect that happening an order of magnitude of tokens beyond just 'next'. We'll see if further research in that direction picks up planning beyond that even.

https://transformer-circuits.pub/2025/attribution-graphs/biology.html

[–] NikkiDimes@lemmy.world 1 points 4 months ago* (last edited 4 months ago) (1 children)

Right, other words see higher attention as it builds a sentence, leading it towards where it "wants" to go, but LLMs literally take a series of words, then spit out the next one. There's a lot more going on under the hood as you said, but fundamentally that is the algorithm. Repeat that over and over, and you get a sentence.

If it's writing a poem about flowers and ends the first part on "As the wind blows," sure as shit "rose" is going to have significant attention within the model, even if that isn't the immediate next word, as well as words that are strongly associated with it to build the bridge.

[–] kromem@lemmy.world 1 points 4 months ago* (last edited 4 months ago)

The attention mechanism working this way was at odds with the common wisdom across all frontier researchers.

Yes, the final step of the network is producing the next token.

But the fact that intermediate steps have now been shown to be planning and targeting specific future results is a much bigger deal than you seem to be appreciating.

If I ask you to play chess and you play only one move ahead vs planning n moves ahead, you are going to be playing very different games. Even if in both cases you are only making one immediate next move at a time.

[–] acockworkorange@mander.xyz 2 points 4 months ago

Data. AI. Business. Strategy.

Right.
