this post was submitted on 11 Jul 2025

104 points (98.1% liked)

Fuck AI

4823 readers

2107 users here now

"We did it, Patrick! We made a technological breakthrough!"

A place for all those who loathe AI to discuss things, post articles, and ridicule the AI hype. Proud supporter of working people. And proud booer of SXSW 2024.

AI, in this case, refers to LLMs, GPT technology, and anything listed as "AI" meant to increase market valuations.

founded 2 years ago

MODERATORS

VerbFlow@lemmy.world

MrMcGasion@lemmy.world

TootSweet@lemmy.world

BigMikeInAustin@lemmy.world

cynar@lemmy.world

drmeanfeel@lemmy.world

pavnilschanda@lemmy.world

CriticalMedicine@lemmy.world

WonderfulWanderer@lemmy.world

Communist@lemmy.ml

eatCasserole@lemmy.world

SpaceNoodle@lemmy.world

NutWrench@lemmy.world

Soup@lemmy.cafe

iAvicenna@lemmy.world

Tinks@lemmy.world

wizblizz@lemmy.world

corus_kt@lemmy.world

Prandom_returns@lemm.ee

JimSamtanko@lemm.ee

TrickDacy@lemmy.world

TheFriar@lemm.ee

ArmokGoB@lemmy.dbzer0.com

HawlSera@lemm.ee

andrew_bidlaw@sh.itjust.works

MeDuViNoX@sh.itjust.works

33550336@lemmy.world

Nougat@fedia.io

Lost_My_Mind@lemmy.world

Sterile_Technique@lemmy.world

Quill7513@slrpnk.net

glowing_hans@sopuli.xyz

e8d79@discuss.tchncs.de

ThefuzzyFurryComrade@pawb.social

104

Why is AI so wrong all the time??? (lemmy.today)

submitted 5 months ago by Outwit1294@lemmy.today to c/fuck_ai@lemmy.world

81 comments fedilink hide all child comments

There have been multiple things which have gone wrong with AI for me but these two pushed me over the brink. This is mainly about LLMs but other AI has also not been particularly helpful for me.

Case 1

I was trying to find the music video from where a screenshot was taken.

I provided o4 mini the image and asked it where it is from. It rejected it saying that it does not discuss private details. Fair enough. I told it that it is xyz artist. It then listed three of their popular music videos, neither of which was the correct answer to my question.

Then I started a new chat and described in detail what the screenshot was. It once again regurgitated similar things.

I gave up. I did a simple reverse image search and found the answer in 30 seconds.

Case 2

I wanted a way to create a spreadsheet for tracking investments which had xyz columns.

It did give me the correct columns and rows but the formulae for calculations were off. They were almost correct most of the time but almost correct is useless when working with money.

I gave up. I manually made the spreadsheet with all the required details.

Why are LLMs so wrong most of the time? Aren’t they processing high quality data from multiple sources? I just don’t understand the point of even making these softwares if all they can do is sound smart while being wrong.

you are viewing a single comment's thread
view the rest of the comments

[–] hedgehog@ttrpg.network 6 points 5 months ago (1 children)

LLM image processing doesn’t work the same way reverse image lookup does.

Tldr explanation: Multimodal LLMs turn pictures into a ~~thousand~~ 200-500 or so ~~words~~ tokens, but reverse image lookups create perceptual hashes of images and look the hash of your uploaded image up in a database.

Much longer explanation:

Multimodal LLMs (technically, LMMs - large multimodal models) use vision transformers to turn images into tokens. They use tokens for words, too, but these tokens don’t also correspond to words. There are multiple ways this could be implemented, but a common approach is to break the image down into a grid, then transform each “patch” of a specific size, e.g., 16x16, into a single token. The patches aren’t transformed individually - the whole image is processed together, in context - but it still comes out of it with basically 200 or so tokens that allow it to respond to the image, the same way it would respond to text.

Current vision transformers also struggle with spatial awareness. They embed basic positional data into the tokens but it’s fragile and unsophisticated when it comes to spatial awareness. Fortunately there’s a lot to explore in that area so I’m sure there will continue to be improvements.

One example improvement, beyond improved spatial embeddings, would be to use a dynamic vision transformers that’s dependent on the context, or that can re-evaluate an image based off new information. Outside the use of vision transformers, simply training LMMs to use other tools on images when appropriate can potentially help with many of LMM image processing’s current shortcomings.

Given all that, asking an LLM to find the album for you is like - assuming you’ve given it the ability and permission to search the web - like showing the image to someone with no context, then them to help you find what music video - that they’ve never seen, by an artist whose appearance they describe with 10-20 generic words, none of which are their name - it’s in, and to hope there were, and that they remembered, the specific details that would make it would come up in the top ten results if searched for on Google. That’s a convoluted way to say that it’s a hard task.

By contrast, reverse image lookup basically uses a perceptual hash generated for each image. It’s the tool that should be used for your particular problem, because it’s well suited for it. LLMs were the hammer and this problem was a torx screw.

Suggesting you use - or better, using a reverse image lookup tool itself - is what the LLM should do in this instance. But it would need to have been trained to think to suggest this, capable of using a tool that could do the lookup, and have both access and permission to do the lookup.

Here’s a paper that might help understand the gaps between LMMs and tasks built for that specific purpose: https://arxiv.org/html/2305.07895v7

[–] Outwit1294@lemmy.today 1 points 5 months ago (1 children)

So if I am understanding it, LLMs is not using the easier option of reverse image search because it is not aware of them?

[–] hedgehog@ttrpg.network 3 points 5 months ago (1 children)

It may be aware of them, but not in that context. If you asked it how to solve the problem rather than to solve the problem for you, there’s a chance it would suggest you use a reverse image search.

[–] Outwit1294@lemmy.today 1 points 5 months ago

But at that point, it is useful only for novice users of the internet who don’t know how to search for things. I am pretty sure a 30 second search engine search would yield the same result.