this post was submitted on 23 Feb 2026
577 points (97.5% liked)

Technology


A screenshot of this question was making the rounds last week, but this article covers testing against all the well-known models out there.

Also includes outtakes on the 'reasoning' models.

[–] rimu@piefed.social 134 points 2 days ago (7 children)

Very interesting that only 71% of humans got it right.

[–] SnotFlickerman@lemmy.blahaj.zone 127 points 2 days ago* (last edited 2 days ago) (2 children)

I mean, I've been saying this since LLMs were released.

We finally built a computer that is as unreliable and irrational as humans... which shouldn't be considered a good thing.

I'm under no illusion that LLMs are "thinking" in the same way that humans do, but god damn if they aren't almost exactly as erratic and irrational as the hairless apes whose thoughts they're trained on.

[–] Peekashoe@lemmy.wtf 28 points 2 days ago

Yeah, the article cites that as a control, but it's not at all surprising, since "humanity by survey consensus" is an accurate description of how LLM weighting trained on random human outputs works.

It's impressive up to a point, but you wouldn't exactly want your answers to complex math operations or other specialized areas to track layperson human survey responses.

[–] MangoCats@feddit.it -4 points 2 days ago (1 children)

which shouldn’t be considered a good thing.

Good and bad is subjective and depends on your area of application.

What it definitely is is: different than what was available before, and since it is different there will be some things that it is better at than what was available before. And many things that it's much worse for.

Still, in the end, there is real power in diversity. Just don't use a sledgehammer to swipe-browse on your cellphone.

[–] Lost_My_Mind@lemmy.world 9 points 2 days ago

I asked Lars Ulrich to define good and bad. He said...

FIRE GOOD!!! NAPSTER BAD!!! OOOOH FIRE HOT!!! FIRE BAD!!! FIIIRRREEE BAAAAAAAD!!!!

[–] CaptDust@sh.itjust.works 44 points 2 days ago* (last edited 2 days ago)

That "30% of population = dipshits" statistic keeps rearing its ugly head.

[–] anomnom@sh.itjust.works 4 points 1 day ago

The same 29% that keeps fascists in power around the world.

[–] daychilde@lemmy.world 11 points 2 days ago (1 children)

I'm not afraid to say that it took me a sec. My brain went "short distance. Walk or drive?" and skipped over the car wash bit at first. Then I laughed because I quickly realized the idiocy. :shrug:

[–] theredhood@piefed.zip 2 points 1 hour ago

Me too, at first I was like "I don't want to walk 50 meters" then I was thinking "50 meters away from me or the car? And where is the car?" I didn't get it until I read the rest of the article...

[–] Lost_My_Mind@lemmy.world 9 points 2 days ago

As someone who takes public transportation to work, SOME people SHOULD be forced to walk through the car wash.

[–] LifeInMultipleChoice@lemmy.world -2 points 2 days ago* (last edited 2 days ago) (1 children)

Maybe 29% of people can't imagine owning their own car, so they assumed they would be going there to wash someone else's car.

[–] Bronzebeard@lemmy.zip 4 points 1 day ago (1 children)

Then they can't read. Because it's very clearly asking for advice for someone who has possession of a car.

Yeah, it was a joke. People appear to have had a hard time with catching that though, lol