this post was submitted on 23 Feb 2026
584 points (97.6% liked)
Technology
They probably added a system guardrail as soon as they heard about this test. It's been going around for a while now :)
The article mentions that Gemini 2.0 Flash Lite, Gemini 3 Flash, and Gemini 3 Pro have passed the test. All three also got it right 10 out of 10 times. Even Gemini 2.5 shares the highest score in the "below 6 right answers" category. Guess Gemini is the closest to "intelligence" out of the bunch.
I mean, if they fix specific reasoning test answers (like the strawberry one), this doesn't actually make reasoning better, though. It just optimizes for benchmarks.
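For anyone who hasn't seen it: the "strawberry" test is the classic prompt asking a model how many times a letter appears in a word, which older models famously got wrong. A minimal sketch of how such a test might be scored (the function names and the 10-trial setup are illustrative assumptions, not the article's actual methodology):

```python
# Hypothetical sketch of scoring the "strawberry" letter-count test.
# count_letter and score_model are made-up names for illustration.

def count_letter(word: str, letter: str) -> int:
    """Ground-truth count of a letter in a word."""
    return word.lower().count(letter.lower())


def score_model(answers: list[int], word: str = "strawberry", letter: str = "r") -> float:
    """Fraction of a model's answers matching the true count.

    The article reportedly ran each model 10 times, so `answers`
    would hold the 10 numbers the model gave.
    """
    truth = count_letter(word, letter)  # "strawberry" has 3 r's
    return sum(a == truth for a in answers) / len(answers)


# A model answering 3 on every trial scores perfectly:
print(score_model([3] * 10))  # 1.0
```

The point about benchmark overfitting stands either way: a model can memorize that "strawberry" has 3 r's without being able to count letters in an arbitrary word.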
I'm pretty sure Google's AI is fed by the same spider that goes out and finds every new or changed web page (or a variant of that).
As soon as someone writes an article about how AI gets something wrong and provides a solution, that solution is now in the AI's training data.
OTOH, that means it's probably also ingesting a lot of AI generated slop, which causes its own set of problems.