this post was submitted on 27 Feb 2026

As you may (or may not) be aware, there's a fun meme test going around at the moment (as of 02.26.26) that purports to challenge LLM reasoning abilities and world models.

A recent blog post / ad spot by Opper.ai (https://opper.ai/blog/car-wash-test) nicely summarizes the test -

"I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"

The logical answer is - you should drive (obviously), as the car needs to get to the car wash.

However, according to Opper.ai, only 11/53 cloud-based AI models passed the test (~20%). Worryingly, about 1 in 3 humans also answered wrongly (more on that at the end).

I wanted to see what a properly constrained, "fail loud", grounded deterministic system would do. For your entertainment, I present Qwen3-4B HIVEMIND + llama-conductor infra (https://codeberg.org/BobbyLLM/llama-conductor). You can repeat the exercise with your own LLM if you so wish (my shit is open source; I'm trying to solve the AI lies problem / not selling anything / the big players could fix this but choose not to).

Firstly, raw LLM mode, no router:

User

I need to wash my car. The car wash is 50m away. Should I drive or should I walk?


Assistant

Driving is almost always better for a 50m car wash.

Key factors that change the decision:

  • Time: Driving takes ~1 minute; walking takes ~8-12 minutes.
  • Weather: Walking is worse in rain, heat, wind.
  • Terrain: Hilly or uneven paths make walking harder.
  • Accessibility: If parking is hard or there's no safe walkway, driving wins.
  • Effort vs. convenience: Driving saves significant effort for such a short distance.

For 50m, drive unless it's an exceptionally pleasant, sheltered walk you'd enjoy.

VERDICT: accidentally correct answer, misses the point of the question. Very normal.


Now with the harness (which enforces deterministic reasoning):

User

I need to wash my car. The car wash is 50m away. Should I drive or should I walk?


Assistant

Drive. The task requires the car to be physically at the destination, and walking only moves you, not the car. So the hard precondition is satisfied by driving.

Confidence: medium | Source: Contextual

VERDICT: right answer, right reason...but let's poke it some more


User

Can you flesh out the implicit assumptions and underlying logic of why you said that?

Assistant

The core implicit assumption is that "washing the car" means the car must be present at the wash location, not just the person. Walking moves you toward that location but doesn't transport the car. Driving is the only direct way to satisfy the precondition that the car is at the wash.

Underlying logic: Any action that achieves the goal must fulfill all necessary conditions. Since walking alone cannot place the car at the destination, it fails as a complete solution. Driving directly addresses this by moving the car itself.

Confidence: medium | Source: Contextual
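
The "hard precondition" logic in that answer can be sketched in a few lines. This is only an illustration of the reasoning, not how llama-conductor works internally; the `Action` type, the `satisfies` and `viable` helpers, and the entity names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    moves: frozenset  # entities this action brings to the destination


def satisfies(action: Action, required: frozenset) -> bool:
    """True only if every required entity ends up at the destination."""
    return required <= action.moves


def viable(actions, required):
    """Names of the actions that meet all necessary conditions."""
    return [a.name for a in actions if satisfies(a, required)]


# Assumption for illustration: washing the car at the car wash needs
# both the person and the car to be there.
REQUIRED = frozenset({"person", "car"})

ACTIONS = [
    Action("walk", frozenset({"person"})),          # moves only you
    Action("drive", frozenset({"person", "car"})),  # moves you and the car
]

print(viable(ACTIONS, REQUIRED))  # prints ['drive']
```

Walking is filtered out before distance, weather, or effort ever come into it, which is the difference between the harnessed answer and the raw one.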


Just for fun, I re-ran it in "Fun" mode (it answers as per serious mode, then the model performs sentiment analysis on its own answer, picks a matching quote from a quotes text file as a seed, and lightly rewrites the output in that flavor without changing the original reasoning. It takes a touch longer but is, on occasion, hilarious).
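
A rough sketch of the shape of that pipeline, under loud assumptions: in the real thing the model does the sentiment analysis and the light rewrite, whereas here `classify_tone` is a crude word-count stand-in, the tone labels and the quote-to-tone mapping are invented, and the rewrite step is skipped so the serious answer passes through untouched. None of these names come from llama-conductor.

```python
import random

# Hypothetical quote table; the real setup reads quotes from a text file.
QUOTES = {
    "blunt": ['"Shut up baby, I know it!"'],
    "explanatory": ['"That\'s a stupid question."'],
}


def classify_tone(answer: str) -> str:
    """Stand-in for the model's self sentiment analysis (assumption)."""
    first_sentence = answer.split(".", 1)[0]
    return "blunt" if len(first_sentence.split()) <= 2 else "explanatory"


def fun_mode(serious_answer: str, rng: random.Random) -> str:
    """Prefix the serious answer with a tone-matched quote.

    The reasoning text is left exactly as serious mode produced it.
    """
    tone = classify_tone(serious_answer)
    quote = rng.choice(QUOTES[tone])
    return f"[FUN] {quote}\n\n{serious_answer}"


serious = "Drive. The task requires the car to be physically at the destination."
print(fun_mode(serious, random.Random(0)))
```

The point of the design is ordering: the answer is fixed first, and the flavor is layered on afterwards, so the joke cannot change the reasoning.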


User

I need to wash my car. The car wash is 50m away. Should I drive or should I walk?

Assistant

[FUN] "Shut up baby, I know it!"

Drive. The task requires the car to be physically at the destination, and walking only moves you, not the car. So the hard precondition is satisfied by driving.

Confidence: medium | Source: Contextual

VERDICT: right answer, right reason...but let's poke it again


User

Can you flesh out the implicit assumptions and underlying logic of why you said that?

Assistant

[FUN] "That's a stupid question."

The core assumption is that "washing the car" inherently requires the car to be present at the car wash location. Walking moves the person but not the vehicle; driving moves both. Since the car must be at the car wash to be washed, walking alone cannot fulfill the precondition of the task.

The logic is straightforward: for an object (the car) to undergo a transformation (being washed), that object must occupy the location where the transformation occurs. Walking only addresses the person's movement, not the car's. Driving is the only mode that directly satisfies the spatial requirement.

Confidence: medium | Source: Contextual


So, what’s the point of this post? Yesterday, I argued that LLMs can perform something that approximates deduction when properly constrained (see: https://lemmy.world/post/43503268/22321124). Based on those comments (and the evidence provided therein), as well as the above, I'm happy to plant the flag for "properly grounded LLMs are a bit more than just spicy auto-correct". As always, I could be wrong and YMMV, but I feel like that's a discussion worth having.

[–] mech@feddit.org 6 points 20 hours ago (1 children)

"For 50m, drive unless it’s an exceptionally pleasant, sheltered walk you’d enjoy."

You can tell the AI was trained on US data.

[–] SuspciousCarrot78@lemmy.world 3 points 19 hours ago* (last edited 19 hours ago)

We even lie to our machines, eh?

https://www.youtube.com/watch?v=ORzNZUeUHAM

Qwen's an Alibaba cook (though the router works with anything). Irrespective of that, yeah...I dunno why they tend to default to "walk".

I mean, I can probably figure it out, but LLMs are black boxes (and I'm not a fan of that), so who can tell for sure what went into the training data?