this post was submitted on 17 Mar 2026

431 points (98.0% liked)

AI still doesn't work very well, businesses are faking it, and a reckoning is coming (www.theregister.com)

submitted 1 day ago by brianpeiris@lemmy.ca to c/programming@programming.dev


Excerpt:

"Even within the coding, it's not working well," said Smiley. "I'll give you an example. Code can look right and pass the unit tests and still be wrong. The way you measure that is typically in benchmark tests. So a lot of these companies haven't engaged in a proper feedback loop to see what the impact of AI coding is on the outcomes they care about. Lines of code, number of [pull requests], these are liabilities. These are not measures of engineering excellence."

Measures of engineering excellence, said Smiley, include metrics like deployment frequency, lead time to production, change failure rate, mean time to restore, and incident severity. And we need a new set of metrics, he insists, to measure how AI affects engineering performance.

"We don't know what those are yet," he said.

One metric that might be helpful, he said, is measuring tokens burned to get to an approved pull request – a formally accepted change in software. That's the kind of thing that needs to be assessed to determine whether AI helps an organization's engineering practice.
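As a rough sketch of what measuring these could look like, the snippet below computes four of the delivery metrics named above (incident severity is left out) plus tokens per approved pull request. The `Deployment` record, its field names, and the `tokens`/`approved` keys are assumptions made for illustration; they do not come from the article or from any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; real data would come from CI/CD and incident tooling.
    committed_at: datetime                  # when the change was first committed
    deployed_at: datetime                   # when it reached production
    caused_failure: bool                    # did this deploy trigger an incident?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def delivery_metrics(deploys: list[Deployment], window_days: int) -> dict[str, float]:
    """Deployment frequency, lead time, change failure rate and mean time to restore."""
    if not deploys:
        raise ValueError("need at least one deployment in the window")
    failures = [d for d in deploys if d.caused_failure]
    restore_hours = [
        hours(d.restored_at - d.deployed_at)
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "lead_time_hours": mean(hours(d.deployed_at - d.committed_at) for d in deploys),
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_restore_hours": mean(restore_hours) if restore_hours else 0.0,
    }


def tokens_per_approved_pr(pull_requests: list[dict]) -> float:
    """Tokens burned across all PRs (including rejected ones) per approved PR."""
    approved = sum(1 for pr in pull_requests if pr["approved"])
    total_tokens = sum(pr["tokens"] for pr in pull_requests)
    return total_tokens / approved if approved else float("inf")
```

Counting tokens from rejected pull requests in the numerator is a deliberate choice in this sketch: abandoned attempts still cost money, so they count against the approved ones.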

To underscore the consequences of not having that kind of data, Smiley pointed to a recent attempt to rewrite SQLite in Rust using AI.

"It passed all the unit tests, the shape of the code looks right," he said. "It's 3.7x more lines of code that performs 2,000 times worse than the actual SQLite. Two thousand times worse for a database is a non-viable product. It's a dumpster fire. Throw it away. All that money you spent on it is worthless."

All the optimism about using AI for coding, Smiley argues, comes from measuring the wrong things.

"Coding works if you measure lines of code and pull requests," he said. "Coding does not work if you measure quality and team performance. There's no evidence to suggest that that's moving in a positive direction."


[–] motruck@lemmy.zip 4 points 4 hours ago

Hahaha. I'm guessing this guy works in developer tools. These types of metrics are great but you rarely get there. You will get a few of them, but the reality is that the people who want to use AI to produce faster are the same people who won't give you time to properly instrument your system for metrics like these. Good luck with your expectation that someone measures the impact of AI in a meaningful way.

[–] drmoose@lemmy.world 15 points 10 hours ago (2 children)

People delude themselves if they think LLMs are not useful for coding. People also delude themselves that all code will be AI written in the next 2 years. The reality is that it's an incredibly useful tool but with reasonable limits.

[–] rothaine@lemmy.zip -1 points 3 hours ago

I think part of it is that it's been overhyped for so long. But now Opus can actually do all the shit we were promised 2 years ago.

[–] JuanPC@mastodon.online 1 points 8 hours ago (2 children)

@drmoose @brianpeiris . Where is the frontier of the reasonable? Maybe the most difficult thing to define.

[–] drmoose@lemmy.world 2 points 3 hours ago

You need to be a relatively knowledgeable engineer to understand what's reasonable. For example, Claude is incredible with React, to the point where it's putting all of the web-GUI-as-a-service platforms out of business. However, some rare languages or frameworks or even design principles are much harder for AI systems. I work right now on the more creative side of programming, like developing anti-scam fingerprinting techniques, and tbh LLMs are only as good as a rubber duck for bouncing ideas. That's still super useful, but I'm under no delusion that this will replace my work.

[–] Honytawk@discuss.tchncs.de 2 points 8 hours ago* (last edited 8 hours ago)

Probably because "reasonable" is defined differently for every person.

Everyone thinks their worldview is the realistic one.

[–] BrightCandle@lemmy.world 7 points 16 hours ago (2 children)

I keep trying to use the various LLMs that people recommend for coding for various tasks and it doesn't just get things wrong. I have been doing quite a bit of embedded work recently and some of the designs it comes up with would cause electrical fires, it's that bad. Where the earlier versions would be like "oh yes that is wrong let me correct it..." and then often get it wrong again, the new ones will confidently tell you that you are wrong. When you tell them it caught fire they just don't change.

I don't get it. I feel like all these people claiming success with them are just not very discerning about the quality of the code it produces, or worse, just don't know any better.

[–] Fedizen@lemmy.world 2 points 3 hours ago

Lowkey I think anyone saying LLMs are useful for work is telling everyone around them their job is producing mostly low quality work and could reasonably be cut.

[–] Shayeta@feddit.org 1 points 10 hours ago (1 children)

It is possible to get good results. The problem is that you yourself need to have a very good understanding of the problem and how to solve it, and then accurately convey that to the AI.

Granted, I don't work on embedded and I'd imagine there's less code available for AI to train on than other fields.

[–] ironhydroxide@sh.itjust.works 2 points 7 hours ago

Yes, I definitely want to train a new hire who is superlatively confident that they are correct, while also having to do my job correctly as well, while said new hire keeps putting shit in my work.

[–] melsaskca@lemmy.ca 18 points 1 day ago

Businesses were failing even before AI. If I cannot eventually speak to a human on a telephone then the whole human layer is gone and I no longer want to do business with that entity.

[–] tomiant@piefed.social 1 points 18 hours ago

REPENT! The end is nigh!

...no but seriously

[–] Not_mikey@lemmy.dbzer0.com 24 points 1 day ago* (last edited 1 day ago) (1 children)

Guy selling ai coding platform says other AI coding platforms suck.

This just reads like a sales pitch rather than journalism. Not citing any studies just some anecdotes about what he hears "in the industry".

Half of it is:

You're measuring the wrong metrics for productivity, you should be using these new metrics that my AI coding platform does better on.

I know the AI hate is strong here but just because a company isn't pushing AI in the typical way doesn't mean they aren't trying to hype whatever they're selling up beyond reason. Nearly any tech CEO cannot be trusted, including this guy, because they're always trying to act like they can predict and make the future when they probably can't.


[–] python@lemmy.world 18 points 1 day ago (9 children)

Recently had to call out a coworker for vibecoding all her unit tests. How did I know they were vibe coded? None of the tests had an assertion, so they literally couldn't fail.

[–] ch00f@lemmy.world 18 points 1 day ago (1 children)

Vibe coding guy wrote unit tests for our embedded project. Of course, the hardware peripherals aren’t available for unit tests on the dev machine/build server, so you sometimes have to write mock versions (like an “adc” function that just returns predetermined values in the format of the real analog-digital converter).

Claude wrote the tests and mock hardware so well that it forgot to include any actual code from the project. The test cases were just testing the mock hardware.
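To make the failure mode concrete, here is a small sketch contrasting a test that only exercises the mock with one that drives project code through the mock. `MockAdc`, `read_millivolts`, the 12-bit range and the 3300 mV reference are all hypothetical stand-ins, not details from the project described above.

```python
class MockAdc:
    """Stand-in for the real analog-digital converter; returns canned raw counts."""

    def __init__(self, readings):
        self._readings = iter(readings)

    def read_raw(self) -> int:
        return next(self._readings)


def read_millivolts(adc, vref_mv: int = 3300) -> int:
    # Hypothetical project code: converts a 12-bit raw ADC count to millivolts.
    return adc.read_raw() * vref_mv // 4095


def test_only_the_mock():
    # Anti-pattern: this passes even if read_millivolts is completely broken,
    # because no project code is ever called.
    adc = MockAdc([2048])
    assert adc.read_raw() == 2048


def test_project_code_through_the_mock():
    # The mock only supplies input; the assertions target the project function.
    assert read_millivolts(MockAdc([4095])) == 3300
    assert read_millivolts(MockAdc([0])) == 0
```

Both tests pass, which is the problem: a green run says nothing about whether any project code was exercised. Coverage measured against the project sources (rather than the test helpers) is one way to notice the first kind.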

[–] 87Six@lemmy.zip 8 points 1 day ago

Not realizing that should be an instant firing. The dev didn't even glance at the unit tests...


[–] Thorry@feddit.org 73 points 1 day ago (4 children)

Yeah these newer systems are crazy. The agent spawns a dozen subagents that all do some figuring out on the code base and the user request. Then those results get collated, then passed along to a new set of subagents that make the actual changes. Then there are agents that check stuff and tell the subagents to redo stuff or make changes. And then it gets a final check like unit tests, compilation etc. And then it's marked as done for the user. The amount of tokens this burns is crazy, but it gets them better results in the benchmarks, so it gets marketed as an improvement. In reality it's still fucking up all the damned time.

Coding with AI is like coding with a junior dev, who didn't pay attention in school, is high right now, doesn't learn and only listens half of the time. It fools people into thinking it's better, because it shits out code super fast. But the cognitive load is actually higher, because checking the code is much harder than coming up with it yourself. It's slower by far. If you are actually going faster, the quality is lacking.

[–] Shayeta@feddit.org 2 points 10 hours ago

It's like guiding a coked up junior who can write 5000 wpm and has read every piece of documentation ever without understanding any of it.

[–] merc@sh.itjust.works 4 points 23 hours ago

checking the code is much harder than coming up with it yourself

That's always been true. But, at least in the past when you were checking the code written by a junior dev, the kinds of mistakes they'd make were easy to spot and easy to predict.

LLMs are created in such a way that they produce code that genuinely looks perfect at first. It's stuff that's designed to blend in and look plausible. In the past you could look at something and say "oh, this is just reversing a linked list". Now, you have to go through line by line trying to see if the thing that looks 100% plausible actually contains a tiny twist that breaks everything.

[–] chunkystyles@sopuli.xyz 13 points 1 day ago (1 children)

This is very different from my experience, but I've purposely lagged behind in adoption and I often do things the slow way because I like programming and I don't want to get too lazy and dependent.

I just recently started using Claude Code CLI. With how I use it (asking it specific questions and often telling it exactly what files and lines to analyze), it feels more like talking to an extremely knowledgeable programmer who has very narrow context and often makes short-sighted decisions.

I find it super helpful in troubleshooting. But it also feels like a trap, because I can feel it gaining my trust and I know better than to trust it.


[–] magiccupcake@lemmy.world 24 points 1 day ago

I love this bit especially

Insurers, he said, are already lobbying state-level insurance regulators to win a carve-out in business insurance liability policies so they are not obligated to cover AI-related workflows. "That kills the whole system," Deeks said. Smiley added: "The question here is if it's all so great, why are the insurance underwriters going to great lengths to prohibit coverage for these things? They're generally pretty good at risk profiling."

[–] jimmux@programming.dev 37 points 1 day ago (1 children)

We never figured out good software productivity metrics, and now we're supposed to come up with AI effectiveness metrics? Good luck with that.

[–] Senal@programming.dev 13 points 1 day ago (3 children)

Sure we did.

"Lines Of Code" is a good one, more code = more work so it must be good.

I recently had a run-in with another good one: PRs/Dev/Month.

Not only is that one good for overall productivity, it's a way to weed out those unproductive devs who check in less often.

This one was so good, management decided to add it to the company wide catchup slides in a section espousing how the new AI driven systems brought this number up enough to be above other companies.

That means other companies are using it as well, so it must be good.


[–] DickFiasco@sh.itjust.works 52 points 1 day ago (11 children)

AI is a solution in search of a problem. Why else would there be consultants to "help shepherd organizations towards an AI strategy"? Companies are looking to use AI out of fear of missing out, not because they need it.

[–] Honytawk@discuss.tchncs.de -1 points 7 hours ago* (last edited 7 hours ago)

Nah, it is more that LLMs are a neat technology that allows computers to generate stuff on their own, which has all sorts of uses. It has solved the problem of typing big texts on your own. (read: it did not solve the problem of reviewing big texts)

But it has also gaslit managers into thinking it can do much more than its capabilities, so they demand it to be put into everything. With disastrous results.

[–] ultimate_worrier@lemmy.dbzer0.com 27 points 1 day ago* (last edited 1 day ago)

Exactly. I’ve heard the phrase “falling behind” from many in upper management.


[–] CubitOom@infosec.pub 33 points 1 day ago

Generative models, which many people call "AI", have a much higher catastrophic failure rate than we have been led to believe. They cannot actually be used to replace humans, just as an inanimate object can't replace a parent.

Jobs aren't threatened by generative models. Jobs are threatened by a credit crunch due to high interest rates and a lack of lenders being able to adapt.

"AI" is a ruse, a useful excuse that helps make people want to invest, investors & economists OK with record job loss, and the general public more susceptible to data harvesting and surveillance.
