Programming

Welcome to the main community in programming.dev! Feel free to post anything relating to programming here!

Cross-posting is strongly encouraged on this instance. If you feel your post or someone else's post fits another community, cross-post it there.

Hope you enjoy the instance!

Rules

Follow the programming.dev instance rules
Keep content related to programming in some way
If you're posting long videos, try to add some form of TL;DR for those who don't want to watch them

Wormhole

Follow the wormhole through a path of communities !webdev@programming.dev

founded 2 years ago

MODERATORS

snowe@programming.dev

Ategon@programming.dev

UlrikHD@programming.dev

bugsmith@programming.dev

Spyro@programming.dev

AI still doesn't work very well, businesses are faking it, and a reckoning is coming (www.theregister.com)

submitted 2 days ago by brianpeiris@lemmy.ca to c/programming@programming.dev

168 comments

Excerpt:

"Even within the coding, it's not working well," said Smiley. "I'll give you an example. Code can look right and pass the unit tests and still be wrong. The way you measure that is typically in benchmark tests. So a lot of these companies haven't engaged in a proper feedback loop to see what the impact of AI coding is on the outcomes they care about. Lines of code, number of [pull requests], these are liabilities. These are not measures of engineering excellence."

Measures of engineering excellence, said Smiley, include metrics like deployment frequency, lead time to production, change failure rate, mean time to restore, and incident severity. And we need a new set of metrics, he insists, to measure how AI affects engineering performance.
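A minimal Python sketch of how four of these metrics (all but incident severity) are often computed from deployment records. The `Deployment` record and its fields are hypothetical, and measuring restore time from the deploy rather than from incident detection is a simplifying assumption; real tooling will define these differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record layout: when the change was committed, when it
    # reached production, whether it caused a failure, and when service
    # was restored if it did.
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None


def engineering_metrics(deploys: list[Deployment], period_days: int) -> dict:
    """Deployment frequency, lead time, change failure rate and mean time to restore."""
    if not deploys:
        raise ValueError("need at least one deployment")
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    # Assumption: restore time is measured from the failing deploy itself.
    restore_times = [
        d.restored_at - d.deployed_at for d in failures if d.restored_at is not None
    ]
    return {
        # Deployments per day over the observation window.
        "deployment_frequency": len(deploys) / period_days,
        # Median time from commit to production.
        "lead_time": median(lead_times),
        # Fraction of deployments that caused a failure in production.
        "change_failure_rate": len(failures) / len(deploys),
        # Mean time from a failed deployment to restored service (None if no data).
        "mean_time_to_restore": (
            sum(restore_times, timedelta()) / len(restore_times)
            if restore_times
            else None
        ),
    }


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)
    deploys = [
        Deployment(t0, t0 + timedelta(hours=4)),
        Deployment(t0, t0 + timedelta(hours=2), failed=True,
                   restored_at=t0 + timedelta(hours=3)),
    ]
    print(engineering_metrics(deploys, period_days=1))
```

None of these numbers says anything about lines of code or pull-request counts, which is the distinction Smiley is drawing.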

"We don't know what those are yet," he said.

One metric that might be helpful, he said, is measuring tokens burned to get to an approved pull request – a formally accepted change in software. That's the kind of thing that needs to be assessed to determine whether AI helps an organization's engineering practice.
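One way to turn that suggestion into a number is a simple ratio: all tokens spent, including on changes that were never accepted, divided by the number of approved pull requests. The article does not say how the metric should be computed; the `PullRequest` record below is hypothetical and assumes token usage can be attributed to individual PRs.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    # Hypothetical fields: total LLM tokens (prompt + completion) spent on
    # this PR across all attempts, and whether it was eventually approved.
    tokens_used: int
    approved: bool


def tokens_per_approved_pr(prs: list[PullRequest]) -> float:
    """Total tokens across all PRs, rejected ones included, per approved PR."""
    approved = sum(1 for pr in prs if pr.approved)
    if approved == 0:
        # Tokens were burned (or none were) but nothing was accepted.
        return float("inf")
    total_tokens = sum(pr.tokens_used for pr in prs)
    return total_tokens / approved
```

Counting rejected PRs in the numerator is a deliberate choice in this sketch: abandoned AI-generated work still costs money, so it should make the ratio worse rather than disappear.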

To underscore the consequences of not having that kind of data, Smiley pointed to a recent attempt to rewrite SQLite in Rust using AI.

"It passed all the unit tests, the shape of the code looks right," he said. "It's 3.7x more lines of code that performs 2,000 times worse than the actual SQLite. Two thousand times worse for a database is a non-viable product. It's a dumpster fire. Throw it away. All that money you spent on it is worthless."

All the optimism about using AI for coding, Smiley argues, comes from measuring the wrong things.

"Coding works if you measure lines of code and pull requests," he said. "Coding does not work if you measure quality and team performance. There's no evidence to suggest that that's moving in a positive direction."

JuanPC@mastodon.online, 13 hours ago

@drmoose @brianpeiris Where is the frontier of the reasonable? That may be the most difficult thing to define.

drmoose@lemmy.world, 8 hours ago

You need to be a relatively knowledgeable engineer to understand what's reasonable. For example, Claude is incredible with React, to the point where it's putting all of the web-GUI-as-a-service platforms out of business; however, some rare languages, frameworks, or even design principles are much harder for AI systems. Right now I work on the more creative side of programming, like developing anti-scam fingerprinting techniques, and to be honest, LLMs are only as good as a rubber duck for bouncing ideas off. That's still super useful, but I'm under no delusion that this will replace my work.

Honytawk@discuss.tchncs.de, 12 hours ago (last edited 12 hours ago)

Probably because "reasonable" is defined differently for every person.

Everyone thinks their worldview is the realistic one.