Ask Lemmy
A Fediverse community for open-ended, thought provoking questions
Basically programmers are becoming designers and code reviewers.
There are now so many code changes that the code review is the bottleneck more than the coding.
We depend more now on tests to validate software does what it should. We get AI to write the gazillion tests.
And we get AI to summarize and explain blocks of code.
Are you new to engineering? Coding was never the bottleneck. Having someone who knows what to do, and planning, usually are.
I spend at most 20-30% of my time actively coding, 30-40% helping people with dumb things that shouldn't take my time, and the remainder in meetings, scoping my work, or debugging.
And in my experience, most of that time is still not "typing in code".
As a benchmark, I worked on two greenfield projects, each for about five years. One was on a topic very close to my doctoral thesis in signal processing; the other was an embedded device for a large scientific experiment, with the systems engineers and the scientists who used it sitting next door. So in both cases the requirements were extremely well defined, far better than what an average programmer will ever see. Some colleagues worked on the JWST. That org knows damn well how to engineer stuff.
At the end of each of those five-year projects, I had around 60,000 lines of code. Pretty productive.
Now, let's do the math: with about 250 work days per year, that is ... 48 lines of code per day. You could type that in five minutes.
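The arithmetic, spelled out as a small Rust sketch. The function name `lines_per_day` and its parameters are made up for illustration; the inputs are the figures quoted above (60,000 lines, 5 years, 250 work days per year).

```rust
/// Average lines of code per working day.
/// Hypothetical helper, only here to make the division explicit.
fn lines_per_day(total_lines: u32, years: u32, work_days_per_year: u32) -> f64 {
    // Total working days over the whole project.
    let work_days = years * work_days_per_year;
    f64::from(total_lines) / f64::from(work_days)
}
```

Calling `lines_per_day(60_000, 5, 250)` divides 60,000 by 1,250 working days, which gives 48.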
For old legacy projects, that average will be far far worse.
So, most of the productive time is actually thinking about code, and talking with people. And LLMs can't do that. They can only type. And worse, doing that they don't use the most important key for a programmer: The 'delete' key.
That is just the thing: developer and tester should never be the same person, let alone the same AI model. IMO testing is never taken seriously enough; it is seen as an unnecessary step and merged together with dev testing. From my years of experience I know that everything testers find is just explained away rather than properly addressed, and then with all of the obvious stuff in the way you never see the real issues.
Interested in how much actual experience you have with AI generated test suites.
My code was never tested this well.
I have experience with AI generated test suites, and while it's good for generating coverage, it isn't so good for actually ensuring correctness, which is the actual point.
I've watched the robot happily introduce bugs to pass broken tests, and also break tests to match code, and everything in between.
I don't want lots of tests, I want good tests.
You have to prompt for that, I do that regularly along with refactors. ‘Examine all tests to ensure they are testing functionality and not just passing a test.’ It finds them and will work on it. I think the problem continues to be engineering discipline. People are lazy with AI on multiple levels, not just copy pasta slop.
Testing functionality isn't the same as correctness.
Oh excuse me then, what is correctness?
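As a stand-in for the kind of function under discussion, here is a hypothetical `add` in Rust. The language, the `u8` width and the function body are assumptions for illustration, not the original snippet.

```rust
/// Hypothetical `add`: compiles and runs, but wraps around on overflow.
/// `wrapping_add` makes the wrap explicit; a plain `a + b` on `u8` would
/// panic on overflow in a debug build instead of wrapping.
fn add(a: u8, b: u8) -> u8 {
    a.wrapping_add(b)
}
```

With this sketch, `add(1, 2)` is 3 as expected, while `add(250, 10)` wraps past 255 and comes back as 4.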
This code is clearly functional, it'll compile and execute.
However, the customer actually needs the code to do a saturating add.
With that knowledge, we can clearly see that the code is not correct. It will not saturate, it will wrap around instead.
Without that knowledge, an LLM will happily write some basic unit tests that won't cover the saturation edge case, and the bug will live on until it's hit in prod.
If you're lucky, and your function doco is good, the LLM might spot the bug and notify you.
My personal preference for how to generate tests is to ask the agent to write specific tests. E.g: "write a test for add that demonstrates that it saturates".
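A minimal sketch of what such a targeted test could look like in Rust, assuming an 8-bit unsigned `add`. The function body and the test names are hypothetical, chosen only to show the shape of a test that pins down the saturation requirement.

```rust
/// Hypothetical saturating `add`: clamps at `u8::MAX` instead of wrapping.
fn add(a: u8, b: u8) -> u8 {
    a.saturating_add(b)
}

#[cfg(test)]
mod tests {
    use super::add;

    // The happy path that a generic generated test would likely cover.
    #[test]
    fn add_small_values() {
        assert_eq!(add(1, 2), 3);
    }

    // The edge case the requirement is actually about: 250 + 10 does not
    // fit in a u8, so the result must clamp to 255 rather than wrap to 4.
    #[test]
    fn add_saturates_at_max() {
        assert_eq!(add(250, 10), u8::MAX);
    }
}
```

The second test fails against a wrapping implementation, which is the point: it encodes the requirement rather than just exercising the code.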
IMO this is a bad example as in theory, testers test code against requirements, and if there is no such req stating anything about saturation then how should the testers or in this case the LLM know?
It is oversimplified, but there are often implicit requirements that a human would be aware of from the broader context that the LLM may not be.
e.g. add is used to increment a health bar, so wrap-around doesn't make sense.
Yeah, I had testers that tested the functionality of a delay... but had set the delay parameter to zero. Well, good thing this one case worked, but you didn't check anything beyond that for correctness at all.
Timing and tests, name a better migraine duo :D.
We continuously create tests that ensure a process completes in a set amount of time, and every time, we don't give them enough leeway, and the test will fail randomly if the CI runner gets overloaded.
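One way to make the leeway explicit is to keep the nominal budget and the tolerance as separate values. This is only a sketch: `within_budget`, `time_it` and the factor of 1.5 used below are hypothetical, and a generous factor reduces random failures on a loaded runner without eliminating them.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: true if `elapsed` fits within `budget` scaled by `leeway`.
/// A `leeway` of 1.5 means "allow up to 50% over the nominal budget".
fn within_budget(elapsed: Duration, budget: Duration, leeway: f64) -> bool {
    elapsed <= budget.mul_f64(leeway)
}

/// Runs `work` once and returns how long it took on a monotonic clock.
fn time_it<F: FnOnce()>(work: F) -> Duration {
    let start = Instant::now();
    work();
    start.elapsed()
}
```

A timing test would then assert `within_budget(time_it(|| run_process()), budget, 1.5)`, where `run_process` stands for whatever is being measured. Keeping the comparison in a pure function also means the leeway logic itself can be checked with fixed durations, independent of how busy the CI runner is.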