this post was submitted on 20 May 2026
55 points (98.2% liked)

Ask Lemmy

[–] ellen.kimble@piefed.social 0 points 1 day ago (1 children)

You have to prompt for that. I do it regularly, along with refactors: ‘Examine all tests to ensure they are testing functionality and not just passing a test.’ It finds them and will work on it. I think the problem continues to be engineering discipline. People are lazy with AI on multiple levels, not just copy pasta slop.

[–] CameronDev@programming.dev 1 points 1 day ago (2 children)

Testing functionality isn't the same as correctness.

[–] ellen.kimble@piefed.social 1 points 23 hours ago (1 children)

Oh excuse me then, what is correctness?

[–] CameronDev@programming.dev 1 points 20 hours ago (1 children)

int add(int a, int b) {
    return a + b;
}

This code is clearly functional: it'll compile and execute.

However, the customer actually needs the code to do a saturating add.

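With that knowledge, we can clearly see that the code is not correct. It will not saturate; on overflow it will wrap around instead (strictly speaking, if this is C or C++, signed overflow is undefined behavior, but wrapping is what you'll typically see).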


Without that knowledge, an LLM will happily write some basic unit tests that won't cover the saturation edge case, and the bug will live on until it's hit in prod.

If you're lucky, and your function doco is good, the LLM might spot the bug, and notify you.

My personal preference for generating tests is to ask the agent to write specific tests. E.g.: "write a test for add that demonstrates that it saturates".
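For illustration, here is a minimal sketch of what a saturating version and the requested test might look like in C. The names `add_saturating` and `test_add_saturates` are hypothetical, and clamping to `INT_MAX`/`INT_MIN` is an assumption about what "saturating" means for the customer.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical saturating variant of add(): clamps to INT_MAX / INT_MIN
   instead of overflowing. The checks are arranged so that no intermediate
   expression can itself overflow. */
int add_saturating(int a, int b) {
    if (b > 0 && a > INT_MAX - b) {
        return INT_MAX;  /* the sum would exceed INT_MAX */
    }
    if (b < 0 && a < INT_MIN - b) {
        return INT_MIN;  /* the sum would fall below INT_MIN */
    }
    return a + b;
}

/* The kind of specific test the prompt asks for: it only passes
   if the edge cases actually saturate. */
void test_add_saturates(void) {
    assert(add_saturating(2, 3) == 5);               /* ordinary case */
    assert(add_saturating(INT_MAX, 1) == INT_MAX);   /* saturates upwards */
    assert(add_saturating(INT_MIN, -1) == INT_MIN);  /* saturates downwards */
    assert(add_saturating(INT_MAX, INT_MIN) == -1);  /* mixed signs, no clamping */
}
```

A generic "does add work" test like the first assertion passes against both the original `add` and this one, which is the point: only the edge-case assertions encode the requirement.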

[–] slevinkelevra@sh.itjust.works 1 points 14 hours ago (1 children)

IMO this is a bad example. In theory, testers test code against requirements, and if no requirement says anything about saturation, how should the testers (or in this case the LLM) know?

[–] CameronDev@programming.dev 1 points 13 hours ago

It is oversimplified, but there are often implicit requirements that a human would be aware of from the broader context and that the LLM may not be.

E.g. add is used to increment a health bar, so wrap-around doesn't make sense.
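To make the health bar case concrete, here is a small sketch in C. It assumes health is stored in a `uint8_t`; the functions `heal_naive` and `heal_clamped` and the `max_health` parameter are made up for illustration and don't come from any real codebase.

```c
#include <assert.h>
#include <stdint.h>

/* Naive version: uint8_t arithmetic wraps modulo 256. */
uint8_t heal_naive(uint8_t health, uint8_t amount) {
    return (uint8_t)(health + amount);
}

/* Clamped version: the result never exceeds max_health. */
uint8_t heal_clamped(uint8_t health, uint8_t amount, uint8_t max_health) {
    if (health >= max_health) {
        return max_health;
    }
    uint8_t room = (uint8_t)(max_health - health);  /* how much can still be added */
    return amount > room ? max_health : (uint8_t)(health + amount);
}

void test_heal(void) {
    assert(heal_naive(250, 10) == 4);           /* 260 mod 256: healing nearly kills the player */
    assert(heal_clamped(250, 10, 255) == 255);  /* clamps at the maximum instead */
    assert(heal_clamped(50, 10, 100) == 60);    /* ordinary case is unchanged */
}
```

Nothing in the signature of `heal_naive` says that wrapping is wrong. That knowledge only comes from knowing what a health bar is.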

[–] slevinkelevra@sh.itjust.works 0 points 1 day ago (1 children)

Yeah, I had testers who tested the functionality of a delay... but had set the delay parameter to zero. Well, good thing that one case worked, but they didn't check anything beyond that for correctness at all.

[–] CameronDev@programming.dev 1 points 23 hours ago

Timing and tests, name a better migraine duo :D.

We continuously create tests that ensure a process completes in a set amount of time, and every time we don't give them enough leeway, so the tests fail randomly whenever the CI runner gets overloaded.
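One way to sketch a less flaky timing test in C, assuming a POSIX system (`nanosleep`, `clock_gettime`). The function `delay_ms` stands in for whatever delay is under test, and the 20 ms and 2000 ms values are arbitrary placeholders, not recommendations. The idea is to use a non-zero delay, assert the lower bound strictly, and keep the upper bound loose enough that it only catches hangs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical function under test: blocks for at least ms milliseconds. */
void delay_ms(long ms) {
    struct timespec req = { .tv_sec = ms / 1000, .tv_nsec = (ms % 1000) * 1000000L };
    /* If a signal interrupts the sleep, nanosleep stores the remaining
       time back into req, so retry until the full delay has elapsed. */
    while (nanosleep(&req, &req) == -1 && errno == EINTR) {
    }
}

/* Time spent inside delay_ms(ms), in whole milliseconds, measured on the
   monotonic clock so wall-clock adjustments don't affect the result. */
int64_t measure_delay_ms(long ms) {
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    delay_ms(ms);
    clock_gettime(CLOCK_MONOTONIC, &end);
    int64_t ns = (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
               + (end.tv_nsec - start.tv_nsec);
    return ns / 1000000;
}

void test_delay_waits(void) {
    const long requested_ms = 20;  /* non-zero, so the delay path actually runs */
    const long leeway_ms = 2000;   /* generous, to survive an overloaded CI runner */
    int64_t elapsed = measure_delay_ms(requested_ms);
    assert(elapsed >= requested_ms);              /* strict: must never return early */
    assert(elapsed <= requested_ms + leeway_ms);  /* loose: only catches hangs */
}
```

The lower bound is the part that checks correctness of the delay. The upper bound is inherently at the mercy of the scheduler, so a tight value there mostly tests the CI runner.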