I use Cursor for work (Claude Code at home), and Cursor gives the option to select your model. I've dabbled a bit with GPT reviewing code that Claude wrote - haven't found that dramatically better than just prompting Claude to "wear the reviewer hat now."
MangoCats
The vulnerabilities were always there; one of the better uses of AI has been finding them.
I find that I get the best results when I develop a suite of documents in parallel with the code: requirements, architecture, designs, development plans, lessons learned, indexes into those documents, and traceable ID tags on atomic, testable item descriptions. When a new agent is introduced to the project, it can "get up to speed quickly" by jumping to the current working point in the development plan and indexing into all the relevant details in the other documents before it even starts reading the existing code.
That working method itself is evolving, and each new LLM-driven project builds on the previous successful projects' processes...
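For concreteness, here's roughly the kind of tag index I mean - a minimal sketch, assuming markdown docs in a docs/ folder and a made-up REQ-/ARCH-/DES-/TEST- tag scheme (the folder name and tag format are placeholders, not a standard):

    # Sketch: build an index of traceable ID tags found in project docs,
    # so an agent can jump from a tag straight to the file:line that uses it.
    import re
    from pathlib import Path

    TAG = re.compile(r"\b(?:REQ|ARCH|DES|TEST)-\d{3,}\b")  # placeholder tag scheme
    DOCS = Path("docs")                                     # assumed docs folder

    index: dict[str, list[str]] = {}
    for doc in sorted(DOCS.rglob("*.md")):
        for lineno, line in enumerate(doc.read_text(encoding="utf-8").splitlines(), 1):
            for tag in TAG.findall(line):
                index.setdefault(tag, []).append(f"{doc}:{lineno}")

    with open(DOCS / "tag_index.md", "w", encoding="utf-8") as out:
        out.write("Tag index (generated)\n\n")
        for tag in sorted(index):
            out.write(f"{tag}: " + ", ".join(index[tag]) + "\n")

Something like that is cheap for the agent to regenerate whenever the docs change, and it keeps "getting up to speed" down to following tags instead of rereading everything.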
There was a time when nobody wrote unit tests, not so long ago, really.
WTF are you expecting Claude to code in bash?
I have found Sonnet and Opus to both be very capable in bash, but then, I don't usually ask bash to do super-complex things - its syntax is just too screwy to think about big applications in it.
I will say, you might be misguiding the LLM by filling it full of bad examples before starting. Kind of like the advice about not staring at a tree downslope while skiing: if you're fixated on it, you're MORE likely to hit it.
I couldn't possibly deploy with any confidence a large project - or honestly a small project I expected someone to rely on - without layers of tests.
In my world, that depends just about entirely on how "dynamic" the code base is expected to be after release. We send a lot of things into the field - thousands of copies used for important work - and we pretty much know which aspects of the system are unlikely to change once released and which are very likely to change. "Back in the day" we'd make reasoned judgement calls about which ones would benefit from the effort of unit / integration testing and which ones where that effort would be better invested elsewhere. As time marches on, our procedures and cross-departmental "advisors" who aren't so cozy with the code are relentlessly pushing for more and more automated testing. It is safer, no argument, but it also delays launch - sometimes without added value IMO.
The hassle is all on the agent, not on me.
So much this. That hassle on the agent, a few minutes of me waiting for it to crunch out the unit tests, saves me tons of hassle later - not going in circles re-fixing problems that were fixed before.
Same for keeping implementation code and documentation in sync - I've got hundreds of out-of-date wiki pages that simply aren't worth my time to fix. But when it's the agent keeping the docs in sync, just tell it to do it and wait a few minutes - totally worth the effort.
After I worked with AI agents a little, I dove in with a big set of coding standards and practices and... I overdid it. I find I get better results by starting off with a "light touch" and letting it do what it wants, then correcting where it gets off track (like using python for something that needs efficient performance...)
I've been using it rather heavily since about October of last year, and I definitely notice the models getting better and the tools around the models starting to do things automatically that I had to prompt for manually last year (especially remembering key instructions). I also believe I am getting better at using them; how much that contributes to my overall results is extremely hard to quantify, but the feeling is definitely there. Like - last October I used to "just ask" for things without having a documented set of requirements. Today, I just know that a requirements document is necessary when the level of complexity rises above... well, above a one-off simple example of how to do something relatively trivial.
using the right tools and giving them the right instructions.
The right tools are definitely key. Back an eternity ago, like October 2025, there was only Claude IMO if you wanted anything bigger than about a page of code. The others have come a long way - better than Claude was then - and I still feel like Claude is out in front, though by a less dramatic margin now.
As for "the right instructions" - I'd say it's more of "use the right process" which basically involves applying all those best practices that have developed over the past decades for human development, but we old farts from back before their time "don't need all that, it's a waste of time" because, basically, we internally practice most of the discipline without doing the documentation. With the AI tools: document your requirements, your architecture, tool choice selection process, designs, development plan, comment the code with traceability to why the code is being written, unit and integration tests, reviews, lessons learned, etc. etc. Having all that documentation kept with the project, well organized, is key to "bringing the AI agent up to speed" which you may be doing often. They really do demonstrate the eternal sunshine of the spotless mind, so if you have them take the time to write everything relevant down as they go (not just the code), then when a new one comes online it can jump into the middle of a development plan without repeating (as many) mistakes / making (as many) bad assumptions.
To be brutally honest, working with AI coding agents reminds me a LOT of working with overseas programmer consultants - if you don't get everything in writing you're gonna have a bad time.
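As a small, made-up illustration of what I mean by traceability comments plus tests - REQ-042, the tag scheme, and the function are all hypothetical, just to show the shape of it:

    import unittest

    def clamp_duty_cycle(value: float) -> float:
        """Clamp a PWM duty cycle to the safe operating range.

        Traceability: REQ-042 "duty cycle shall never exceed 80% to limit
        heater element wear" (hypothetical requirement, for illustration).
        """
        return max(0.0, min(value, 0.80))

    class TestClampDutyCycle(unittest.TestCase):
        # TEST-042 covers REQ-042 (hypothetical tag scheme)
        def test_upper_bound(self):
            self.assertEqual(clamp_duty_cycle(0.95), 0.80)

        def test_passthrough(self):
            self.assertAlmostEqual(clamp_duty_cycle(0.50), 0.50)

        def test_lower_bound(self):
            self.assertEqual(clamp_duty_cycle(-0.10), 0.0)

    if __name__ == "__main__":
        unittest.main()

Nothing fancy - the point is that a fresh agent reading the code can follow REQ-042 back to the requirements doc instead of guessing at intent.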
In the late 1980s there was a time when we seriously weighed the option of hand assembly vs. using compilers, and hand assembly didn't always lose. In the early 1990s I wanted to use C++, but the available compiler for IBM-compatible PCs was too buggy to be of value.
By the mid-1990s that had changed: good C compilers were exceeding all but the highest-effort human assembly code, and if you didn't like how it looked in assembly, you could much more easily "fix it" with a tweak to the C code instead of the assembly. I feel like we're sort of getting there with AI agent LLMs today - if you don't like what it provided, tell it why and let it try again. It's usually faster and easier, and gets a better product for the time invested, to use the tool instead of calling it a slop box and doing it yourself.
Yeah. I pay for Claude, my company pays even more for Cursor, so comparing them to free Gemini probably isn't fair.
Gemini is very useful for offhand queries while Claude is chewing on a bigger problem, but if it's something that needs complex analysis and/or extensive research... the tools that let you build up a folder full of files related to the task are vastly superior to chatbots. Gemini does have a Claude Code-style command line tool that does that kind of development in a folder; I didn't install it until last week. I gave it a coding problem to work on (look up realtime weather radar data from NOAA, present recent data on a map on a webpage)... it sort of succeeded, but with a poor user experience. Again, I'm in "free mode," which can do quite a bit on a day's allowance of tokens, but... I don't feel like their paid modes would be particularly higher quality. If they are, they're doing themselves a tremendous disservice by demoing such substandard performance in free mode.