Back when Sonnet was 200K and Opus was 1M, there were a lot of complex programming projects where I actually got better overall results out of Sonnet... but, go back to the 3.x days and Sonnet got stuck in debug loops fairly often where Opus would break out of the loop and find a working solution more often.
MangoCats
This sounds like a real application for Doctrow's "Chuffie" from Down and Out in the Magic Kingdom... scan the "how's my driving?" QR code on the lid of the bot and downvote... company's bots gets enough downvotes it gets restricted out of the places it's receiving the downvotes from.
These are the kind of things that could run on a lane built over the sidewalk...
expected to tip the fucking robot?
I think that's more commonly like a sheep.
The world really needs transparency like that - tip your cook, tip the farmer that grew the amazing tomatoes, or stiff 'em all but at least know who they really were. Before placing your order, get a readout of where the fish was caught, and when... identify how close the chocolate's cocoa was grown to toxic lead contamination sites - before you take it off the shelf.
After you tip it on its back...
I haven't tried lately, several months ago I tried asking the chatbots directly: What's the size of your context window. Gemini answered straight out: "32,767 tokens, and that's not as good for developing complex software as a larger context window like Claude Sonnet's 200,000 tokens."
General intelligence means human-level intelligence.
Applicable quote from a Yosemite park ranger: "There is considerable overlap between the intelligence of the smartest bears and the dumbest tourists."
We’re not changing the definitions. People thought that chess is so hard... That just turned out to be a false assumption
Sounds like changing definitions to me.
A system can be superhuman at playing chess but that ability doesn’t need to translate to any other field.
AlphaZero has been superhuman at playing chess, Go, and basically any game with perfect information and fixed rules since early 2018. That's translatable to other fields - it's not as strong in some fields as it is in playing Go, but it's still translatable ability...
In the 1980s I worked at a factory where the joke was "why are coffee breaks only 15 minutes long?" "Because when we give 'em 30 minutes for lunch we have to retrain them before they start work again after lunch."
Pretty amazing what PiHole reveals. My work laptop constantly chatters with sites all over the world, it's the biggest source of DNS lookup in my home network while it's on, and it's competing with six people browsing the web for their various purposes...
This is the power of specifications. With a specification, the LLM can create a test to ensure what it creates meets the specification. Then the agent can iterate on solution after solution until its proposed solution meets all your specifications. Then you get to discover how many holes you left open in your specifications.
Another thing AI agents based on LLMs aren't half bad at: reviewing specifications for completeness and readiness for implementation. They'll go ahead and fill in the blanks for you if you ask them to, but if you ask them instead to point out the holes then you can decide what should be done instead of "yeah, just make mine like all the other ones you have seen in your training."
That would be my future, if you make me king. Since the offers for absolute unflinching loyalty are a bit thin on the ground for the past couple of decades, I'll assume it's probably not the future we're getting.