I've run it locally with Hermes but it is still not great and does really dumb things pretty often. I will say their 3.6 online chatbot is freaking amazing though and I use that over ChatGPT and Claude most of the time. Especially since it costs nothing.
I think the trick that's really underexplored right now is leveraging the harness around the model. I'm playing with an idea where I give a model a task and have it decide whether it can implement it in 30 lines or so; if not, it's asked to split the task into subtasks, so you end up building a tree of tasks that need to be completed. Once you get to the leaves, you start implementing using TDD. If the model tries a few times and fails, you ask it to split again, so eventually the task gets small enough for it to do reliably. Once a leaf is built against a test, it's a small, self-contained piece of code that does one thing. Then you start bubbling up the tree, where the node that did a split now has to glue a few functions together, again working against a test.
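The split-or-implement loop can be sketched roughly like this. This is just an illustration of the control flow, not the actual tool; the `ask_*` callbacks stand in for real LLM calls and are assumptions here:

```python
MAX_ATTEMPTS = 3  # how many implementation tries before forcing a split

def solve(task, ask_can_implement, ask_implement, ask_split, run_tests):
    """Recursively decompose `task` until every leaf passes its tests.

    The callbacks are hypothetical stand-ins for model calls:
      ask_can_implement(task) -> bool   ("can you do this in ~30 lines?")
      ask_implement(task)     -> code
      ask_split(task)         -> list of subtasks
      run_tests(task, code)   -> bool
    """
    if ask_can_implement(task):
        for _ in range(MAX_ATTEMPTS):
            code = ask_implement(task)
            if run_tests(task, code):
                return [code]  # leaf solved: a small, tested unit
        # repeated failure: fall through and split instead
    pieces = []
    for sub in ask_split(task):
        pieces.extend(solve(sub, ask_can_implement, ask_implement,
                            ask_split, run_tests))
    # a real harness would now ask the model to glue these pieces
    # together at the parent node, again against a test
    return pieces
```

With stubbed callbacks you can watch a too-big task get split once and then implemented as two leaves.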
My idea here is that we can aggressively control the context the model has to think about. Any complex program can be broken down into smaller pieces, with a function being the smallest unit of composition. So if a model like Qwen can write around 100 lines of code reliably for most tasks, that's good enough to solve problems in general through decomposition.
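As a toy illustration of that decomposition (my example, not the tool's output): a task like "normalize a list of scores" splits into two small leaf functions plus a glue node, and each leaf is testable in isolation:

```python
def shift_to_zero(xs):
    """Leaf: subtract the minimum so values start at 0."""
    lo = min(xs)
    return [x - lo for x in xs]

def scale_to_unit(xs):
    """Leaf: divide by the maximum so values end at 1."""
    hi = max(xs)
    return [x / hi for x in xs] if hi else xs

def normalize(xs):
    """Glue node: just composes the tested leaves."""
    return scale_to_unit(shift_to_zero(xs))
```

Each leaf fits comfortably inside a small context window, and the glue node only needs to know the leaf signatures, not their bodies.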
Currently, people are trying to get LLMs to write code the way people do, and that's a mistake in my opinion. These things have different strengths from a human developer, and we can build tooling around them that plays to those strengths.
that sounds pretty rad.. how do you do it? is it just prompts all the way down?
It's a combination of prompts and rails for the model to follow. The tool keeps a code graph in memory, which it parses using tree-sitter, and then I use Prolog to reason about the graph: finding related nodes, seeing whether a node is connected to the rest of the graph, and so on. So the model ends up being called as the tool walks the graph mechanically, and when the implementor is invoked it basically works as it normally would. The tool offers it an MCP-like interface where the model can call functions to look at code, run tests, etc., but it never gets to run any system commands itself; it just works against the API the tool exposes to it.
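The kind of graph query the Prolog layer answers (is this node connected to the rest of the graph?) is easy to sketch in plain Python over a toy call graph. The graph contents and the `reachable`/`disconnected` names are my invention, just to show the shape of the check:

```python
from collections import deque

# toy code graph: function name -> functions it calls
CALLS = {
    "main": ["parse", "render"],
    "parse": ["tokenize"],
    "tokenize": [],
    "render": [],
    "orphan": [],  # defined but never reached from main
}

def reachable(graph, root):
    """Every node connected to `root` by following call edges (BFS)."""
    seen, queue = {root}, deque([root])
    while queue:
        node = queue.popleft()
        for callee in graph.get(node, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen

def disconnected(graph, root="main"):
    """Nodes a graph walker would flag as unconnected to the rest."""
    return set(graph) - reachable(graph, root)
```

In the real setup this sort of query would presumably be answered by Prolog rules over the tree-sitter-derived graph rather than hand-written traversals, which is a nice fit since relations like "related to" and "reachable from" are one-liners in Prolog.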