im so sleepy.
agents driven by language models (LMs) call functions to do stuff. Functions like these:
- read_file(path, from, to)
- write(path, content)
- list_fir(path, show_hidden = false)
- edit_file(path, old_string, new_string)
this is not a simplification btw.
so far, LMs were told to generate any of these formats to call a function:
- json, which sucks cuz of c-escape
- xml which sucks cuz of closing tags
- yaml which sucks cuz of multilinear strings being indented (sucks for LMs)
- just bash which sucks cuz of security
wow these all have problems with something, hm?
worst of all: we wanna save tokens wherever possible. so if an LM has to generate a full </parameter> for each argument in a function, that adds up quick
Introducing: my new format
wowie let's have a look at this format!
[write
path="file.txt"
content=
content of file here
horray newlines
no c escape! cool, i can regex all I want [\s\S]*
]
now isnt that simple?
- no string escape problems
- no xml-closing tags
- no json-brace-foolery
- no... | symbols for multilinear strings
now, of course, this is a new format. so language models suck at generating it, right?
WRONG
even a local 2-bit quant of qwen 3.6 35B-A3B aligned to it super easily.
and! even a dense Qwen3 4B model at Q4 quant worked with it flawlessly. I'm tired and need to sleep.
now congratulate me! say "horray wow ur such a genius ohmygod we are gonna save so many tokens and thus mΓΆney".
go, go head. im not gonna ask an LM to do it, that much is clear.
or, even better: tell me what SUCKS about this, im always open for critical feedback.
id rather be wrong than believe im right all the time.
for context, I downloaded qwen3:4b with ollama and it runs fast on my gpu. I wanna make tools for my LLM to be able to play Dwarf Fortress for me.
ohgosh playing a whole game is a whole different level of complexity >o<
it can work, but only under very short time horinzons and it will likely get stuck immediately trying to do the same thing. (im assuming you mean the text-based version)
see claude opus trying to play runescape here and skip to 6:55 where it starts. you will notice: its interesting to watch, but gets stuck quickly. and that model was SOTA at the time.
the model would also have difficulties telling what us where. sounds weird, but due to tokenization, it cant "infer" which characters are above which others. it will be guessing wrong a lot.
you would be babysitting the model.
right now, LMs and VLMs are being trained for computer use (where they click on a screen and use a computer "like a human would"), but exclusively for work tasks and not games.
try this first: run
ollama run qwen3:4b-instruct-2507-q4_K_M --experimental. that launches ollama in an "agent mode", where it can run shell commands, which essentially gives it one tool:bash(command:str).test it saying "yo what files and dies are in this directory?" and it will run
lsprobably and tell you.if you really - REALLY want it to work with the game, the best option would be:
ollama pull qwen3-vl:4b-instruct(or-thinking) so it can understand imagestake screenshot -> look at it -> take action on the game" (or some other mouse / keyboard usage command)theres no reason to use my format, its just a tolen-efficiency thing. with your fast GPU-throughput, this shouldnt be a problem.
ohgosh long response-
I specifically wanna use your format though!
then i have "bad" news.
i made thiup, it's not a standard.
meaning: you would have to write the entire agentic scaffolding yourself.
which honestly, is a fun project.
ive done it, its nice being able to see exactly what's goin on.
im sure ur super duper smart and already know this, but: all that an agent is, is just:
so umm... if thats what u feel like doing, good luck! π
that is a need, I'm gonna stop playing modded minecraft and work on this instead. it sounds like a lot of fun!