im so sleepy.
agents driven by language models (LMs) call functions to do stuff. Functions like these:
- read_file(path, from, to)
- write(path, content)
- list_dir(path, show_hidden = false)
- edit_file(path, old_string, new_string)
this is not a simplification btw.
so far, LMs were told to generate any of these formats to call a function:
- json, which sucks cuz of c-escape
- xml which sucks cuz of closing tags
- yaml which sucks cuz of multiline strings being indented (sucks for LMs)
- just bash which sucks cuz of security
wow these all have problems with something, hm?
worst of all: we wanna save tokens wherever possible. so if an LM has to generate a full </parameter> for each argument in a function, that adds up quick
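to make the c-escape complaint concrete, here is a minimal sketch of what a JSON tool call does to a multiline string. the file content is made up for illustration, and character counts are only a rough proxy for tokens (the real overhead depends on the tokenizer):

```python
import json

# Hypothetical file content: two lines with quotes and one backslash.
content = 'print("hello")\npath = "C:\\temp"\n'

# The JSON tool call a model would have to emit for write(path, content).
call = json.dumps(
    {"name": "write", "arguments": {"path": "file.txt", "content": content}}
)

raw_len = len(content)
# Length of the escaped string body, without the surrounding quotes.
escaped_len = len(json.dumps(content)) - 2

print(call)
print("raw chars:", raw_len, "escaped chars:", escaped_len)
```

every quote, newline and backslash in the content costs one extra character here, and the model has to get each escape right for the call to parse.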
Introducing: my new format
wowie let's have a look at this format!
[write
path="file.txt"
content=
content of file here
horray newlines
no c escape! cool, i can regex all I want [\s\S]*
]
now isnt that simple?
- no string escape problems
- no xml-closing tags
- no json-brace-foolery
- no... | symbols for multiline strings
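for anyone who wants to try parsing this: a rough sketch, not a reference implementation. the grammar is guessed from the single example above, so these are assumptions: a call starts with `[name` on its own line, `key="value"` is a single-line argument, a bare `key=` starts a raw multiline argument that must be the last one, and the call ends at the first line that is just `]`. `parse_calls` and `CALL_RE` are names made up for this sketch.

```python
import re

# Assumed grammar (inferred from the example in the post, not a spec):
#   [name            <- call start
#   key="value"      <- single-line argument, no escaping
#   key=             <- raw multiline argument, must come last
#   ]                <- a line that is only "]" ends the call
CALL_RE = re.compile(r"^\[(\w+)\n([\s\S]*?)^\]$", re.MULTILINE)


def parse_calls(text):
    """Return a list of (function_name, arguments) tuples found in text."""
    calls = []
    for match in CALL_RE.finditer(text):
        name, body = match.group(1), match.group(2)
        lines = body.split("\n")
        if lines and lines[-1] == "":
            lines.pop()  # drop the empty piece after the final newline
        args = {}
        for i, line in enumerate(lines):
            key, sep, value = line.partition("=")
            if not sep:
                raise ValueError(f"expected key=value, got: {line!r}")
            if value == "":
                # Raw block: everything up to the closing "]" is taken verbatim.
                args[key] = "\n".join(lines[i + 1:])
                break
            if len(value) >= 2 and value[0] == value[-1] == '"':
                value = value[1:-1]  # strip the surrounding quotes
            args[key] = value
        calls.append((name, args))
    return calls


EXAMPLE = "\n".join([
    "[write",
    'path="file.txt"',
    "content=",
    "content of file here",
    "horray newlines",
    r"no c escape! cool, i can regex all I want [\s\S]*",
    "]",
])

print(parse_calls(EXAMPLE))
```

one limitation of this sketch: file content that itself contains a line consisting only of `]` would end the call early, so a real parser needs some rule for that case.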
now, of course, this is a new format. so language models suck at generating it, right?
WRONG
even a local 2-bit quant of qwen 3.6 35B-A3B aligned to it super easily.
and! even a dense Qwen3 4B model at Q4 quant worked with it flawlessly. I'm tired and need to sleep.
now congratulate me! say "horray wow ur such a genius ohmygod we are gonna save so many tokens and thus möney".
go, go ahead. im not gonna ask an LM to do it, that much is clear.
or, even better: tell me what SUCKS about this, im always open for critical feedback.
id rather be wrong than believe im right all the time.
Not a fan of LLMs but have you considered INI format? This is pretty similar. https://en.wikipedia.org/wiki/INI_file
thanks for sharing, but the goal here was not to make yet another key-value format, but to have natural feeling multiline strings with minimal escaping.
thats why i settled for the code blocks: they tokenize well, are universally understood as "this is some text", are easy to write and rarely ever have to deal with escape sequences.
in this case, the [ and ] also serve as tool-call delimiters, which would usually be some heavy XML ones like <functions></functions>... thats 6 - 10 tokens down the drain for delimiters! >o< aaaaa
thank u for engaging with the post btw. i really appreciate it <3
if you are looking to embed code into free-flowing text output, you can go with a format such as this:
you write the commands with indentations (>) and basically ignore every other line. some languages do it that way. i think PHP uses opening tags to denote where code starts/ends, the rest is just ignored by the PHP interpreter.
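a tiny sketch of that idea, under loud assumptions: lines whose first non-space character is `>` are treated as commands and every other line is ignored. the command syntax in the sample text and the name `extract_commands` are hypothetical, since the comment does not pin down the exact format.

```python
def extract_commands(text, marker=">"):
    """Collect lines that start with the marker; ignore all other lines."""
    commands = []
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(marker):
            # Drop the marker and surrounding whitespace, keep the rest as-is.
            commands.append(stripped[len(marker):].strip())
    return commands


# Hypothetical model output mixing free text and commands.
sample = "\n".join([
    "let me look at the project first.",
    "> list_dir src",
    "looks fine, now the main file:",
    "> read_file src/main.py",
])

print(extract_commands(sample))
```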
I... think u didnt read my comment.
for me, the point was to have natural feeling multiline strings, but yesyes, if those are not a concern, ur format very much rules ~