this post was submitted on 17 May 2026

came up with the best tool-call format. need to sleep [OC, brainmade] (lemmy.blahaj.zone)

submitted 2 days ago by Smorty@lemmy.blahaj.zone to c/qwen@lemmy.blahaj.zone

im so sleepy.

agents driven by language models (LMs) call functions to do stuff. Functions like these:

read_file(path, from, to)
write(path, content)
list_dir(path, show_hidden = false)
edit_file(path, old_string, new_string)

this is not a simplification btw.
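
for anyone who wants something concrete: here is a minimal sketch of what those four tools could look like in Python. the bodies are my assumption, not taken from any particular agent. `from` / `to` are renamed to `start` / `end` because `from` is a reserved word in Python, and the "old_string must match exactly once" rule in edit_file is just one plausible choice.

```python
from pathlib import Path


def read_file(path, start=1, end=None):
    """Return lines start..end (1-indexed, inclusive) of a text file."""
    lines = Path(path).read_text(encoding="utf-8").splitlines(keepends=True)
    return "".join(lines[start - 1:end])


def write(path, content):
    """Create or overwrite a file with the given text."""
    Path(path).write_text(content, encoding="utf-8")
    return f"wrote {len(content)} characters to {path}"


def list_dir(path, show_hidden=False):
    """Return sorted entry names, skipping dotfiles unless show_hidden is set."""
    names = sorted(p.name for p in Path(path).iterdir())
    return [n for n in names if show_hidden or not n.startswith(".")]


def edit_file(path, old_string, new_string):
    """Replace old_string with new_string; assumed rule: it must occur exactly once."""
    text = Path(path).read_text(encoding="utf-8")
    if text.count(old_string) != 1:
        raise ValueError("old_string must match exactly once")
    Path(path).write_text(text.replace(old_string, new_string), encoding="utf-8")
    return "ok"
```

every function returns something printable, so the result can be handed straight back to the model as text.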

so far, LMs were told to generate any of these formats to call a function:

json, which sucks cuz of c-escape
xml which sucks cuz of closing tags
yaml which sucks cuz of multiline strings being indented (sucks for LMs)
just bash which sucks cuz of security

wow these all have problems with something, hm?

worst of all: we wanna save tokens wherever possible. so if an LM has to generate a full </parameter> for each argument in a function, that adds up quick
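
to make the overhead visible, here is a rough sketch that wraps the same file content as a JSON call, a generic XML call, and a bracket-style call, then prints how many extra characters each wrapper adds. big caveats: characters are only a crude stand-in for tokens (real counts depend on the tokenizer), the XML layout is a made-up generic one, and the sample content is hypothetical.

```python
import json
from xml.sax.saxutils import escape

# hypothetical file content with quotes, newlines and XML-special characters
content = 'print("hi")\nif x < 3:\n    print("small & quick")\n'

# JSON has to escape every quote and newline inside the string
as_json = json.dumps(
    {"name": "write", "arguments": {"path": "file.txt", "content": content}}
)

# generic XML (assumed layout): needs closing tags and entity escapes
as_xml = f"<write><path>file.txt</path><content>{escape(content)}</content></write>"

# bracket style: the content is pasted in untouched
as_bracket = f'[write path="file.txt" content=\n{content}]'

for label, text in [("json", as_json), ("xml", as_xml), ("bracket", as_bracket)]:
    print(f"{label:8} {len(text) - len(content):3} extra characters")
```

the thing to look at is not just the count: the content survives byte-for-byte only in the bracket version, while the other two rewrite it.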

Introducing: my new format

wowie let's have a look at this format!

[write path="file.txt" content=

content of file here
hooray newlines
no c escape! cool, i can regex all I want [\s\S]*

]

now isnt that simple?

no string escape problems
no xml-closing tags
no json-brace-foolery
no... | symbols for multiline strings
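
since the format is regex-friendly, a parser fits in a few lines. this is a sketch under assumptions the post does not spell out: quoted values contain no `"`, only the last argument may be a raw multi-line body, the body ends at the final line that holds nothing but `]`, and one blank padding line on each side of the body counts as syntax. `parse_call` is a hypothetical name.

```python
import re

CALL_RE = re.compile(
    r'\[(?P<name>\w+)'                                # tool name
    r'(?P<inline>(?:\s+\w+="[^"]*")*)'                # zero or more key="value" pairs
    r'(?:\s+(?P<raw_key>\w+)=\n(?P<raw>[\s\S]*)\n)?'  # optional raw trailing argument
    r'\s*\]\s*\Z'                                     # closing bracket at the very end
)
PAIR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_call(text):
    """Parse one complete call; return (name, args) or None if it is not a call."""
    m = CALL_RE.match(text.strip())
    if m is None:
        return None
    args = dict(PAIR_RE.findall(m.group("inline")))
    if m.group("raw_key") is not None:
        raw = m.group("raw")
        # assumption: one blank padding line before and after the body is syntax
        if raw.startswith("\n"):
            raw = raw[1:]
        if raw.endswith("\n"):
            raw = raw[:-1]
        args[m.group("raw_key")] = raw
    return m.group("name"), args


example = r'''[write path="file.txt" content=

content of file here
no c escape! regex all you want [\s\S]*

]'''
name, args = parse_call(example)
print(name, args["path"])
print(args["content"])
```

all values come back as strings, so something like show_hidden="true" still needs converting by whoever executes the call. also note the open question this sketch dodges: a body that itself ends with a line containing only `]` cannot be told apart from the end of the call.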

now, of course, this is a new format. so language models suck at generating it, right?

WRONG

even a local 2-bit quant of qwen 3.6 35B-A3B aligned to it super easily.

and! even a dense Qwen3 4B model at Q4 quant worked with it flawlessly. I'm tired and need to sleep.

now congratulate me! say "hooray wow ur such a genius ohmygod we are gonna save so many tokens and thus möney".

go, go ahead. im not gonna ask an LM to do it, that much is clear.

or, even better: tell me what SUCKS about this, im always open for critical feedback.

id rather be wrong than believe im right all the time.

[–] anothercatgirl@lemmy.blahaj.zone 1 points 1 day ago (5 children)

I want to use this in my project! how do I apply it in practice?

[–] anothercatgirl@lemmy.blahaj.zone 0 points 1 day ago (4 children)

for context, I downloaded qwen3:4b with ollama and it runs fast on my gpu. I wanna make tools for my LLM to be able to play Dwarf Fortress for me.

[–] Smorty@lemmy.blahaj.zone 2 points 1 day ago (1 children)

ohgosh playing a whole game is a whole different level of complexity >o<

it can work, but only over very short time horizons, and it will likely get stuck right away, trying the same thing over and over. (im assuming you mean the text-based version)

see claude opus trying to play runescape here and skip to 6:55 where it starts. you will notice: its interesting to watch, but gets stuck quickly. and that model was SOTA at the time.

the model would also have difficulty telling what is where. sounds weird, but due to tokenization, it cant "infer" which characters are above which others. it will be guessing wrong a lot.

you would be babysitting the model.

right now, LMs and VLMs are being trained for computer use (where they click on a screen and use a computer "like a human would"), but exclusively for work tasks and not games.

try this first: run ollama run qwen3:4b-instruct-2507-q4_K_M --experimental. that launches ollama in an "agent mode", where it can run shell commands, which essentially gives it one tool: bash(command:str).

test it by saying "yo what files and dirs are in this directory?" and it will probably run ls and tell you.

if you really - REALLY want it to work with the game, the best option would be:

download the VL version of the model with ollama pull qwen3-vl:4b-instruct (or -thinking) so it can understand images
make a script which saves a screenshot of the game to a file
put the qwen into some pre-built agent, like opencode or goose (which has a GUI)
tell it "start dwarf fortress as a background process, use this script to take screenshots, look at them and use xdotool (or some other mouse / keyboard command) to navigate the game. Perform the loop of take screenshot -> look at it -> take action on the game"
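
if you would rather script that loop yourself than prompt a pre-built agent, one step of it could look roughly like this. assumptions, loudly: an X11 session, ImageMagick's `import` for the screenshot, and xdotool for input. `decide` is a hypothetical callback standing in for "show the screenshot to the VL model and get an action back", and the action dict shape is made up for this sketch.

```python
import subprocess


def screenshot_cmd(out_path):
    # ImageMagick's `import` grabs the whole X11 root window into out_path
    return ["import", "-window", "root", out_path]


def key_cmd(key):
    # xdotool key names look like "Return", "Up", "a"
    return ["xdotool", "key", key]


def click_cmd(x, y, button=1):
    # move the pointer, then click (xdotool lets you chain commands)
    return ["xdotool", "mousemove", str(x), str(y), "click", str(button)]


def step(decide, out_path="/tmp/game.png", run=subprocess.run):
    """One pass of: take screenshot -> look at it -> take action on the game."""
    run(screenshot_cmd(out_path), check=True)
    action = decide(out_path)  # hypothetical: ask the model what to do next
    if action["type"] == "key":
        cmd = key_cmd(action["key"])
    elif action["type"] == "click":
        cmd = click_cmd(action["x"], action["y"])
    else:
        raise ValueError(f"unknown action: {action!r}")
    run(cmd, check=True)
    return cmd
```

`run` is a parameter so the loop can be dry-run with a fake that only records commands, which is handy before letting a model press keys for real.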

theres no reason to use my format, its just a token-efficiency thing. with your fast GPU-throughput, this shouldnt be a problem.

ohgosh long response-

[–] anothercatgirl@lemmy.blahaj.zone 1 points 1 day ago (1 children)

I specifically wanna use your format though!

[–] Smorty@lemmy.blahaj.zone 1 points 18 hours ago (1 children)

then i have "bad" news.

i made this up, it's not a standard.

meaning: you would have to write the entire agentic scaffolding yourself.

which honestly, is a fun project.
ive done it, its nice being able to see exactly what's goin on.

im sure ur super duper smart and already know this, but: all that an agent is, is just:

parse the tokens as they come in
notice when the LM writes a tool call
parse it after its done
execute
return the output
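
those five steps, as a minimal sketch. everything here is an assumption for illustration: `generate` is a hypothetical function that yields text chunks from the model, calls are assumed to start at the beginning of a line, a raw body is assumed to end at a line holding only `]`, and the "tool output:" wrapper is invented.

```python
import re

CALL_RE = re.compile(
    r'^\[(?P<name>\w+)(?P<inline>(?:\s+\w+="[^"]*")*)'
    r'(?:\s+(?P<raw_key>\w+)=\n(?P<raw>[\s\S]*?)\n)?\]\s*\Z',
    re.M,
)
PAIR_RE = re.compile(r'(\w+)="([^"]*)"')


def find_call(buf):
    """Return (name, args) once buf ends with a finished call, else None."""
    m = CALL_RE.search(buf)
    if m is None:
        return None
    args = dict(PAIR_RE.findall(m.group("inline")))
    if m.group("raw_key"):
        # assumption: blank padding lines around the body are not content
        args[m.group("raw_key")] = m.group("raw").strip("\n")
    return m.group("name"), args


def run_agent(generate, tools, prompt, max_turns=5):
    transcript = prompt
    for _ in range(max_turns):
        buf, call = "", None
        for chunk in generate(transcript):  # 1. take tokens as they come in
            buf += chunk
            call = find_call(buf)           # 2 + 3. notice a finished call, parse it
            if call:
                break                       # stop generating once the call is complete
        transcript += buf
        if call is None:
            return transcript               # no tool call means a final answer
        name, args = call
        try:
            result = tools[name](**args)    # 4. execute
        except Exception as exc:
            result = f"error: {exc}"
        transcript += f"\ntool output:\n{result}\n"  # 5. return the output
    return transcript


def fake_model(transcript):
    # stand-in for a real LM: calls a tool once, then answers
    if "tool output:" not in transcript:
        yield from ["[shout text=", "\n", "hello [\\s\\S]", "\n", "]", " never reached"]
    else:
        yield "done"


out = run_agent(fake_model, {"shout": lambda text: text.upper()}, "user: say hello\n")
print(out)
```

swapping `fake_model` for a function that streams from a local model is the only part that touches a real backend. errors from a tool are fed back as text so the model gets a chance to retry.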

so umm... if thats what u feel like doing, good luck! 💖

[–] anothercatgirl@lemmy.blahaj.zone 1 points 16 hours ago

that is neat, I'm gonna stop playing modded minecraft and work on this instead. it sounds like a lot of fun!
