[–] yogthos@lemmygrad.ml 5 points 1 day ago (1 children)

For business customers, per-token costs might not be a deal breaker, but for anything consumer facing it's a really tough sell in my opinion. I do expect the cost of running models to come down significantly in the near future though. There's a whole body of recent research identifying key optimizations that can be made. Here are some of the ones I've found particularly interesting:

Once these ideas start getting integrated, I expect that we'll see much more capable models that can run on fairly cheap hardware. Even local models will likely be quite capable for a lot of tasks. And at that point running a model as a service and charging per token is going to be a dead end.

[–] darkmode@hexbear.net 2 points 1 day ago (1 children)

This is an incredible list of research, TYSM! In my spare work time I've been building a small tool that tries to accomplish what #2 describes. I haven't clicked the link and read it yet, but now I'm going to read everything.

[–] yogthos@lemmygrad.ml 3 points 1 day ago (1 children)

I played around with implementing the recursive language model paper, and it actually turned out pretty well: https://git.sr.ht/~yogthos/matryoshka

Basically, I spin up a JS REPL in a sandbox, and the agent can feed files into it and then run commands against them. Normally the agent has to ingest a whole file into its context, but now it can just shove files into the REPL and operate on them like a database. It can also create variables: if it searches for something in a file, it can bind the result to a variable and keep track of it, and if it needs to filter the search later, it can just reference the variable it already made. This saves a huge amount of token use, and it also helps the model stay focused. A rough sketch of the pattern below.
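To make that concrete, here's a minimal sketch of the idea (hypothetical function names and file path, not matryoshka's actual API): files get loaded into named bindings inside the REPL, and only short summaries flow back into the agent's context.

```typescript
// Sketch of the REPL-as-context pattern. The agent never sees file
// contents directly; it issues commands and only small results come back.
import { readFileSync } from "node:fs";

// Sandbox state: named bindings the agent can create and reference later.
const bindings = new Map<string, string[]>();

// Load a file into a binding instead of into the model's context window.
function load(name: string, path: string): string {
  const lines = readFileSync(path, "utf8").split("\n");
  bindings.set(name, lines);
  return `${name}: ${lines.length} lines loaded`;
}

// Search within a binding; store the matches under a new binding so a
// later step can narrow them without re-reading the file.
function search(src: string, dest: string, pattern: RegExp): string {
  const hits = (bindings.get(src) ?? []).filter((l) => pattern.test(l));
  bindings.set(dest, hits);
  return `${dest}: ${hits.length} matches`;
}

// Example tool-call sequence the agent might issue ("src/app.ts" is a
// made-up path for illustration):
console.log(load("app", "src/app.ts"));                 // file stays in the REPL
console.log(search("app", "handlers", /function handle/));
console.log(search("handlers", "errors", /Error/));      // refines a prior result
console.log(bindings.get("errors"));                     // only now pull the text
```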

[–] darkmode@hexbear.net 1 points 18 hours ago (1 children)

About how large are the codebases you've used this RLM with?

[–] yogthos@lemmygrad.ml 3 points 17 hours ago

Around 10k lines or so. I use it as an MCP tool that the agent calls when it decides it needs to. The whole codebase doesn't get loaded into the REPL, just individual files as the agent searches through them.
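For anyone curious what the MCP side could look like, here's a rough sketch using the official @modelcontextprotocol/sdk for TypeScript. The tool name, parameters, and eval strategy are all illustrative assumptions, not matryoshka's real interface.

```typescript
// Rough sketch: exposing a sandboxed REPL as an MCP tool. Names like
// repl_eval are made up for illustration.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "repl-sandbox", version: "0.1.0" });

// Persistent REPL state shared across tool calls, so a later call can
// reference variables created by an earlier one.
const bindings = new Map<string, unknown>();

// The agent invokes this tool only when it decides it needs it.
server.tool(
  "repl_eval",
  { code: z.string().describe("JS expression to run against the bindings") },
  async ({ code }) => {
    // A real implementation would evaluate `code` in an isolated sandbox
    // (separate process, VM, etc.); this inline eval is just for the sketch.
    const result = new Function("bindings", `return (${code});`)(bindings);
    return { content: [{ type: "text", text: String(result) }] };
  }
);

await server.connect(new StdioServerTransport());
```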