[–] yogthos@lemmygrad.ml 5 points 1 day ago (1 children)

For business customers, per-token costs might not be a deal breaker, but for anything consumer facing it's a really tough sell in my opinion. I do expect the cost of running models to come down significantly in the near future though. There's a whole body of recent research identifying key optimizations that can be made. Here are some of the ones I've found particularly interesting:

Once these ideas start getting integrated, I expect that we'll see much more capable models that can run on fairly cheap hardware. Even local models will likely be quite capable for a lot of tasks. And at that point running a model as a service and charging per token is going to be a dead end.

[–] darkmode@hexbear.net 2 points 1 day ago (1 children)

This is an incredible list of research, TYSM! In my spare work time I've been building a small tool that tries to accomplish what #2 describes. I haven't clicked the link and read it yet, but now I'm going to read everything.

[–] yogthos@lemmygrad.ml 3 points 1 day ago (1 children)

I played around with implementing the recursive language model paper, and it actually turned out pretty well: https://git.sr.ht/~yogthos/matryoshka

Basically, I spin up a JS REPL in a sandbox, and the agent can feed files into it and then run commands against them. Normally the agent has to ingest a whole file into its context, but now it can just shove files into the REPL and operate on them like a database. It can also create variables: if it searches for something in a file, it can bind the result to a variable and keep track of it, and if it needs to filter the search later, it can just reference the variable it already made. This saves a huge amount of token use, and it also helps the model stay focused. A rough sketch of the pattern below.
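To make that concrete, here's a minimal sketch of the idea (hypothetical function names and file path, not matryoshka's actual API): files get loaded into named bindings inside the REPL, and only short summaries flow back into the agent's context.

```typescript
// Sketch of the REPL-as-context pattern. The agent never sees file
// contents directly; it issues commands and only small results come back.
import { readFileSync } from "node:fs";

// Sandbox state: named bindings the agent can create and reference later.
const bindings = new Map<string, string[]>();

// Load a file into a binding instead of into the model's context window.
function load(name: string, path: string): string {
  const lines = readFileSync(path, "utf8").split("\n");
  bindings.set(name, lines);
  return `${name}: ${lines.length} lines loaded`;
}

// Search within a binding; store the matches under a new binding so a
// later step can narrow them without re-reading the file.
function search(src: string, dest: string, pattern: RegExp): string {
  const hits = (bindings.get(src) ?? []).filter((l) => pattern.test(l));
  bindings.set(dest, hits);
  return `${dest}: ${hits.length} matches`;
}

// Example tool-call sequence the agent might issue ("src/app.ts" is a
// made-up path for illustration):
console.log(load("app", "src/app.ts"));                 // file stays in the REPL
console.log(search("app", "handlers", /function handle/));
console.log(search("handlers", "errors", /Error/));      // refines a prior result
console.log(bindings.get("errors"));                     // only now pull the text
```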

[–] darkmode@hexbear.net 1 points 18 hours ago (1 children)

About how large are the codebases you've used this RLM with?

[–] yogthos@lemmygrad.ml 3 points 17 hours ago

Around 10k lines or so. I use it as an MCP tool that the agent calls when it decides it needs to. The whole codebase doesn't get loaded into the REPL, just individual files as the agent searches through them.
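For anyone curious what the MCP side could look like, here's a rough sketch using the official @modelcontextprotocol/sdk for TypeScript. The tool name, parameters, and eval strategy are all illustrative assumptions, not matryoshka's real interface.

```typescript
// Rough sketch: exposing a sandboxed REPL as an MCP tool. Names like
// repl_eval are made up for illustration.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "repl-sandbox", version: "0.1.0" });

// Persistent REPL state shared across tool calls, so a later call can
// reference variables created by an earlier one.
const bindings = new Map<string, unknown>();

// The agent invokes this tool only when it decides it needs it.
server.tool(
  "repl_eval",
  { code: z.string().describe("JS expression to run against the bindings") },
  async ({ code }) => {
    // A real implementation would evaluate `code` in an isolated sandbox
    // (separate process, VM, etc.); this inline eval is just for the sketch.
    const result = new Function("bindings", `return (${code});`)(bindings);
    return { content: [{ type: "text", text: String(result) }] };
  }
);

await server.connect(new StdioServerTransport());
```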