this post was submitted on 06 Jun 2026
Local is potentially even cheaper than that. This guy shows how to get 17 t/s from the Qwen 3.6 35B MoE model on a GTX 1060 with 6GB of VRAM: https://m.youtube.com/watch?v=8F_5pdcD3HY. He’s using a fork of llama.cpp with TurboQuant, and his newest video, made after this one, uses an even more optimized 28B version of the model. I have cmake building the llama.cpp fork in a Dockerfile at the moment, and we’ll see how it performs on my $800 laptop with an RTX 4060.
I’m also impressed by how good OpenCode is compared to Claude Code. Qwen 3.6 is not quite as good as Claude, and the MoE version that doesn’t need 24GB+ of VRAM isn’t quite as good as the dense version, but it also doesn’t cost $200 a month with usage limits and a company training its models on your data. If it’s anywhere near “good enough”, I can see this being a daily driver.
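For a rough sense of the cost side, here is a back-of-the-envelope sketch using the two figures above ($800 laptop, $200 a month). It assumes the laptop is bought only for this and ignores electricity, so treat it as a ballpark and not a real accounting; the function name is made up for illustration.

```python
import math


def breakeven_months(hardware_cost: float, monthly_subscription: float) -> int:
    """Whole months until cumulative subscription fees reach the one-time hardware cost.

    Assumption: electricity and any other running costs are ignored.
    """
    if monthly_subscription <= 0:
        raise ValueError("monthly_subscription must be positive")
    return math.ceil(hardware_cost / monthly_subscription)


# Figures from the comment: $800 laptop vs. a $200/month plan.
print(breakeven_months(800, 200))  # 4
```

So under those assumptions the hardware pays for itself in about four months, provided the local model really is “good enough” for the same work.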