I might try this out next week. Tired of burning my monthly token allowance in Cursor in a couple weeks. :D
Technology
A tech news sub for communists
Yep, it works on my machine.
I'll compare it with the 3B qwen3.6 next week
Deepseek v4 pro! It's pay-as-you-go (top up your credit), and they're having a sale until May 31st, but even without the sale 1M output tokens is "only" $3.48. Flash is only $0.28 per 1M output.
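For a back-of-the-envelope sense of what those rates mean in practice, here's a tiny cost calculator. The prices are the per-1M-output-token figures quoted above (from this thread, not an official price sheet), and the 2M-token usage figure is just an illustrative assumption:

```python
# Rough output-token cost calculator using the rates quoted in the thread.
# Prices are USD per 1M output tokens; treat them as hearsay, not official.
PRICE_PER_1M = {"deepseek": 3.48, "flash": 0.28}

def output_cost(tokens: int, model: str) -> float:
    """Cost in USD for `tokens` output tokens at the quoted rate."""
    return tokens / 1_000_000 * PRICE_PER_1M[model]

# Example: a heavy day of ~2M output tokens.
print(f"deepseek: ${output_cost(2_000_000, 'deepseek'):.2f}")
print(f"flash:    ${output_cost(2_000_000, 'flash'):.2f}")
```

So even fairly heavy use stays in single-digit dollars per day at those rates, which is the point being made about it beating a flat subscription allowance.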
Not sure if I could swing Deepseek at my job tho. Surprisingly, Cursor still comes with Kimi2 as model option, so there's that.
If you have the memory, I can highly recommend Qwen3.6-35B-A3B-Q8. It's hands down the best local model I've tried. It's a MoE that only activates 3B params per token, so it should run fine on 16GB, or you can drop to a lower quant too.
I think I tried qwen3.6 but the 8B version, and that tanked my 16GB. But I'll give the smaller one a shot!
I saw a comparison of the 8B model vs the dense 30B (IIRC), and the results were almost the same... the 30B was slightly better on most tests, but only barely.
It's honestly incredible to see because 8b is getting to the point where it will run well on a lot of consumer hardware. If we can get current frontier performance at that size, then you really would be able to solve most tasks locally.
The 4-bit quantized GGUF for Granite 4.1 is sub-5GB, so it should run on just about any modern machine, even one without much VRAM... 6 gigs is what I had on my old 1080 GPU.
https://huggingface.co/unsloth/granite-4.1-8b-GGUF/tree/main
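That sub-5GB figure lines up with simple arithmetic: a quantized model's weights take roughly (param count × bits per weight / 8) bytes, plus some overhead for embeddings and metadata. A rough sketch, where the ~8B/~35B param counts and the bits-per-weight values (e.g. ~4.5 bpw for a typical 4-bit K-quant) are ballpark assumptions, not exact GGUF numbers:

```python
# Rough quantized-model file size: params * bits_per_weight / 8 bytes.
# Real GGUF files differ somewhat (mixed-precision layers, metadata),
# so treat these as ballpark estimates, not exact figures.

def est_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Estimated weight size in decimal GB for a quantized model."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

print(f"8B  @ ~4.5 bpw: ~{est_size_gb(8, 4.5):.1f} GB")   # 4-bit K-quant ballpark
print(f"8B  @ ~8.5 bpw: ~{est_size_gb(8, 8.5):.1f} GB")   # Q8-ish ballpark
print(f"35B @ ~4.5 bpw: ~{est_size_gb(35, 4.5):.1f} GB")  # why the 35B needs real memory
```

The same arithmetic also explains the earlier 16GB discussion: an 8B model at 8-bit is borderline on 16GB once you add KV cache and OS overhead, while a 4-bit quant leaves plenty of headroom.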