this post was submitted on 06 Jun 2026
The business model should be that, with economies of scale, they can provide compute much cheaper than the average consumer can buy it to run locally. So yeah, that means they've got to be able to support these $20/mo plans indefinitely.
If they jack up the prices, I can just buy a 128GB Ryzen AI machine for the price of a year of $200/mo Claude. I suppose there's some room there: they could charge $50/mo and it would still make sense.
But even at $100/mo, I can buy a machine to run it at home on a 24-month payment plan and come out ahead.
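To make the break-even math explicit, here's a rough sketch. It assumes the machine costs about what a year of the $200/mo plan costs ($2,400), and it ignores electricity, resale value, and any financing interest. The function name is just made up for illustration.

```python
def months_to_break_even(hardware_cost: float, subscription_per_month: float) -> float:
    """Months of subscription fees that add up to the hardware price."""
    if subscription_per_month <= 0:
        raise ValueError("subscription_per_month must be positive")
    return hardware_cost / subscription_per_month


# Assumption: the local machine costs the same as 12 months at $200/mo.
HARDWARE_COST = 200 * 12

for price in (20, 50, 100, 200):
    months = months_to_break_even(HARDWARE_COST, price)
    print(f"${price}/mo -> hardware pays for itself after {months:.0f} months")
```

At $100/mo the hardware pays for itself in 24 months, which matches the payment-plan figure above. At $20/mo it takes 10 years, so the cheap plans only lose to local hardware if the price goes up.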
That business model assumes that the huge cloud models will always maintain a gap worth paying for, compared to the local models. I'm just not convinced that the average consumer will need cloud models for summarizing their emails or the news of the day.
And as for the actual costs of their data centers, there literally aren't enough humans in the world for $20/month of AI spending per person to get them to break even. They'll need to sell big accounts (many businesses spending billions per year) to break even.
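If anyone wants to sanity-check that against a cost figure, here's a small sketch. The population number is a rough assumption (about 8 billion), and the thread doesn't give an actual data-center cost, so that part is left as a parameter to plug in yourself.

```python
import math

# Assumption: rough world population, not a precise figure.
WORLD_POPULATION = 8_000_000_000


def max_annual_revenue(price_per_month: float, subscribers: int = WORLD_POPULATION) -> float:
    """Yearly revenue if every subscriber pays price_per_month for 12 months."""
    return price_per_month * 12 * subscribers


def subscribers_needed(annual_cost: float, price_per_month: float) -> int:
    """Smallest number of subscribers whose yearly fees cover annual_cost."""
    return math.ceil(annual_cost / (price_per_month * 12))


ceiling = max_annual_revenue(20)
print(f"Ceiling at $20/mo with everyone on Earth paying: ${ceiling:,.0f} per year")
```

Calling `subscribers_needed(your_cost_estimate, 20)` and comparing the result to `WORLD_POPULATION` shows whether a given cost estimate is coverable by consumer plans alone.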
Local is potentially even cheaper than that. This guy talks about how to get 17 t/s on the Qwen 3.6 35B MoE model with a GTX 1060 that has 6GB of VRAM: https://m.youtube.com/watch?v=8F_5pdcD3HY. He’s using a fork of llama.cpp with TurboQuant, and his newest video, made after this one, uses an even more optimized 28B version of the model. I have cmake building the llama.cpp fork in a Dockerfile at the moment, and we’ll see how it performs on my $800 laptop with an RTX 4060.
I’m also impressed by how good OpenCode is compared to Claude Code. Qwen 3.6 is not quite as good as Claude, and the MoE version that doesn’t need 24GB+ of VRAM isn’t quite as good as the dense version, but it also doesn’t cost $200 a month with usage limitations and a company training their models on your data. If it’s anywhere near “good enough”, I can see this being a daily driver.
I'm not an expert, but my understanding is that most of the computation is in the training. The actual queries are not too difficult to manage. So I think that's what makes it harder to monetize: you're trying to position yourself as a digital gatekeeper for work that has already been done. Yes, some industries have survived in this position, but it limits the amount of profit you can make because there are always ways to copy someone else's homework.