this post was submitted on 03 Jun 2026
693 points (99.9% liked)

People Twitter

10031 readers
1410 users here now

People tweeting stuff. We allow tweets from anyone.

RULES:

  1. Mark NSFW content.
  2. No doxxing people.
  3. Must be a pic of the tweet or similar. No direct links to the tweet.
  4. No bullying or international politics
  5. Be excellent to each other.
  6. Provide an archived link to the tweet (or similar) being shown if it's a major figure or a politician. Archive.is is the best way.

founded 2 years ago
Managers (media.piefed.zip)
submitted 1 day ago* (last edited 1 day ago) by inari@piefed.zip to c/whitepeopletwitter@sh.itjust.works
 
[–] Lysergid@lemmy.ml 30 points 1 day ago (4 children)

Honestly, IDK why companies, especially medium-to-large ones, don't do this. They could plug in RAG with internal/confidential data and get better results and security. I guess the question is what the capital plus maintenance cost of running such infra would be for, say, 10k+ employees.

[–] bountygiver@lemmy.ml 2 points 16 hours ago

Because the people selling the AI want to make sure their customers don't know about this. It's all about creating a dependency so they get subscription income forever.

[–] Zos_Kia@jlai.lu 20 points 1 day ago (1 children)

I think the issue is also that you need some serious hardware to get good inference speed while your devs are working, but then most of the time this hardware will be underutilized.

That being said, you can get good performance from indie inference farms at a fraction of the cost of the big US labs. I think it's a great compromise, and in a few months the open models will be near parity with opus 4.6 which is really all you need for most tasks.

[–] plyth@feddit.org 3 points 1 day ago (1 children)

opus 4.6 which is really all you need for most tasks.

The same tasks that can fit into 640KB.

[–] MalReynolds@slrpnk.net 9 points 1 day ago

The big ones definitely do, and anyone with confidential data should too.

[–] sobchak@programming.dev -1 points 1 day ago* (last edited 1 day ago) (1 children)

Probably more expensive than the subsidized costs. Hmm...

H100 GPUs cost $25k and have 80GB of memory. Kimi k2.6 has 1.1T parameters. Assuming 8-bit quantization, you would need 14 GPUs to run a single agent at a time (I'm not sure the cloud models use quantization, so it could be double). So, $350k per vibecoding dev on GPUs alone. Life expectancy is ~4 years, so ~$90k/year amortized. This ignores the significant electrical/HVAC cost of handling 10 kW of electricity and heat per vibecoding dev (and tons of other costs as well).
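
For anyone who wants to check the arithmetic, here is a rough Python sketch of the estimate above. The price, memory, parameter count, and lifetime are the figures from this comment; the 700 W per GPU is my own assumption (roughly the H100 SXM TDP), and the function name is made up for illustration. It only counts memory for the weights, so KV cache and activations would push the GPU count higher.

```python
import math

def gpus_needed(params: float, bytes_per_param: float, gpu_mem_bytes: float) -> int:
    """GPUs required just to hold the model weights (ignores KV cache and activations)."""
    return math.ceil(params * bytes_per_param / gpu_mem_bytes)

# Figures taken from the comment above.
PARAMS = 1.1e12          # 1.1T parameters
GPU_MEM_BYTES = 80e9     # 80GB per H100
GPU_PRICE_USD = 25_000
LIFETIME_YEARS = 4

# Assumption, not from the comment: roughly the H100 SXM TDP.
GPU_WATTS = 700

n_8bit = gpus_needed(PARAMS, 1, GPU_MEM_BYTES)    # 8-bit: 1 byte per parameter
n_16bit = gpus_needed(PARAMS, 2, GPU_MEM_BYTES)   # 16-bit: 2 bytes per parameter

capex_usd = n_8bit * GPU_PRICE_USD
amortized_usd_per_year = capex_usd / LIFETIME_YEARS
power_kw = n_8bit * GPU_WATTS / 1000

print(f"GPUs (8-bit):  {n_8bit}")
print(f"GPUs (16-bit): {n_16bit}")
print(f"Capex:         ${capex_usd:,}")
print(f"Amortized:     ${amortized_usd_per_year:,.0f}/year")
print(f"Power draw:    {power_kw:.1f} kW")
```

The amortized figure comes out to $87.5k/year, which is where the "~$90k" comes from.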

[–] theunknownmuncher@lemmy.world 4 points 23 hours ago

per vibecoding dev

No lol. These same hardware requirements would apply to the cloud-hosted models as well, so if that's how it worked, you're suggesting that Anthropic, OpenAI, Meta, and Google have purchased ~14 H100 GPUs per user that they serve???

That would be literally billions of GPUs, while it is estimated that in 2024, Google's AI division owned only 26,000 H100 GPUs and Meta owned the most H100 GPUs of any company at 350,000 units. These GPUs have very high throughput for inference and can serve many users, because that is exactly what they have been designed to do.
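
A quick sketch of why sharing matters, in Python. The $350k cluster cost and 4-year lifetime are the numbers from the parent comment; the count of devs sharing one deployment is purely hypothetical (real concurrency depends on batching, context lengths, and usage patterns), and the function name is made up for illustration.

```python
def cost_per_dev_per_year(cluster_cost_usd: float, lifetime_years: float, devs_sharing: int) -> float:
    """Amortized yearly GPU cost per dev when one deployment serves several devs."""
    if devs_sharing < 1:
        raise ValueError("devs_sharing must be at least 1")
    return cluster_cost_usd / lifetime_years / devs_sharing

CLUSTER_COST_USD = 350_000   # 14 H100s at $25k, from the parent comment
LIFETIME_YEARS = 4           # from the parent comment

# Hypothetical sharing levels, for illustration only.
for devs in (1, 10, 50):
    yearly = cost_per_dev_per_year(CLUSTER_COST_USD, LIFETIME_YEARS, devs)
    print(f"{devs:>3} devs sharing: ${yearly:,.0f} per dev per year")
```

With one dev per cluster you get the parent's ~$90k/year; the per-dev number shrinks in proportion to however many devs the same hardware can actually serve.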

I’m not sure the cloud models use quantization

they absolutely do, yeah