this post was submitted on 23 Feb 2026
112 points (100.0% liked)

Thanks capitalism for doing the stupidest implementation of this technology possible

[–] MeetMeAtTheMovies@hexbear.net 8 points 5 hours ago (4 children)

Do you need open weight models for this? Any recommendations for which one(s)? Do you need to download a huggingface client or something? I’m familiar with AI stuff, but not running locally.

[–] LaughingLion@hexbear.net 1 points 53 minutes ago

If you are running at home as a hobby, just use KoboldCpp, and maybe SillyTavern on top if you want extra functionality. In the former you can override where specific tensors live (e.g. keep the FFN down, and potentially up, projection tensors in system RAM) to save VRAM if needed. For models, it depends on what you need.
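As a sketch, a KoboldCpp launch with a tensor override might look like this. The model path is hypothetical and the flag names are from memory, so verify them against `python koboldcpp.py --help` on your version:

```shell
# Hypothetical model path; the --overridetensors value follows the
# regex=device convention (verify the exact flag on your KoboldCpp version).
python koboldcpp.py \
    --model ./models/some-24b-q4_k_m.gguf \
    --gpulayers 99 \
    --contextsize 8192 \
    --overridetensors "ffn_(up|down)_exps=CPU"  # keep big FFN tensors in RAM
```

The idea is to offload as many layers as possible to the GPU, then push the largest feed-forward tensors back to system RAM so the rest still fits in VRAM.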

A 24-31B model is generally more than fine for most at-home use cases, and such models are quite "smart", though "smart" doesn't mean much with AI; it's a vibe, basically. A machine with 32GB RAM and 8GB VRAM can run a 24B model at about 5 tokens per second, which is fine for an agent designed to give you short replies to questions.
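To see why a 24B model needs that RAM/VRAM split, here's a rough back-of-the-envelope estimate: size is roughly parameters times bits-per-weight, ignoring KV cache and runtime overhead (the 4.5 bits/weight figure is an assumption, typical of a Q4_K_M quant):

```python
def model_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate in-memory size of a quantized model, in GB.

    Ignores KV cache and runtime overhead, so treat it as a floor.
    """
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# A 24B model at ~4.5 bits/weight (typical 4-bit quant):
size = model_size_gb(24, 4.5)
print(f"{size:.1f} GB")  # ≈ 13.5 GB -- too big for 8GB VRAM alone,
                         # hence splitting across VRAM + system RAM
```

This is why a 24B quant runs on an 8GB card only with partial offload: the weights alone are well past 8GB, but fit comfortably once system RAM takes the rest.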

You'll most likely want to grab a GGUF quantization from Hugging Face, yes. Any 4-bit quant is fine, really. The merges are mostly quasi-abliterated models for people who want slutty AI girlfriends/boyfriends. The models released directly by companies, like GLM or Kimi or whatever, are more standard and generally run more efficiently.
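Grabbing a GGUF quant from Hugging Face can be done from the command line; the repo and file names below are placeholders, so browse Hugging Face for the actual quant you want (Q4_K_M is a common 4-bit default):

```shell
# Placeholder repo/file names -- substitute the real GGUF quant you picked.
pip install -U "huggingface_hub[cli]"
huggingface-cli download SomeUploader/SomeModel-24B-GGUF \
    SomeModel-24B-Q4_K_M.gguf --local-dir ./models
```

Downloading through the browser works just as well; the CLI is only handy for resuming large files.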

People doing development will likely want a 70-100B model. Claude, I think, is about 100B. You can run models that size on about 64GB of RAM and 32GB of VRAM.

If you want settings for Koboldcpp I can give you the rundown on how to optimize.

[–] corgiwithalaptop@hexbear.net 3 points 3 hours ago* (last edited 3 hours ago)

> huggingface client

This is what billionaires want us to take seriously. Let me just fire up the slimslam bazooper and have it connect to the sillynilly butterball API

You'd download Ollama, then get the model via `ollama pull`, I believe. Although the "90% as good" mark is pushing it: open-weight models below 32B parameters (what you could reasonably run on those machines) benchmark around 40% lower than Opus 4.6 on software tasks, and the difference is night and day for general reasoning.
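For the record, the Ollama flow is roughly this; the model tag is just an example, so pick one that fits your hardware:

```shell
# Example model tag -- substitute whatever fits your VRAM.
ollama pull qwen2.5:14b      # downloads a prebuilt quantized model
ollama run qwen2.5:14b "Summarize what a GGUF file is."
```

Ollama hides the quantization details entirely, which is why it's the usual recommendation for people who don't want to hand-pick GGUF files.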

[–] BountifulEggnog@hexbear.net 2 points 3 hours ago

It depends massively on what hardware you have. I've heard good things about glm 4.7 flash and it's easy enough to run. Also depends on what you want to use it for.