this post was submitted on 23 Feb 2026
112 points (100.0% liked)

Thanks capitalism for doing the stupidest implementation of this technology possible

[–] MeetMeAtTheMovies@hexbear.net 8 points 5 hours ago (4 children)

Do you need open weight models for this? Any recommendations for which one(s)? Do you need to download a huggingface client or something? I’m familiar with AI stuff, but not running locally.

[–] LaughingLion@hexbear.net 1 points 53 minutes ago

If you are running at home as a hobby, just use KoboldCpp, and maybe SillyTavern on top if you want extra functionality. In the former you can override where specific tensors live (e.g. keep the FFN down, and potentially up, projection tensors in system RAM) to save VRAM if needed. For models, it depends on what you need.
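As a sketch, a KoboldCpp launch with a tensor override might look like this. The model path is hypothetical and the flag names are from memory, so verify them against `python koboldcpp.py --help` on your version:

```shell
# Hypothetical model path; the --overridetensors value follows the
# regex=device convention (verify the exact flag on your KoboldCpp version).
python koboldcpp.py \
    --model ./models/some-24b-q4_k_m.gguf \
    --gpulayers 99 \
    --contextsize 8192 \
    --overridetensors "ffn_(up|down)_exps=CPU"  # keep big FFN tensors in RAM
```

The idea is to offload as many layers as possible to the GPU, then push the largest feed-forward tensors back to system RAM so the rest still fits in VRAM.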

A 24-31B model is generally more than fine for most at-home use cases, and such models are quite "smart", though "smart" doesn't mean much with AI; it's a vibe, basically. A machine with 32GB RAM and 8GB VRAM can run a 24B model at about 5 tokens per second, which is fine for an agent designed to give you short replies to questions.
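To see why a 24B model needs that RAM/VRAM split, here's a rough back-of-the-envelope estimate: size is roughly parameters times bits-per-weight, ignoring KV cache and runtime overhead (the 4.5 bits/weight figure is an assumption, typical of a Q4_K_M quant):

```python
def model_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate in-memory size of a quantized model, in GB.

    Ignores KV cache and runtime overhead, so treat it as a floor.
    """
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# A 24B model at ~4.5 bits/weight (typical 4-bit quant):
size = model_size_gb(24, 4.5)
print(f"{size:.1f} GB")  # ≈ 13.5 GB -- too big for 8GB VRAM alone,
                         # hence splitting across VRAM + system RAM
```

This is why a 24B quant runs on an 8GB card only with partial offload: the weights alone are well past 8GB, but fit comfortably once system RAM takes the rest.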

You'll most likely want to grab a GGUF quantization from Hugging Face, yes. Any 4-bit quant is fine, really. The merges are mostly quasi-abliterated models for people who want slutty AI girlfriends/boyfriends. The models released directly by companies, like GLM or Kimi or whatever, are more standard and generally run more efficiently.
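Grabbing a GGUF quant from Hugging Face can be done from the command line; the repo and file names below are placeholders, so browse Hugging Face for the actual quant you want (Q4_K_M is a common 4-bit default):

```shell
# Placeholder repo/file names -- substitute the real GGUF quant you picked.
pip install -U "huggingface_hub[cli]"
huggingface-cli download SomeUploader/SomeModel-24B-GGUF \
    SomeModel-24B-Q4_K_M.gguf --local-dir ./models
```

Downloading through the browser works just as well; the CLI is only handy for resuming large files.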

People doing development will likely want a 70-100B model. Claude, I think, is about 100B. You can run models that size on about 64GB of RAM and 32GB of VRAM.

If you want settings for Koboldcpp I can give you the rundown on how to optimize.

[–] corgiwithalaptop@hexbear.net 3 points 3 hours ago* (last edited 3 hours ago)

> huggingface client

This is what billionaires want us to take seriously. Let me just fire up the slimslam bazooper and have it connect to the sillynilly butterball API

You'd download Ollama, then get the model via `ollama pull`, I believe. Although the "90% as good" mark is pushing it: open-weight models below 32B parameters (what you could reasonably run on those machines) benchmark around 40% lower than Opus 4.6 on software tasks, and the difference is night and day for general reasoning.
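For the record, the Ollama flow is roughly this; the model tag is just an example, so pick one that fits your hardware:

```shell
# Example model tag -- substitute whatever fits your VRAM.
ollama pull qwen2.5:14b      # downloads a prebuilt quantized model
ollama run qwen2.5:14b "Summarize what a GGUF file is."
```

Ollama hides the quantization details entirely, which is why it's the usual recommendation for people who don't want to hand-pick GGUF files.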

[–] BountifulEggnog@hexbear.net 2 points 3 hours ago

It depends massively on what hardware you have. I've heard good things about glm 4.7 flash and it's easy enough to run. Also depends on what you want to use it for.