overview for trave

what's the best model these days I could fit in 128gb ram? in c/localllama@sh.itjust.works

[–] trave@lemmy.sdf.org 1 points 1 week ago (2 children)

update: I tried GLM4.5 air and it was awesome until I remembered how censored it is by the Chinese government. Which I guess is fine if I'm just coding but just on principal I didn't like running a model that will refuse to talk about things China doesn't like. I tried Dolphin-Mistral-24B which will answer anything but isn't particularly smart.

So I'm trying out gpt-oss-120b which was running at an amazing 5.21t/s but the reasoning output was broken and it seems the way to fix it was to switch from the llamacpp python wrapper to pure llamacpp

...which I did, and it fixed the reasoning output... but now I only get .61t/s :|

anyway, I'm on my journey :) thanks y'all

what's the best model these days I could fit in 128gb ram? in c/localllama@sh.itjust.works

[–] trave@lemmy.sdf.org 2 points 1 week ago (2 children)

oh I didn't realize I could use llamacpp with openwebui. I recall reading something about how ollama was somehow becoming less FOSS so I'm inclined to use llamacpp. Plus I want to be able to more easily use sharded ggufs. You have a guide for setting up llamacpp with openwebui?

I somehow hadn't heard of GLM 4.5 Air, I'll take a look thanks!

what's the best model these days I could fit in 128gb ram? in c/localllama@sh.itjust.works

[–] trave@lemmy.sdf.org 1 points 1 week ago (2 children)

some coding yeah but also want one that's just good 'general purpose' chat.

Not sure how much context... from what I've heard models kinda break down at super large context anyway? Though I'd love to have as large of a functional context as possible. I guess it's somewhat a tradeoff in ram usage as the context all gets loaded into memory?

what's the best model these days I could fit in 128gb ram? in c/localllama@sh.itjust.works

[–] trave@lemmy.sdf.org 2 points 1 week ago* (last edited 1 week ago) (4 children)

oh and I'm running Ollama/OpenWebUI on Debian

20

what's the best model these days I could fit in 128gb ram? (lemmy.sdf.org)

submitted 1 week ago by trave@lemmy.sdf.org to c/localllama@sh.itjust.works

12 comments fedilink

Yes this is a recipe for extremely slow inference: I'm running a 2013 Mac Pro with 128gb of ram. I'm not optimizing for speed, I'm optimizing for aesthetics and intelligence :)

Anyway, what model would you recommend? I'm looking for something general-purpose but with solid programming skills. Ideally obliterated as well, I'm running this locally I might as well have all the freedoms. Thanks for the tips!