this post was submitted on 16 May 2026
21 points (95.7% liked)

LocalLLaMA


I was browsing Reddit (yetch) while waiting for some stuff to finish when I came across this post

https://old.reddit.com/r/LocalLLM/comments/1tek00h/why_is_llm_is_so_expensive/

The author makes a (very) interesting claim: if table stakes are $6K (they're not... but go with it for now), then most folks are cooked from the get-go.

Personally, I have been figuring out how to get more from less. For example, people have found ways to run Qwen3.6 35B on a 6GB VRAM GTX 1060 at ~20 tok/s (--ctx 64K IIRC, but go check the vids yourself):

https://youtu.be/8F_5pdcD3HY

I think there's a lot of juice to squeeze by turning LLMs from "all-seeing sages" into basically mouthpieces for shit that actually runs fast on regular silicon - but that's just me and my crazy brain. YMMV.

[–] SuspiciousCarrot78@aussie.zone 4 points 11 hours ago* (last edited 11 hours ago) (1 children)

I'm actually thinking of pivoting my router/orchestrator entirely. I think the way forward is to look at expert systems (yes, those ancient things from the long, long ago of... 1980), but with modern, user-updatable tooling and a small LLM in the middle that the user can talk to. That is, de-emphasize the central role of the LLM entirely; make it the user-facing NLP input/output layer and let the real programs, running on real silicon, do the work. I might have a different use case than most, but I bet not so different (that is to say, online LLM discussions seem to gravitate around users who use LLMs for coding; Anthropic's and OAI's internal usage reports say otherwise).
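To make the shape concrete, here's a minimal toy sketch of that pattern. Everything in it is hypothetical (the intents, the keyword rules, the canned handler outputs); a real build would swap `parse_intent` for the small LLM and have the handlers actually shell out:

```python
import re
from dataclasses import dataclass
from typing import Callable

@dataclass
class Intent:
    name: str
    args: dict

def parse_intent(text: str) -> Intent:
    """Stand-in for the small LLM: keyword rules, expert-system style."""
    t = text.lower()
    if "disk" in t or "space" in t:
        return Intent("disk_usage", {})
    m = re.search(r"is (\w+) (?:up|down|running)", t)
    if m:
        return Intent("service_status", {"unit": m.group(1)})
    return Intent("unknown", {})

# Deterministic handlers -- the "real programs on real silicon".
# Canned strings here; real ones would shell out to df / systemctl.
HANDLERS: dict[str, Callable[[dict], str]] = {
    "disk_usage": lambda a: "disk: 42% used",
    "service_status": lambda a: f"{a['unit']}: active",
}

def handle(text: str) -> str:
    intent = parse_intent(text)
    if intent.name not in HANDLERS:
        return "Sorry, I can't do that yet."
    evidence = HANDLERS[intent.name](intent.args)
    # The LLM's only other job: phrase `evidence` for the user.
    return f"Here's what I found: {evidence}"
```

The point being: the model never has to "know" anything, it just maps text to an intent and evidence back to text.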

Ironically, I'm writing the blurb now while waiting for smoke test #90238472398 to finish.

[–] randomaside@lemmy.dbzer0.com 5 points 11 hours ago (1 children)

I've been saying this to people for a while. I think the long-term use case for LLMs is the semantic human interface device.

Siri, Alexa, even Google Home (or whatever they called it) - they all swung and missed at this. But even being able to give a computer unclear commands and get the intended result would be a huge win.

I know big LLM inference can do a lot more, but the cost is high for systems with that ability to reason. However, small, lightweight LLMs are actually very good for command and control.

This is where my current homelab projects are focused.

[–] SuspiciousCarrot78@aussie.zone 5 points 10 hours ago

Hey, me too :) As my school teachers used to tell me: "Great minds think alike (but fools seldom differ)" :)

For me, I'm thinking of having an LLM as one layer / one container in a homelab that does some specific stuff:

  • queries against local docs / notes / manuals / PDFs / wiki material as the trusted knowledge layer
  • uses tools for search, file lookup, shell, git, Docker, Home Assistant, calendar, etc.
  • a local “Codex” / wiki layer that turns my own source material into an inspectable knowledge base
  • provenance and audit trails
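The provenance/audit-trail bit is the easiest of those to sketch. This is a toy version (the note corpus and tool names are made up, and the "log" is just a list; in practice it'd be an append-only file): every tool call the LLM triggers leaves a record, so you can always see *why* it answered what it answered.

```python
import time
from typing import Any, Callable

# Toy corpus standing in for local notes/docs (hypothetical content).
NOTES = [
    "motherboard: bent pin on CPU socket, row 3",
    "router: config backup taken 2025-04",
]

AUDIT_LOG: list[dict] = []  # would be an append-only file in practice

def audited(tool_name: str, fn: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a tool so every call the LLM triggers leaves an audit record."""
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        result = fn(*args, **kwargs)
        AUDIT_LOG.append({
            "ts": time.time(),
            "tool": tool_name,
            "args": [repr(a) for a in args],
            "result": repr(result)[:200],  # truncate big payloads
        })
        return result
    return wrapper

# Wrap the doc-search tool before handing it to the router.
search_notes = audited("search_notes", lambda q: [n for n in NOTES if q in n])
```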

I want to take a screenshot of something, drop it into Syncthing from my phone, then later ask "did I fuck the pins on this?" ... and for it to look up the schematics, eyeball the pins and tell me. Or I say "hey, can you grab a copy of X for me, usual params" and have the LLM instruct Sonarr/Radarr/SABnzbd to do that. (That is, make your OWN "Alexa" with an ESP32, stick it in a room and then call it when you need it.)

So instead of asking a 70B model to “know” why your media server is down, the system checks service status, logs, last config changes, prior notes, Docker state, network state, etc., then the LLM explains the result in human language. You can probably do that with a 4B (I'm testing that assumption now).
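That "check first, explain after" flow might look something like this. The probes return canned data (real ones would call `docker inspect` / `journalctl`), and the summary is templated so the sketch runs without any model; a 4B model would just get handed the `evidence` dict and asked to phrase it:

```python
# Canned probes; real ones would query Docker / journalctl.
def probe_docker(service: str) -> dict:
    return {"service": service, "state": "exited", "exit_code": 137}

def probe_recent_logs(service: str) -> list:
    return [f"{service}: OOM killed", f"{service}: restart loop"]

def diagnose(service: str) -> dict:
    # Deterministic evidence gathering -- no model involved yet.
    evidence = {
        "docker": probe_docker(service),
        "logs": probe_recent_logs(service),
    }
    # The small model's only job: turn `evidence` into a sentence.
    # Templated here so the sketch stands alone.
    state = evidence["docker"]["state"]
    evidence["summary"] = (
        f"{service} is {state}; first real failure: {evidence['logs'][0]}"
    )
    return evidence
```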

Same for “find that motherboard note,” “summarize this email thread,” “turn this into a task,” “compare this Ebay listing to my saved hardware notes,” “what did I do last time this broke,” or “run the smoke test and tell me the first real failure.”

I think small models are the shit for this because if the model only has to classify intent, route the request, render structured evidence, and talk like a normal human...then it doesn’t need to be a giant oracle. The expensive (time wise) part becomes less “make the model smarter” and more “build a better control plane around it.”

Basically: local LLM as semantic HID; expert system/tool router underneath; user owns the data and the machine.

As always, ICBW....but fuck it, I'm gonna try.

PS: I have an idea of how to apply that to coding too... but that's a project for much later. I've been cooking this shit for far too long. The next thing I wanna do is a fun project for myself (that is: ROM hack a parachute and grappling gun into Super Mario Sunshine, so I can basically play "what if Super Mario Sunshine but actually Just Cause 2" on my Wii with the kids).