Yes. My Actual Intelligence lives in my head, and runs mostly on coffee.
Selfhosted
A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.
Rules:
-
Be civil.
-
No spam.
-
Posts are to be related to self-hosting.
-
Don't duplicate the full text of your blog or readme if you're providing a link.
-
Submission headline should match the article title.
-
No trolling.
-
Promotion posts require active participation, with an account that is at least 30 days old. F/LOSS without a paywall has exceptions, with requirements. See the rules link for details.
Resources:
- selfh.st Newsletter and index of selfhosted software and apps
- awesome-selfhosted software
- awesome-sysadmin resources
- Self-Hosted Podcast from Jupiter Broadcasting
Any issues on the community? Report it using the report flag.
Questions? DM the mods!
Just coffee?!? That's cool.
Mine runs on:
- coffee
- spite
- tortilla chips
- & shame
If that's not already on a shirt it should be
Mostly on coffee, not exclusively. Noticable amounts of spite & tortilla chips are also present, yes, but... no shame.
critical security bug: if coffee is taken away my head hurts :(
As we know AI stands for "An Indian", so if you're not from India, its actually impossible to self host.
Well, unless you manage to trap one in your basement, but that would violate human rights and hopefully also break the laws of your country.
You may be confusing Indians with gremlins. Which might explain ChatGPTs obsession with gremlins

An aside for anyone reading this:
https://sleepingrobots.com/dreams/stop-using-ollama/
And that barely scratches the surface. Please.
Use anything but Ollama. Even APIs.
Thanks. Good to know
Llama.cpp or death!
It's not that hard to use llama.cpp directly anyway. Why would I use a wrapper when I can just run a python script?
Thanks for this link. Because of this article, I had claude stand up a llama.cpp container next to my already running ollama container. It ran side by side tests with the same model and parameters, and the results blew ollama out of the water. I'm in the process of moving hermes and openwebgui over to the llama.cpp instance to see how it goes day to day.
I agree that the concerns listed there are smells, and I wasn't aware of some of the options listed there.
Thank you for sharing this!
Didn't know this. Going to switch this weekend, thanks for sharing this!
Yes. Openwebui/ollama for LLM, comfyui for stable diffusion. I just dick around with it as a toy.
Same. Its somewhat useful on some very small scripting or tasks...but its mostly just to try out a new model or two. Its not really useful for anything big.
I will have to say....even my tiny models are about as good as Chatgpt/Claude/etc... which makes me think about how much people are spending on tokens regularly. I was able to get the same kind of python script started with my local tiny model that was comparable to the newest Claude code offerings.
What local models have you been using? And what hardware are you running them on? I've been playing with local LLMs a bit for exactly your use case.
I have zero interest in vibe coding or full agentic workflows. But having a local LLM generate a Bash script to help me automate parts of my home lab infrastructure would be nice.
Yeah, I'm using qwen 31b a3b on an amd 9070xt requires a bit of cpu offloading, but still plenty fast. Using it wall llama.cpp. Combine that with some mcp's such as ddg-search to make it truly useful by actually being able to search online.
I mostly use it for small tedious tasks with well defined inputs and outputs. For example when hyprland recently changed from their own configuration language to lua. At first I started going line by line translating my config to the new lua language until I realized oh wait this is exactly the type of thing that ML is useful for. Going from the well defined hyprland configuration language to their also well defined lua syntax. It banged it out in less than a minute with only a single mistake which I easily fixed. The mistake it made was that it forgot to translate the comments to lua. It did it in less than a minute and worked first try. Where as I had made several typos and gotten a few lines wrong when I was doing it by hand.
Not to say that I couldn't do it. I would have gotten it done in about half an hour, but less than a minute is a lot faster.
I also used it to transform a bunch of unstructured data into json data, so that I could then use purpose built tools like jq to parse that. If I'm having trouble finding certain information. I'll ask it to find me some resources to look at.
Basically small well defined tasks and parsing data is what I use it for and it seems to be pretty good at that.
What I don't like is the way companies try to market it to people. I don't believe people should be trying to summarize emails or messages from loved ones, writing essays or any other creative tasks for the most part. Translating is okay. I don't expect a machine to be able to decide things for me or to be some filter between me and others.
I do, but I am becoming increasingly more disappointed as time goes on. Not just self hosted, llms in general. They sometimes help, but they mislead so many times and waste time that you don't even notice. I think that's the trap. When you succeed at a task, you become impressed but don't notice how many times it failed doing a simple task. And as soon as you scratch the surface, you see how you would have done it differently and perhaps in a better way. Even just googling is bad. It does research for you, but it has no critical thinking and can't decide what is better from the results it gets (other than google ranking) so it often leads you to think it did as good as you would, when it's nowhere near as good. Every time I did the googling myself after it did, I did it much better. And I mean MUCH better. Ask it to find the app, it misses the most important ones, hallucinates a bunch, for ex. I found this to be the case with frontier models as well.
Self hosting has its benefits, but seeing how the ecosystem looks right now, concluding this is a huge bubble is inevitable. It reminds me of crypto so much. It looks rich and plentiful, but as soon as you dig a mm under the surface - nobody has tested it, it's got a critical bug, it is overblown and there are issues with no response. No docs, no info, no nothing. For the biggest thing in technology in history, it is awfully hollow. I don't mean it in a condescending way, in fact community is enthusiastic and very helpful, it's just that it doesn't live up to what most would expect.
A caveat I need to mention is I have not used it for coding - I have an irrational fear and resistance towards it, being a programmer. I just won't touch it, even if it means the end of my career. I'm trying to be grown-up about it, but so far, I dont want to use it, for good and bad reasons.
Nope.
I've tried a few times but with only 8gig of vram it's simply not worth it.
Yes, I got a Strix Halo machine before the RAM price hike and use it to run all my ML stuff on it.
Currently using llama-swap with llama.cpp/ComfyUI and opencode/Open WebUI as frontend.
I'm running Qwen3.6-27b, Voxtral Mini 4b, Piper and Qwen Image. Also, some embedding and reranking models.
I use them for:
- Tagging and classification of my documents in Paperless
- Home Assistant (voice assistant)
- Translations (both text and image)
- Transcriptions
- Some light coding and debugging
- Avatar/Backdrop generation for DnD sessions
The other day I made a machine learning model that classifies images as either 'a certain type of undesirable image' (no, not porn) or 'any other image'. It is 96.4% accurate and takes 14 ms to classify one image (using CPU only - with a GPU it could be 5x - 10x faster).
I plan to offer this as an API service that social media networks can use to filter posts.
I tried but I only have 16g of ram and it wouldn't complete a thought alas
If I wanted AI for some reason, it'd be self-host or nothing.
Running qwen3.6 27b through llama.cpp.
It's about as capable as sonnet 3.5.
I use it for light scripting, but real coding is done by cloud models.
I'm also using it as the brain for my Hermes agent. It sends me digests of news, subreddits, chats that I'd like to read but don't have time for. It does a great job researching things on the web for me, too.
I do, I use ollama. I mostly just tinker, but I use with with home assistant for a quasi Alexa like experience with the voice assistant, I use it for summarizing some YouTube transcripts in too lazy to read/watch, and I've tried to see how capable it is with coding.
I've thought about it, but I actually could never think of anything I would do with it.
No. I still have no use for it and everything I use is automated without at a far lower footprint.
Running decencored Qwen3.6-27b on Ollama with a mostly vibe coded discord bot with a 9b Gemma model for smaller stuff like RAG. Just got it to run tools and scrape and post news on a schedule. The first model I can run locally that's smart enough to be useful. May give Jan a try for the back end after reading that other guys rant.
Mostly use it for stupid questions I could have googled and to brag to friends.
No, I have taste.
Why would I?
I currently run Qwen3.6-27b on llama.cpp and use it via openwebui. Mostly, I use it for web research via tavily, to a lesser extent for coding and interactively learning about things that are new to me but common in training data (such as basic math or ML concepts).
Jup. Ollama and OpenWebUI is a great stack to tinker with some LLM models. They're kinda useful for aggregating large datasets, translations, frontend development and gathering relevant sources for me to read into. Also, Qwen has been amazing in understanding frameworks without documentation and writing one for me. I had to use some self-developed PHP framework for a task once and without qwen, I would've taken probably two more weeks to get the task done.
MiniCPM has also been REALLY good at image detection, describing it as accurately as possible, feeding it into qwen who then searches what the object could be and returning the result. I always liked google lense and that stack gave me a TEMU-Version of google lense that isn't quite as reliable, but definitely very useful.
I don’t host it exactly, just use it when I don’t use my graphics card for gaming. I run Qwen3.6-35b on my 16gb vram RX 9700 xt with 34t/s. I use it as an IT advisor, admin and Linux teacher for my cachyOS gaming PC.