this post was submitted on 19 Jan 2026

Open Source

Folks, are there GenAI tools that can be used like a ChatGPT prompt? Basically, ChatGPT now seems to require a login and I don't want the thing to profile me. I'd rather train a FOSS tool.

top 5 comments
[–] HoleSailor@feddit.org 1 points 1 day ago* (last edited 1 day ago)

For Android, there are plenty of local AI clients such as ChatterUI, PocketPal, etc. Just download a suitable GGUF model from huggingface and use it (there's a download sketch below). If you have 8GB+ RAM, you can easily run 3B models.

Edit: Try to find iMatrix-quantized GGUF models. They preserve better quality at smaller sizes and run a bit faster.
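
If you'd rather script the download than click through the website, here is a minimal sketch using the huggingface_hub package. The repo id and filename below are placeholders; substitute whatever 3B iMatrix quant you actually pick on huggingface:

```python
# pip install huggingface_hub
from huggingface_hub import hf_hub_download

# Placeholder repo/filename: swap in the real GGUF quant you chose on huggingface.
path = hf_hub_download(
    repo_id="someuser/Some-3B-Model-GGUF",
    filename="some-3b-model.IQ4_XS.gguf",
)
print("Saved to:", path)
```

Then point ChatterUI or PocketPal at the downloaded file.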

[–] j4k3@lemmy.world 2 points 1 day ago

llama.cpp is at the core of almost all offline, open-weights model tooling. The server it creates is OpenAI API compatible. Oobabooga's Text Generation WebUI is more GUI oriented but is built on llama.cpp. Oobabooga has the setup for loading models with a split workload between the CPU and GPU, which makes larger GGUF-quantized models possible to run; llama.cpp provides this feature, Oobabooga just implements it. The model loading settings and softmax sampling settings take some trial and error to dial in well. It helps to have a way of monitoring GPU memory usage in real time; I use a script that appends GPU memory usage to my terminal window title bar up until inference time.
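
To make the "OpenAI API compatible" part concrete, here is a rough sketch of querying a locally running llama-server. It assumes the server's usual default address of 127.0.0.1:8080; the model field is mostly ignored because the server serves whatever model it loaded:

```python
# pip install requests
import requests

# Assumes a llama.cpp server (llama-server) is already running locally; 8080 is its usual default port.
resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "model": "local",  # generally ignored by llama-server
        "messages": [{"role": "user", "content": "Summarize what a GGUF quant is in one sentence."}],
        "temperature": 0.7,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```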

Ollama is another common project people use for offline open-weights models, and it also runs on top of llama.cpp. It is a lot easier to get started with in some instances, and several projects use Ollama as the baseline for "Hello World!" type stuff. It has pretty good model loading and softmax settings without any fuss, but it does this at the expense of only running on GPU or CPU, never both in a split workload. This may seem fine at first, but if you never experience running much larger quantized models in the 30B-140B range, you are unlikely to have success or a positive experience overall. The much smaller models in the 4B-14B range are all that are likely to run fast enough on your hardware AND completely load into your GPU memory if you only have 8GB-24GB.

Most of the newer models are actually Mixture of Experts architectures. This means it is like loading ~7 models initially, but then only inferencing two of them at any one time. All you need is enough system memory, or the DeepSpeed package (which uses the disk drive for the excess space required), to load these larger models. Larger quantized models are much, much smarter and more capable. You also need llama.cpp if you want to use function calling for agentic behaviors. Look into the agentic API and pull request history in this area of llama.cpp before selecting which models to test in depth.
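
As one hedged illustration of what the CPU/GPU split looks like, here is a sketch using the llama-cpp-python bindings rather than Oobabooga. The model path and layer count are made up; tune them to whatever fits your VRAM:

```python
# pip install llama-cpp-python   (build it with GPU support, e.g. CUDA, to get the offload)
from llama_cpp import Llama

# Hypothetical path and layer count: some layers go to VRAM, the rest stay in system RAM.
llm = Llama(
    model_path="/models/big-moe-30b.Q4_K_M.gguf",
    n_gpu_layers=24,   # offload only as many layers as fit in GPU memory
    n_ctx=8192,
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello."}],
)
print(out["choices"][0]["message"]["content"])
```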

Huggingface is the go-to website for sharing and sourcing models. It is heavily integrated with GitHub, so it is probably just as toxic long term, but I do not know of a real FOSS alternative for that one. Hosting models is massive I/O for a server.

[–] FrankLaskey@lemmy.ml 1 points 1 day ago

I would check out Open WebUI, which can be self-hosted via Docker etc. and configured with any OpenAI-compatible endpoint, so you can use a service like OpenRouter to run nearly any LLM remotely. Most of the open-weights ones like Qwen 3 or Kimi K2 Thinking are great, cost pennies per query, and can be configured with Zero Data Retention (ZDR) so your data is not recorded. You could also use something like Ollama to run local LLMs if you want even more privacy and have the hardware (typically a modern Nvidia GPU with at least 16-24 GB of VRAM).
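
Under the hood this is just the standard OpenAI-style chat API, so here is a quick sketch with the openai Python package pointed at OpenRouter. The API key and model id are placeholders; check OpenRouter's model list for exact names:

```python
# pip install openai
from openai import OpenAI

# Any OpenAI-compatible endpoint works; OpenRouter is just one example.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # placeholder
)
resp = client.chat.completions.create(
    model="qwen/qwen3-235b-a22b",  # placeholder id; pick whatever open-weights model you want
    messages=[{"role": "user", "content": "Hello there"}],
)
print(resp.choices[0].message.content)
```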

[–] merde@sh.itjust.works 2 points 1 day ago
[–] Auster@thebrainbin.org 1 points 1 day ago

Dunno how to train a model (or whatever it's called), but using Termux + Ollama + qwen2.5:3b, it helps about as much as I'd expect from a (to my knowledge) non-commercial tool. Also, since you can pull other models, maybe you can provide your own too.
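
For reference, once Ollama is running (e.g. inside Termux), the same setup is scriptable with the ollama Python package. A minimal sketch, assuming the qwen2.5:3b model has already been pulled:

```python
# pip install ollama   (run `ollama pull qwen2.5:3b` first)
import ollama

# Assumes the Ollama daemon is running locally with the model already available.
reply = ollama.chat(
    model="qwen2.5:3b",
    messages=[{"role": "user", "content": "Give me one tip for running LLMs on a phone."}],
)
print(reply["message"]["content"])
```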