Self Hosted - Self-hosting your services.

20457 readers

1 users here now

A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.

Rules

No harassment
crossposts from c/Open Source & c/docker & related may be allowed, depending on context
Video Promoting is allowed if is within the topic.
No spamming.
Stay friendly.
Follow the lemmy.ml instance rules.
Tag your post. (Read under)

Important

Lemmy doesn't have tags yet, so mark it with [Question], [Help], [Project], [Other], [Promoting] or other you may think is appropriate. This is strongly encouraged!

Cross-posting

!everything_git@lemmy.ml is allowed!
!docker@lemmy.ml is allowed!
!portainer@lemmy.ml is allowed!
!fediverse@lemmy.ml is allowed if topic has to do with selfhosting.
!selfhosted@lemmy.ml is allowed!

If you see a rule-breaker please DM the mods!

founded 5 years ago

MODERATORS

Zoe8338@lemmy.ml

dogmuffins@lemmy.ml

testman@lemmy.ml

LLM with Web Search functionality (lemmy.ml)

submitted 1 month ago by SubArcticTundra@lemmy.ml to c/selfhost@lemmy.ml

33 comments fedilink hide all child comments

Conducting deep web searches and gathering sources is one of the main things I've been using LLMs for. How far away are we from being able to self-host something like Claude's web search capabilities? Or even just a service where I'd pay with my money instead of my data?

you are viewing a single comment's thread
view the rest of the comments

[–] vapeloki@lemmy.world 2 points 1 month ago

AMD Strix is an APU, optimized for AI. It is the cheapest option I am aware of to run bigger models at home. 2k for 56GB VRAM, and less den 300W total power Budget.

One could run smaller models. But for the context sizes required for research work, that is nearly impossible.

Also, external services, like openrouter, can be used to use models hosted in the cloud.

But for self hosted, you need something that can run models with at least 15GB of VRAM + Context. For comparison. Our highly quantized model uses 20GB of vram. For our 4 slots we need another 20GB on top of it (around 5GB for 254k tokens), making it 40GB.