this post was submitted on 04 Jun 2026
50 points (100.0% liked)

Selfhosted

I'm starting to develop arthritis in my fingers, which makes typing an interesting challenge.

I'm wondering, what is the best self-hosted solution for speech-to-text generally? Dragon Dictate used to be the thing, but is there anything open source and self-hostable that has superseded it?

I would love to have something I can speak into that can interact with pretty much any app, be that Notepad++, my web browser when I'm entering stuff, or even when I'm creating this Lemmy post (which I actually made using FUTO Voice on my phone).

Windows and/or Linux ideally.

Any leads? Getting old sucks.

[–] JustEnoughDucks@slrpnk.net 3 points 2 weeks ago* (last edited 2 weeks ago)

Hey there!

This is a job for Handy. I recently started using it with the recommended Parakeet model, and it works very well. The Whisper setup packaged with Wyoming that I run is pretty much unusable for Dutch, but Parakeet in Handy seems to work quite well. It inputs text directly into whatever program you want, downloads a small local model, and runs only on your computer. Push-to-talk or toggle, with tons of customization options.

Note that this is just speech-to-text, and the models aren't made for post-processing, so it will dictate exactly what you say, umms and everything. If you want cleanup, formatting, etc., you have to rely on a much more general model, but the makers are experimenting with an opt-in for that using various external AIs.
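
If the umms are the main annoyance, a small rule-based pass over the transcript can take care of the most obvious ones without involving an external AI. What follows is only a rough sketch in Python and has nothing to do with Handy itself: the filler list and the `strip_fillers` name are my own assumptions, and it won't do real formatting or punctuation repair.

```python
import re

# Hypothetical filler list (an assumption, not taken from any tool);
# adjust it for your own speech habits and language.
FILLERS = ("um", "umm", "uh", "uhh", "er", "erm")

# Match a filler as a whole word, plus an optional trailing comma and spaces.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(FILLERS) + r")\b,?\s*",
    flags=re.IGNORECASE,
)


def strip_fillers(text: str) -> str:
    """Remove standalone filler words and tidy leftover whitespace."""
    cleaned = _FILLER_RE.sub("", text)
    # Collapse any runs of whitespace left behind by the removals.
    cleaned = re.sub(r"\s{2,}", " ", cleaned)
    return cleaned.strip()


if __name__ == "__main__":
    print(strip_fillers("Um, I think, uh, this works"))
```

Because it matches whole words only, something like "umbrella" is left alone. Anything smarter than this (fixing capitalization, restructuring sentences) is where you'd need the more general model.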