This post was submitted on 12 Jun 2026.

Home Assistant

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/homeassistant by /u/Leather_Idea_2122 on 2026-06-12 10:56:24+00:00.


Hey all, HA Voice PE user here, running local STT on an N100.

I've been frustrated by the long delay between finishing a sentence and hearing the assistant respond, and I tracked it down to how wyoming-faster-whisper works: it buffers your entire spoken utterance into a WAV file and only starts inference after you stop talking.
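
To make the bottleneck concrete, here is a minimal sketch of the buffer-then-transcribe pattern described above, assuming 16 kHz, 16-bit mono PCM chunks. The names `BufferedTranscriber`, `on_audio_chunk`, `on_audio_stop` and the `transcribe_wav` callback are hypothetical and are not wyoming-faster-whisper's actual code; the sketch only shows that incoming audio is appended to an in-memory WAV and nothing is decoded until the utterance ends.

```python
import io
import wave

# ASSUMPTION: audio arrives as 16 kHz, 16-bit, mono PCM.
SAMPLE_RATE = 16000  # Hz
SAMPLE_WIDTH = 2     # bytes per sample
CHANNELS = 1


class BufferedTranscriber:
    """Hypothetical buffer-then-transcribe handler.

    `transcribe_wav` stands in for a batch model call that needs the whole file.
    """

    def __init__(self, transcribe_wav):
        self._transcribe_wav = transcribe_wav
        self._buf = io.BytesIO()
        self._wav = wave.open(self._buf, "wb")
        self._wav.setnchannels(CHANNELS)
        self._wav.setsampwidth(SAMPLE_WIDTH)
        self._wav.setframerate(SAMPLE_RATE)

    def on_audio_chunk(self, pcm: bytes) -> None:
        # Chunks are only appended to the WAV; no inference happens here.
        self._wav.writeframes(pcm)

    def on_audio_stop(self) -> str:
        # Closing patches the WAV header; the BytesIO itself stays open.
        self._wav.close()
        # Inference starts only now, after the speaker has finished.
        return self._transcribe_wav(self._buf.getvalue())
```

Because the model call sits in `on_audio_stop`, all of the decoding work lands after the user stops speaking.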

I added streaming ASR support using sherpa-onnx's OnlineRecognizer. The model now decodes audio chunks as they arrive, so on my N100 most of the inference is already done by the time I stop speaking.
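
A minimal sketch of the streaming loop, assuming the method names used in sherpa-onnx's Python examples for OnlineRecognizer (`create_stream`, `accept_waveform`, `is_ready`, `decode_stream`, `get_result`, `input_finished`). `StandInRecognizer` and `StandInStream` are hypothetical stand-ins that only count decoded samples so the sketch runs without model files, and the frame size is made up; this is not the fork's actual code.

```python
from dataclasses import dataclass, field

SAMPLE_RATE = 16000  # Hz; assumed input rate


@dataclass
class StandInStream:
    """Hypothetical stand-in for a sherpa-onnx online stream."""
    pending: list = field(default_factory=list)  # samples not yet decoded
    decoded_samples: int = 0
    finished: bool = False

    def accept_waveform(self, sample_rate, samples):
        # A real stream would compute features; here we only queue samples.
        self.pending.extend(samples)

    def input_finished(self):
        self.finished = True


class StandInRecognizer:
    """Hypothetical recognizer that decodes fixed-size frames as audio arrives."""

    FRAME = 480  # hypothetical frame size in samples (30 ms at 16 kHz)

    def create_stream(self):
        return StandInStream()

    def is_ready(self, stream):
        # Ready when a full frame is queued, or any tail remains after input ends.
        return len(stream.pending) >= self.FRAME or (
            stream.finished and len(stream.pending) > 0
        )

    def decode_stream(self, stream):
        n = min(self.FRAME, len(stream.pending))
        del stream.pending[:n]
        stream.decoded_samples += n

    def get_result(self, stream):
        return f"<decoded {stream.decoded_samples} samples>"


def on_audio_chunk(recognizer, stream, samples):
    # Runs for every chunk while the user is still talking.
    stream.accept_waveform(SAMPLE_RATE, samples)
    while recognizer.is_ready(stream):
        recognizer.decode_stream(stream)


def on_audio_stop(recognizer, stream):
    # Only the small undecoded tail is left once the user stops speaking.
    stream.input_finished()
    while recognizer.is_ready(stream):
        recognizer.decode_stream(stream)
    return recognizer.get_result(stream)
```

Feeding three 1000-sample chunks decodes 2880 samples while audio is still arriving, leaving only 120 samples for `on_audio_stop`. That is the same shape as the improvement described in the post: the work after end of speech no longer scales with the whole utterance.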

In day-to-day use it makes a real difference, and the assistant feels much more responsive. The HA Assist debug view now typically reports only 0 to 0.5 s of STT time. Previously, STT took about twice the length of the recorded audio, counted from when I stopped speaking (a 3 s spoken command meant 6 s of processing before the request even reached LLM/local processing).
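
These numbers fit a simple back-of-the-envelope latency model. With buffer-then-transcribe, the post-speech delay is the utterance length times the real-time factor (RTF, processing time divided by audio duration); 3 s of speech taking 6 s implies an RTF of about 2. With streaming, chunks are decoded as they arrive, so if the model keeps up with real time only the last chunk remains. The function names are hypothetical, and the streaming RTF and chunk length in the usage below are illustrative values, not measurements from the post.

```python
def batch_latency(utterance_s: float, rtf: float) -> float:
    """Post-speech delay when inference starts only after the speaker stops."""
    return utterance_s * rtf


def streaming_latency(utterance_s: float, rtf: float, chunk_s: float) -> float:
    """Post-speech delay when each chunk is decoded as soon as it arrives.

    If the model keeps up (rtf <= 1), only the final chunk is left to decode.
    If it falls behind (rtf > 1), the backlog grows by (rtf - 1) seconds of
    work per second of speech after the first chunk.
    """
    if not 0 < chunk_s <= utterance_s:
        raise ValueError("chunk_s must be in (0, utterance_s]")
    backlog = max(0.0, rtf - 1.0) * (utterance_s - chunk_s)
    return backlog + rtf * chunk_s


if __name__ == "__main__":
    # RTF of 2 taken from the post: 3 s spoken -> 6 s processing.
    print(batch_latency(3.0, 2.0))
    # Hypothetical streaming model with RTF 0.5 and 0.32 s chunks.
    print(streaming_latency(3.0, 0.5, 0.32))
```

Under this model, streaming alone would not rescue a model with an RTF of 2 (a 3 s command would still wait about 3.3 s), so the reported 0 to 0.5 s is consistent with the streaming zipformer also decoding faster than real time on the N100.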

To try it:

Pull the Docker image:

docker pull ghcr.io/pkrahmer/wyoming-faster-whisper:latest

Run it, appending these arguments after the image name to select the streaming English model:

--stt-library sherpa --model sherpa-onnx-streaming-zipformer-en-2023-06-26 --language en

I use this German model at home:

--stt-library sherpa --model sherpa-onnx-streaming-zipformer-de-kroko-2025-08-06 --language de
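
If you want to script the container start, the sketch below assembles a full `docker run` command from the image and arguments above. The container port 10300 and the `/data` volume are assumptions carried over from the upstream wyoming-faster-whisper image, not confirmed for this fork; `docker_run_args` and the `/opt/whisper-data` host path are hypothetical.

```python
import shlex

IMAGE = "ghcr.io/pkrahmer/wyoming-faster-whisper:latest"
CONTAINER_PORT = 10300  # ASSUMPTION: upstream's Wyoming STT port

# Streaming models named in the post, keyed by --language value.
STREAMING_MODELS = {
    "en": "sherpa-onnx-streaming-zipformer-en-2023-06-26",
    "de": "sherpa-onnx-streaming-zipformer-de-kroko-2025-08-06",
}


def docker_run_args(language, host_port=CONTAINER_PORT, data_dir="/opt/whisper-data"):
    """Hypothetical helper: build a docker run argv for the given language."""
    model = STREAMING_MODELS[language]
    return [
        "docker", "run", "-d",
        "-p", f"{host_port}:{CONTAINER_PORT}",
        # ASSUMPTION: like upstream, the image keeps downloaded models in /data.
        "-v", f"{data_dir}:/data",
        IMAGE,
        # ASSUMPTION: like upstream, arguments after the image name reach the server.
        "--stt-library", "sherpa",
        "--model", model,
        "--language", language,
    ]


if __name__ == "__main__":
    print(shlex.join(docker_run_args("en")))
```

Passing the arguments as a list (for example to `subprocess.run`) avoids shell-quoting issues; `shlex.join` is only used to print a copy-pasteable command.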

Fork and details: https://github.com/pkrahmer/wyoming-faster-whisper

Happy to answer questions. Would love to know if others notice the same improvement.
