this post was submitted on 14 Dec 2025


The MLP wiki has high-quality diarized transcripts that look like this:

Pinkie Pie: I'm awake! I'm awake! What time is it?! Did we sleep through the test?! [snores]
[beeping stops]
Rarity: No, but school starts in thirty minutes!
Sunset Shimmer: [sighs] How's everybody feeling about our test?
Fluttershy: Even after our all-night study session, I still don't know the difference between vaporization and sublimation.

Ideally, I'd like a tool that I can feed this to and that will spit out synced subs. The exact per-character diarization isn't actually important: I'll certainly strip out the character names (and probably the [SDH things]) to avoid problems with alignment, and they won't be in the final subtitles anyway. What I do want is for the boundaries between speaker utterances to be respected.
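For reference, here is a rough sketch of the preprocessing described above: drop the `Name:` prefix, drop bracketed SDH cues, and keep one entry per transcript line so utterance boundaries survive. The function name `transcript_to_utterances` and both regexes are my own assumptions about the wiki format (a short speaker label followed by a colon, SDH cues in square brackets), not anything the wiki or MFA prescribes.

```python
import re

# Assumption: a speaker label is a short run of text (no brackets) before the first colon.
SPEAKER_RE = re.compile(r"^([^:\[\]]{1,40}):\s*(.*)$")
# Assumption: SDH cues are anything inside square brackets, e.g. [snores].
SDH_RE = re.compile(r"\[[^\]]*\]")


def transcript_to_utterances(text):
    """Return one cleaned string per transcript line, skipping lines that are only SDH cues."""
    utterances = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = SPEAKER_RE.match(line)
        if match:
            line = match.group(2)          # keep only what was said
        line = SDH_RE.sub(" ", line)       # remove [snores], [sighs], ...
        line = " ".join(line.split())      # collapse leftover whitespace
        if line:
            utterances.append(line)
    return utterances


sample = """Pinkie Pie: I'm awake! I'm awake! What time is it?! [snores]
[beeping stops]
Rarity: No, but school starts in thirty minutes!
Sunset Shimmer: [sighs] How's everybody feeling about our test?"""

for utterance in transcript_to_utterances(sample):
    print(utterance)
```

Each returned string is one utterance, so whatever aligner ends up being used can treat the list index as the boundary to preserve. A line of dialogue that has no speaker label but does contain an early colon would be mis-split by this regex, so it is worth spot-checking the output.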

MFA (Montreal Forced Aligner) seems like it could work, but I'm unsure of how best to preprocess the transcript/audio to get good results. I tried aligning with and without the built-in segment command, and I also tried bumping the beam and retry beam values, but the results were less than stellar.

I'm also aware of some commercial services that offer this functionality (Descript and YouTube), but I'm looking for a solution I can run locally.

Any pointers would be greatly appreciated!


Sort of a separate question, but is there a tool that will allow for precise line-splitting when using word-timestamped transcripts (e.g. the JSON output of Whisper)? It seems like it should be fairly straightforward, but it doesn't seem like SubtitleEdit can do it and I had trouble finding a tool that can handle it. Would be a really nice feature, since splitting lines is probably the most tedious task when dealing with automatic transcripts.
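To make the idea concrete, here is a minimal sketch of what such a split could look like. It assumes each word is a dict with `word`, `start`, and `end` keys (start/end in seconds) and that each `word` carries its own leading space, which is how Whisper's word-timestamp JSON usually looks; check your own output before relying on that. The helper names `split_cue`, `format_ts`, and `to_srt` are made up for illustration.

```python
def format_ts(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def split_cue(words, index):
    """Split a list of word dicts into two cues, breaking just before words[index]."""
    if not 0 < index < len(words):
        raise ValueError("split index must fall strictly inside the cue")

    def cue(chunk):
        # Assumption: each word string already includes its leading space.
        text = "".join(w["word"] for w in chunk).strip()
        return {"start": chunk[0]["start"], "end": chunk[-1]["end"], "text": text}

    return cue(words[:index]), cue(words[index:])


def to_srt(cues):
    """Render cues as SRT blocks, numbered from 1."""
    blocks = []
    for number, c in enumerate(cues, start=1):
        timing = f"{format_ts(c['start'])} --> {format_ts(c['end'])}"
        blocks.append(f"{number}\n{timing}\n{c['text']}\n")
    return "\n".join(blocks)


words = [
    {"word": " No,", "start": 1.0, "end": 1.2},
    {"word": " but", "start": 1.2, "end": 1.4},
    {"word": " school", "start": 1.5, "end": 1.9},
    {"word": " starts", "start": 1.9, "end": 2.3},
]
first, second = split_cue(words, 2)
print(to_srt([first, second]))
```

Because each half takes its start from its first word and its end from its last word, the split lands exactly on the word boundary instead of being interpolated from character counts.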
