Can anyone recommend a workflow/tool(s) for syncing a plaintext diarized transcript to audio to obtain high-quality subtitles? (hexbear.net)

submitted 4 months ago by AernaLingus@hexbear.net to c/askchapo@hexbear.net


The MLP wiki has high-quality diarized transcripts that look like this:

Pinkie Pie: I'm awake! I'm awake! What time is it?! Did we sleep through the test?! [snores]
[beeping stops]
Rarity: No, but school starts in thirty minutes!
Sunset Shimmer: [sighs] How's everybody feeling about our test?
Fluttershy: Even after our all-night study session, I still don't know the difference between vaporization and sublimation.

Ideally, I'd like a tool that I can feed this to and that will spit out synced subs. The exact per-character diarization isn't actually important: I'll certainly strip out the character names (and probably the [SDH things]) before alignment to avoid problems, and they won't be in the final subtitles anyway. What I do want is for the boundaries between speaker utterances to be respected.
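For reference, here's a rough sketch of the stripping step I have in mind. It assumes every speaker label is a short "Name: " prefix at the start of a line and that all SDH cues are in square brackets, as in the sample above; `clean_transcript` is just a name I made up, and a line that legitimately starts with something like "Note: " would get mangled by it.

```python
import re

# Assumption: a speaker label is a short run of text (no brackets) ending in
# the first colon on the line, e.g. "Pinkie Pie: ".
SPEAKER_RE = re.compile(r"^[^:\[\]]{1,40}:\s*")
# Assumption: SDH cues are always enclosed in square brackets.
BRACKET_RE = re.compile(r"\[[^\]]*\]")


def clean_transcript(raw: str) -> list[str]:
    """Return one cleaned utterance per transcript line.

    Speaker labels and bracketed cues are removed; lines that end up empty
    (e.g. a lone "[beeping stops]") are dropped. The line breaks of the
    original transcript are kept as utterance boundaries.
    """
    utterances = []
    for line in raw.splitlines():
        line = SPEAKER_RE.sub("", line.strip(), count=1)
        line = BRACKET_RE.sub(" ", line)
        line = re.sub(r"\s+", " ", line).strip()
        if line:
            utterances.append(line)
    return utterances


if __name__ == "__main__":
    sample = (
        "Pinkie Pie: I'm awake! Did we sleep through the test?! [snores]\n"
        "[beeping stops]\n"
        "Rarity: No, but school starts in thirty minutes!\n"
    )
    for utterance in clean_transcript(sample):
        print(utterance)
```

Each returned string is one utterance, so the list could be written out one per line (or one per file) for whatever aligner ends up being used.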

Montreal Forced Aligner (MFA) seems like it could work, but I'm unsure how best to preprocess the transcript and audio to get good results. I tried aligning with and without the built-in segment command, and also tried bumping the beam and retry beam values, with less-than-stellar results.

I'm also aware of some commercial services that offer this functionality (Descript and YouTube), but I'm looking for a solution I can run locally.

Any pointers would be greatly appreciated!

Sort of a separate question, but is there a tool that allows precise line-splitting when working with word-timestamped transcripts (e.g. the JSON output of Whisper)? It seems like it should be fairly straightforward, but as far as I can tell SubtitleEdit can't do it, and I had trouble finding a tool that can. It would be a really nice feature, since splitting lines is probably the most tedious task when dealing with automatic transcripts.
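To make it concrete, this is roughly the behavior I'm after, sketched in Python. It assumes Whisper-style JSON where each segment has a "words" list of {"word", "start", "end"} entries and each word carries its own leading space; the function names and the `break_after` parameter are made up for illustration, not taken from any existing tool.

```python
import json


def load_words(whisper_json: str) -> list[dict]:
    """Flatten the per-segment word lists of a Whisper-style JSON string."""
    data = json.loads(whisper_json)
    return [w for seg in data["segments"] for w in seg.get("words", [])]


def split_cues(words: list[dict], break_after: set[int]) -> list[tuple]:
    """Cut the word list after each index in break_after.

    Every cue runs from the start of its first word to the end of its last
    word, so a split never has to guess at a timestamp.
    """
    cues, current = [], []
    for i, word in enumerate(words):
        current.append(word)
        if i in break_after or i == len(words) - 1:
            # Assumption: words carry their own leading space (" No,", " but").
            text = "".join(w["word"] for w in current).strip()
            cues.append((current[0]["start"], current[-1]["end"], text))
            current = []
    return cues


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{ms:03}"


def to_srt(cues: list[tuple]) -> str:
    """Render (start, end, text) cues as numbered SRT blocks."""
    blocks = [
        f"{n}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n"
        for n, (start, end, text) in enumerate(cues, 1)
    ]
    return "\n".join(blocks)
```

So with the words for "No, but school starts", calling `split_cues(words, {1})` would give one cue for "No, but" and another for "school starts", each with exact start and end times. What I'm missing is an editor that lets me pick those split points interactively.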
