this post was submitted on 11 Feb 2025
530 points (98.7% liked)
Technology
They are, however, able to inaccurately summarize it in GLaDOS's voice, which is a strong point in their favor.
Surely you'd need TTS for that one, too? Which one do you use, is it open weights?
Zonos just came out, seems sick:
https://huggingface.co/Zyphra
There are also some “native” TTS LLMs like GLM 9B, which “capture” more information in the output than pure text input.
A website with zero information, and barely anything on their huggingface page. What’s exciting about this?
Ahh, you should link to the model
https://www.zyphra.com/post/beta-release-of-zonos-v0-1
Whoops, yeah, should have linked the blog.
I didn't want to link the individual models because I'm not sure whether the hybrid or the pure transformer is better.
Looks pretty interesting, thanks for sharing it
Yeah, out of all the generative AI fields, voice generation is like 95% of the way to producing convincing speech at this point, even with consumer-level tech like ElevenLabs. That last 5% might not even be solvable currently: it's the moments when it gets the feeling, intonation, or pronunciation wrong because the only context you give it is text input, which is why everything purely automated tends to fall apart quite fast.
Especially voice cloning - the DRG Cortana Mission Control mod is one of the examples I like to use.