I just listened to this AI generated audiobook and if it didn't say it was AI, I'd have thought it was human-made. It has different voices, dramatization, sound effects... The last I'd heard about this tech was a post saying Stephen Fry's voice was stolen and replicated by AI. But since then, nothing, even though it's clearly advanced incredibly fast. You'd expect more buzz for something that went from detectable as AI to indistinguishable from humans so quickly. How is it that no one is talking about AI generated audiobooks and their rapid improvement? This seems like a huge deal to me.

[-] simple@lemm.ee 118 points 1 year ago

A lot of people just aren't aware of how fast AI is moving. AI voices were pretty meh earlier this year. A lot of people working on the audiobook/voice acting scene have been talking about this though.

[-] driving_crooner@lemmy.eco.br 40 points 1 year ago

I recommend everyone check out the YouTube channel "Two Minute Papers", which has been doing videos about AI papers for the last 10 years or so, to see how much AI progress has accelerated. Like 5 years ago those image-generating AIs looked like LSD-infused dreams and now they look almost perfect.

[-] Magrath@lemmy.ca 6 points 1 year ago

I wish I could watch his videos but the way he talks is awful. It's like some exaggerated evolution of YouTube talk.

[-] LadyLikesSpiders@lemmy.ml 97 points 1 year ago

Ah yes, Audio AI. I can't wait for this rapidly-approaching future where you literally won't be able to trust the validity of anything your senses tell you anymore

[-] mindbleach@sh.itjust.works 45 points 1 year ago

"Text was never trustworthy."

-- Abraham Lincoln

[-] LadyLikesSpiders@lemmy.ml 12 points 1 year ago
[-] bingbong@lemmy.dbzer0.com 14 points 1 year ago

Ahead of his time too:

Nobody lies on the internet

-Abraham Lincoln

[-] LadyLikesSpiders@lemmy.ml 6 points 1 year ago

Truly one of the wisest men to ever live

[-] AdmiralShat@programming.dev 35 points 1 year ago

Imagine the day when people post videos of the president saying literally anything with pitch perfect audio voice synth

Imagine going to prison for a generated clip of you confessing to a crime.

[-] FaceDeer@kbin.social 25 points 1 year ago

Once the tech is that good, a recording of your confession will be useless as evidence in court.

[-] AdmiralShat@programming.dev 13 points 1 year ago* (last edited 1 year ago)

...but it is already that good? The fact that celebrities are having to come out and say it wasn't them in an ad is proof enough that it can fool people

You only need to fool a jury

[-] FaceDeer@kbin.social 9 points 1 year ago

Then we'll have to take more care with how jury trials are conducted. It's always been possible to fool juries, that's often a lawyer's entire strategy.

[-] Moneo@lemmy.world 5 points 1 year ago

That got me thinking about when we'll hear the first case of AI generated security camera footage used to frame someone. Which leads me to wonder when it will be standard procedure for cameras to digitally sign their footage.

[-] xkforce@lemmy.world 5 points 1 year ago

Everything will be useless in court. Audio evidence? Worthless. Video evidence? Worthless. Physical evidence? Prove that it wasn't planted. That kind of AI is a fucking nightmare and no one really understands the danger that kind of AI poses.

[-] FaceDeer@kbin.social 9 points 1 year ago

AI can't tamper with physical evidence. It can't fake financial records or witness testimony. Many kinds of audio and visual recordings will still have sufficient authentication and chain of custody to be worthwhile.

The main kind of evidence that these AI generators make untenable is the kind where someone just shows up and says "look at this video of X confessing to Y that I happen to have," which was never a particularly good sort of evidence to base a court case on to begin with.

[-] xkforce@lemmy.world 6 points 1 year ago* (last edited 1 year ago)

Witness testimony is already a very unreliable source of evidence. And again, evidence can be planted. Hell there was doubt about the chain of custody before AI could just make up audio and video. The validity of the chain of custody boils down to the cops and government in general being trusted enough to not falsify it when it suits them.

Sufficiently advanced AI can, and eventually will, be capable of creating deepfakes that can't reliably be proven to be false. In principle, every test that can be done to authenticate that media can be used by the AI to select generated media that would pass scrutiny.

I love the optimism and I hope you're right but I don't think you are. I think that deepfake AI should scare people a whole lot more than it does.

[-] Shyfer@ttrpg.network 19 points 1 year ago

Or imagine politicians like Trump saying the most heinous stuff and then denying it saying it's fake or AI. How will people know? You won't even be able to trust your eyes or ears anymore.

[-] ClaireDeLuna@lemmy.world 10 points 1 year ago

Soon the schizophrenics will become neuro-typical

[-] GarbageShoot@hexbear.net 9 points 1 year ago

Alright but hear me out: AI-generated odors

[-] Bobo@lemm.ee 33 points 1 year ago

I want TTS made better with AI so that I won't need huge audiobooks filling up my phone. The epubs that I already have would serve as audiobooks when needed.

[-] bionicjoey@lemmy.ca 9 points 1 year ago

If your phone is rendering TTS on the fly that's probably going to be a drain on battery.

[-] theskyisfalling@lemmy.dbzer0.com 30 points 1 year ago

As someone who only consumes books in audiobook form this is great news for me, I tried to listen to some automatically generated audio books around 2 years ago and I found them horrible to listen to just because they sounded so off.

I'd love to be able to copy in the text of a book and get an actually listenable (is that a proper word?) audiobook out of the other side for some books that will simply never be recorded by actual people due to being too old / obscure.

I've been wanting to listen to the Pellucidar books for years but they just don't exist in audio format. Is there somewhere publicly available that I can do this?

[-] not_a_bot_i_swear@lemmy.world 17 points 1 year ago

I would guess there is a LOT of work going into each voice. Playing with different parameters and prompts. I don't think it's as simple as just copying the text into a box. Not yet at least :)

[-] Nukken@lemmy.world 7 points 1 year ago

That's a good thought there though. Audiobooks could have each character voiced uniquely.

[-] AdmiralShat@programming.dev 8 points 1 year ago

This is literally the only upside I see from this.

One of the Dune audio books started off as multiple voices and then part way through it was finished by just one guy. Really impressed with it at first, and then really kind of debuffed by it. I had already read the book years before so it wasn't a big deal, but like wtf?

[-] physcx@kbin.social 5 points 1 year ago

Lol what a troll audio book.


Just curious, but how come you only consume books in audio format? (Please forgive me if this was rude to ask.)

[-] Nukken@lemmy.world 11 points 1 year ago

I can't speak for OP but I do this as well. For me it's because I listen to them on the drive to/from dropping my kids off at school and I'll have it playing while I'm working or playing a game.

[-] LadyLikesSpiders@lemmy.ml 5 points 1 year ago

As someone who would like to do this, how well do you actually pay attention to what is going on? I'd do so much more reading if I didn't have to go back and reread paragraphs several times over because I simply can't pay attention, let alone if I'm doing something else entirely

[-] milkisklim@lemm.ee 4 points 1 year ago

If you're interested further, check if your local library has a partnership with Libby. It's an app that you can check out audiobooks from.

[-] AdmiralShat@programming.dev 4 points 1 year ago

It depends. It definitely is easy to get distracted and need to rewind but I found that happens much less often than with sitting down and reading in text form.

It's a solid solution and I recommend you give it a try.

AudiobookBay and YouTube have tons of books

[-] Bldck@beehaw.org 7 points 1 year ago

Not OP, but I almost exclusively read novels and non fiction via audiobooks. For context, I’m on pace for 70 books this year.

My main reason for audiobooks is that I have a driving commute. Two hours a day round trip. Audiobooks keep me sane in a way that podcasts or music do not. I also do audiobooks when doing chores around the house.

Second, I struggle to focus on reading a book on my phone. Too many distractions and I think the reading experience is subpar. I do have an eInk reader, but I haven’t charged it in years because it’s easier to do audiobooks.

Physical books are rare in my home, but that’s a self-reinforcing cycle since I enjoy audiobooks so much.

[-] saigot@lemmy.ca 5 points 1 year ago* (last edited 1 year ago)

I like to read books before bed, but need darkness for a while before I have any chance of going to sleep, so me and my wife listen to 45min of audio book a night before going to sleep. Plus when we listen together there is no need to worry about getting ahead of each other and spoiling stuff.

I read books in other scenarios but that ritual is by far the most time I have for reading and the most consistent as well.

[-] Catoblepas@lemmy.blahaj.zone 4 points 1 year ago

Personally I mostly use audio books instead of reading because I get eye strain a lot easier than I used to. I go to an eye specialist for unrelated issues yearly, so it’s not an issue with a wrong lens prescription. It’s not a problem when I’m doing a low attention task where I can look away frequently, but for reading it sucks.

[-] crank@beehaw.org 5 points 1 year ago

Well you can always pay someone to read it for you. Blind people do that.

Are any of these books public domain? If so the print version could be eligible for inclusion at Project Gutenberg. PG has very specific docs about eligibility for this. You could probably get a scan from archive.org if you don't have one. You would have to clean up the OCR by hand.

Then it would be eligible to be requested from the volunteer (human) readers who have been pumping out libre audiobooks for years at LibriVox.

Recently I saw Gutenberg has a collab. They are producing and distributing libre audiobooks generated by AI. I believe I read on one of the pages they have 4000 done. I haven't tried it out but I guess I should.

Project Gutenberg, Microsoft, and MIT have worked together to create thousands of free and open audiobooks using new neural text-to-speech technology and Project Gutenberg's large open-access collection of e-books. This project aims to make literature more accessible to (audio)book-lovers everywhere and democratize access to high quality audiobooks. Whether you are learning to read, looking for inclusive reading technology, or about to head out on a long drive, we hope you enjoy this audiobook collection.

I assume this is also a great benefit as fertilizer down at the old AI content farm which is otherwise totally run over with reddit shitposts.

If anyone tries it let me know how it goes.

The books I specifically mentioned are now public domain as they are old enough, and LibriVox is where I actually started my audiobook (and books in general) journey. One of them is on there but it is only the second book of what is a 5 or more book series, which is kinda frustrating.

The volunteer readers are very hit and miss, however, and I find that more than half are just not listenable for me, for reasons ranging from poor recordings to poor reading ability, with excessive pauses and added "errs and ummms", to constant mispronunciation of words. These are pedantic reasons maybe, and I throw no shade at the people who have volunteered their time to read these books, but I just can't listen to them personally, for the same reason I could never get through any amount of time with a robotic text-to-speech program of the past.

I'll look into the Project Gutenberg thing however and see what is up with that. Thanks for making me aware of it :)

[-] milicent_bystandr@lemm.ee 19 points 1 year ago

That sounds pretty cool, though I'd be concerned it will suffer from the classic problem of current AI (...and humans, but that's by the by) of confident incorrectness. Just as automatic translation can miss meanings and types of context that a human will spot, programmatically generated speech can probably mess up punctuation and flow - even a human reader will sometimes get part way through a sentence and realise they need to start again for it to come out right.

That said, I can't see it being a big problem for most works, just unfortunate here and there. For once it seems an AI application short on downsides! (Except for the usual economic ones for many people previously trained in the field.)

[-] rustyredox@lemmy.world 12 points 1 year ago

There was a fairly big 40K lore channel on YouTube with a rather good AI impersonation of David Attenborough's voice and narration style/scripting. However, I just went to check it, yet it must have recently gotten hit with a DMCA and taken down. A shame really. Though I never got into 40K lore before, or the 40K franchise in general, I am a big fan of David Attenborough, and so that ended up really drawing me in to a new literary universe. However, it was a big mistake by the YouTube creator to use the name and photo likeness of Attenborough in the branding, video titles, and thumbnail art on the channel. I think without pushing that line, the AI voice with a clear disclosure could have kept the channel under the legal radar.

From the pinned comments made here, this looks to be the same creator's new channel, now using a different voice, no longer based on any one real person:

[-] maxprime@lemmy.ml 10 points 1 year ago

I’ve been getting into audiobooks in a big way recently. This is interesting but somehow seems off to me. Maybe I’ll try listening to one and have my mind changed. We’ll see!

[-] GammaGames@beehaw.org 8 points 1 year ago

Because it has the potential to become actively harmful to the audiobook industry

[-] Akrenion@programming.dev 7 points 1 year ago

And great for accessibility for people who cannot read well.

[-] PlasterAnalyst@kbin.social 4 points 1 year ago

A lot of audiobook voices are harmful to the industry. Plenty of times I've listened to a book for ten minutes and said nevermind because the voice actor was terrible, making wet mouth sounds or their voice was just annoying or the audio quality was terrible.

[-] chicken@lemmy.dbzer0.com 7 points 1 year ago

Audiobooks are offputting to me and I strongly prefer to read text, but this seems like a great thing overall for making books more accessible to people. More people experiencing a wider range of books is good.

[-] Zikeji@programming.dev 5 points 1 year ago

Audiobooks have been a great coping mechanism for my ADHD, they've also made me a better driver.

For the latter, if I listen to my music I definitely feel a bit more aggressive, whereas if it's an audiobook (and I've given myself sufficient room), I'm much more forgiving.

For the former, I can mix them with menial tasks and it makes them so much more doable.

[-] bonn2@lemm.ee 4 points 1 year ago

There are also a few AI-sung songs out there that are pretty good. Most of them sound pretty autotuney, but to some extent, that can be a style. Aura, by Ghost, is a good example. If I didn't know it was AI, I would just think it was autotune.

this post was submitted on 11 Nov 2023
232 points (94.6% liked)
