Writing / Datatalk / TTS.
There are many TTS engines, web interfaces, and programs out there. But... some are quite expensive, others have poor voices, and you can't always be sure of what's 'in the tin', because they won't give you a free trial first.
I have neither the money nor the time to do an extensive review (hey, anyone willing to sponsor me? 😁), but I needed to do *some* investigation for one of my (writing-related) projects.
Read on for the results, and the samples I recorded.
First conclusions
So I Googled and browsed and downloaded, and came to the following conclusions:
- Some services simply repackage Azure, Amazon, or Google voices and charge you much more than the original providers do
- Eleven Labs is the best, but very expensive (I suspect they do some 'quote detection', and they claim to look at the larger context of the text)
- Amazon Polly is (thus far) the most affordable option for non-free, neural voices.
- You can access Azure for free (sort of) using the HTML to Immersive Reader path (for as long as Microsoft will let you).
- Google Wavenet seems to be the worst of the big three (Google, Amazon, Azure)
- For short TTS checks (for example, an author checking his or her own work using TTS), the regular free and built-in SAPI5 voices will do
- TTS would work a lot better if it could detect dialogue, either using a neural network or simply by detecting single and double quotes (Lovo, Eleven)
- Even if Speechify were great, I'd hesitate to recommend it, as they are a tad too pushy with their website and presentation
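To give an idea of what 'simply detecting single and double quotes' could look like, here is a minimal Python sketch. It is my own illustration and not how Lovo or Eleven Labs actually work (I have no idea what they do internally). The name `split_dialogue` is made up, and the rule for telling a quote from an apostrophe is an assumption that will trip over things like a trailing possessive ("the dogs' bowl") or curly quotes.

```python
import re

# Assumption: an opening single quote never follows a letter and a closing one
# never precedes a letter, so the apostrophes in "It's" or "we've" are skipped.
DIALOGUE = re.compile(r"(?<!\w)'(.+?)'(?!\w)|\"(.+?)\"", re.DOTALL)


def split_dialogue(text):
    """Split text into ("narration", ...) and ("dialogue", ...) segments."""
    segments = []
    pos = 0
    for match in DIALOGUE.finditer(text):
        narration = text[pos:match.start()].strip()
        if narration:
            segments.append(("narration", narration))
        # Group 1 holds single-quoted speech, group 2 double-quoted speech.
        segments.append(("dialogue", match.group(1) or match.group(2)))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("narration", tail))
    return segments


if __name__ == "__main__":
    sample = "It's bright white. 'Well,' she says, 'that is quite the surprise.'"
    for kind, segment in split_dialogue(sample):
        print(f"{kind:9} | {segment}")
```

A TTS front end could then hand the dialogue segments to a different voice or speaking style than the narration, which is roughly the effect I suspect I'm hearing in the better engines.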
1. I used the following text to test the different engines:
It's painted bright white and stands out like a beacon between all the, uh, well seasoned buildings we've seen so far. 'Well,' she says, 'that is quite the surprise. Now tell me, how do we continue from here?'
2. When using web-based services, I tried to download their generated samples. If a website wouldn't let me, I used Total Recorder to grab the audio stream.
3. After downloading / recording all voice samples I normalized all voices to 95% volume, and converted any .WAV to .MP3.
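For those who'd rather script step 3, the normalization could be sketched like this in Python, using only the standard library. I'm assuming here that '95% volume' means peak normalization (the loudest sample ends up at 95% of full scale) and that the recordings are 16-bit PCM WAV files. The function name `normalize_wav` is made up for this example. Converting to .MP3 is not covered, since the standard library has no MP3 encoder; that step would need a separate tool.

```python
import array
import sys
import wave


def normalize_wav(src_path, dst_path, target=0.95):
    """Peak-normalize a 16-bit PCM WAV file; returns the original peak."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = src.readframes(params.nframes)

    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    peak = max((abs(s) for s in samples), default=0)
    if peak:  # leave pure silence untouched
        gain = target * 32767 / peak
        samples = array.array(
            "h",
            (max(-32768, min(32767, round(s * gain))) for s in samples),
        )

    if sys.byteorder == "big":
        samples.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # same channels, sample rate, and width
        dst.writeframes(samples.tobytes())
    return peak
```

Calling `normalize_wav("sample.wav", "sample_normalized.wav")` (both file names are placeholders) writes a copy with the same format and adjusted volume. Peak normalization keeps the comparison between voices fair without changing how each voice actually sounds.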
Results and samples
Windows SAPI5 - tts_balabolka_win11_sapi5_zira.mp3
Default Windows voice, good enough for simple chapter checks
Amazon Polly Ruth - tts_cli_amazon_neural_ruth.mp3
Cheapest acceptable neural voice
Amazon Polly Joanna Neural - tts_cli_amazon_neural_joanna.mp3
Another neural voice
Amazon Polly Joanna - tts_cli_amazon_joanna.mp3
Same voice, non-neural
Google 1 - tts_balabolka_google1.mp3
Retrieved via Balabolka, older voice and meh
Google 2 - tts_balabolka_google2.mp3
Retrieved via Balabolka, older voice and meh
Google Standard - tts_balabolka_googlecloud_standard.mp3
Standard non-neural Google Cloud voice
Google Neural E - tts_balabolka_googlecloud_wavenet_e.mp3
Google Cloud Wavenet, neural voice US / female / E
Google Neural F - tts_balabolka_googlecloud_wavenet_f.mp3
Google Cloud Wavenet, neural voice US / female / F
Seems to be a repackaged neural voice, so better check the pricing
Listnr Jenny - tts_web_listnr_jenny.mp3
Another repackaged neural voice
Natural Reader Jane - tts_totalrecorder_web_naturalreader_jane.mp3
Again, this seems to be a repackaged neural voice, with maybe a touch of SSML
Microsoft Azure / Immersive Reader - tts_totalrecorder_win11_immersive_natural_aria.mp3
This is the Aria voice from Immersive Reader, but it appears to be the same as the Microsoft Azure TTS voice
Speechify Gwynneth - tts_totalrecorder_web_speechify_gwynneth.mp3
Our favorite Pepper Potts :-) Speechify has some good voices, but they often sound a little flat. Unfortunately, their aggressive marketing would be a reason to skip them
Lovo Megan - tts_web_lovo_megan.mp3
This one's different and fairly good. Unfortunately, those prices are killing me
Eleven Labs Bella - tts_web_elevenlabs_premade_bella.mp3
One of the better ones, but expensive. I think it speeds up when it hits a single or double quote