Monday, April 3, 2023

TTS (Text to Speech) Examples - The Sequel


There are many TTS engines, web interfaces, and programs out there. But... some are quite expensive, others have poor voices, and you can't always be sure of what's 'in the tin', because they won't give you a free trial first.

I have neither the money nor the time to do an extensive review (hey, anyone willing to sponsor me? 😁), but I needed to do *some* investigation for one of my (writing-related) projects.

Read on for the results, and the samples I recorded.


First conclusions

So I Googled and browsed and downloaded, and came to the following conclusions:

  • Some services are simply repackaged Azure, Amazon, or Google voices, sold at a much higher price than the original providers charge.
  • Eleven Labs is the best, but very expensive (I suspect they do some 'quote detection', and they claim to take the wider context of the text into account).
  • Amazon Polly is (thus far) the most affordable paid option with neural voices.
  • You can access Azure voices for free (sort of) via the HTML to Immersive Reader path (for as long as Microsoft lets you).
  • Google Wavenet seems to be the worst of the big three (Google, Amazon, Azure).
  • For short TTS checks (for example, an author checking his or her own work using TTS), the regular free and built-in SAPI5 voices will do.
  • TTS would work a lot better if it could detect dialogue, either with a neural network or by simply detecting single and double quotes (Lovo, Eleven Labs).
  • Even if Speechify were great, I'd hesitate to recommend it, as they are a tad too pushy with their website and presentation.
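
To make the 'simply detecting single and double quotes' idea a bit more concrete, here is a minimal Python sketch. It is not how Lovo or Eleven Labs do it (I have no idea what they run internally); the function name split_dialogue and the regular expression are my own assumptions. The trick is to only accept a single quote as an opening quote when it does not follow a letter, and as a closing quote when it is not followed by one, so the apostrophes in words like "It's" and "we've" are left alone.

```python
import re

# Assumption: a quote character directly between two word characters is an
# apostrophe (It's, we've), not a dialogue delimiter.
SINGLE = r"(?<!\w)'(?P<single>.+?)'(?!\w)"
DOUBLE = r'"(?P<double>.+?)"'
DIALOGUE = re.compile(f"{SINGLE}|{DOUBLE}", re.DOTALL)


def split_dialogue(text):
    """Split text into ("narration", ...) and ("dialogue", ...) segments."""
    segments = []
    pos = 0
    for match in DIALOGUE.finditer(text):
        # Everything between the previous match and this one is narration.
        narration = text[pos:match.start()].strip()
        if narration:
            segments.append(("narration", narration))
        spoken = match.group("single") or match.group("double")
        segments.append(("dialogue", spoken.strip()))
        pos = match.end()
    # Whatever is left after the last quote is narration again.
    tail = text[pos:].strip()
    if tail:
        segments.append(("narration", tail))
    return segments


if __name__ == "__main__":
    sample = ("It's painted bright white. 'Well,' she says, "
              "'that is quite the surprise.'")
    for kind, chunk in split_dialogue(sample):
        print(f"{kind:>9}: {chunk}")
```

Each segment could then be sent to a different voice or speaking style. The heuristic is far from watertight: a plural possessive such as "the dogs' bowls" would confuse it, and typographic (curly) quotes are not handled at all.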


Test setup

1. I used the following text to test the different engines:

It's painted bright white and stands out like a beacon between all the, uh, well seasoned buildings we've seen so far. 'Well,' she says, 'that is quite the surprise. Now tell me, how do we continue from here?' 

2. When using web-based services, I tried to download their generated samples. If a website wouldn't let me, I used Total Recorder to grab the audio stream.

3. After downloading / recording all voice samples, I normalized them to 95% volume and converted any .WAV files to .MP3.
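
For anyone who wants to script step 3, here is a rough Python sketch of the normalizing part. It assumes that '95% volume' means peak normalization (the loudest sample ends up at 95% of full scale) and that the input is 16-bit PCM .WAV; the function names normalize_peak and normalize_wav are made up for this example. It is not the tool I used for the samples. The .MP3 conversion is left out because it needs an external encoder.

```python
import array
import sys
import wave


def normalize_peak(samples, target=0.95, full_scale=32767):
    """Scale 16-bit samples so the loudest one sits at target * full scale."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # pure silence, nothing to scale
    gain = target * full_scale / peak
    # Clamp to the 16-bit range in case rounding pushes a value over the edge.
    return [max(-full_scale - 1, min(full_scale, round(s * gain)))
            for s in samples]


def normalize_wav(src_path, dst_path, target=0.95):
    """Peak-normalize a 16-bit PCM WAV file (all channels share one gain)."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h", src.readframes(params.nframes))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    scaled = array.array("h", normalize_peak(samples, target))
    if sys.byteorder == "big":
        scaled.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(scaled.tobytes())
```

Peak normalization only matches the loudest peaks, not the perceived loudness, so two voices can still sound slightly different in volume afterwards.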


Results and samples


Windows SAPI5 - tts_balabolka_win11_sapi5_zira.mp3

Default Windows voice, good enough for simple chapter checks


Amazon Polly Ruth - tts_cli_amazon_neural_ruth.mp3

Cheapest acceptable neural voice

Amazon Polly Joanna Neural - tts_cli_amazon_neural_joanna.mp3

Another neural voice

Amazon Polly Joanna - tts_cli_amazon_joanna.mp3

Same voice, non-neural
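
The Polly samples above came from a command-line tool, but the same request can be made from Python. The sketch below is one possible way to do it, assuming the third-party boto3 package is installed and AWS credentials and a region are already configured. The helper names build_polly_request and synthesize_to_file are my own; synthesize_speech is the actual boto3 Polly call. The only difference between the neural and non-neural Joanna samples is the Engine parameter.

```python
def build_polly_request(text, voice_id, neural=True):
    """Build the keyword arguments for Polly's synthesize_speech call."""
    return {
        "Text": text,
        "VoiceId": voice_id,
        "OutputFormat": "mp3",
        # "neural" and "standard" are the two engines compared above.
        "Engine": "neural" if neural else "standard",
    }


def synthesize_to_file(request, path):
    """Send the request to Amazon Polly and save the returned audio."""
    import boto3  # third-party; imported here so the helper above works without it

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**request)
    with open(path, "wb") as out:
        out.write(response["AudioStream"].read())


if __name__ == "__main__":
    text = "'Well,' she says, 'that is quite the surprise.'"
    # Hypothetical output file names, chosen for this example only.
    synthesize_to_file(build_polly_request(text, "Joanna"), "joanna_neural.mp3")
    synthesize_to_file(build_polly_request(text, "Joanna", neural=False),
                       "joanna_standard.mp3")
```

Not every voice supports every engine, so check the Polly voice list before switching a voice such as Ruth to the standard engine.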


Google 1 - tts_balabolka_google1.mp3

Retrieved via Balabolka, older voice and meh

Google 2 - tts_balabolka_google2.mp3

Retrieved via Balabolka, older voice and meh

Google Standard - tts_balabolka_googlecloud_standard.mp3

Standard non-neural Google Cloud voice

Google Neural E - tts_balabolka_googlecloud_wavenet_e.mp3

Google Cloud Wavenet, neural voice US / female / E

Google Neural F - tts_balabolka_googlecloud_wavenet_f.mp3

Google Cloud Wavenet, neural voice US / female / F


Seems to be a repackaged neural voice, so better check the pricing

Listnr Jenny - tts_web_listnr_jenny.mp3

Another repackaged neural voice

Natural Reader Jane - tts_totalrecorder_web_naturalreader_jane.mp3

Again, this seems to be a repackaged neural voice, with maybe a touch of SSML


Microsoft Azure / Immersive Reader - tts_totalrecorder_win11_immersive_natural_aria.mp3

This is the Aria voice from Immersive Reader, but it appears to be the same voice as in Microsoft Azure TTS


Speechify Gwynneth - tts_totalrecorder_web_speechify_gwynneth.mp3

Our favorite Pepper Potts :-) Speechify has some good voices, but they often sound a little flat... Unfortunately, their aggressive marketing would be a reason to skip them


Lovo Megan - tts_web_lovo_megan.mp3

This one's different and fairly good. Unfortunately, those prices are killing me

Eleven Labs Bella - tts_web_elevenlabs_premade_bella.mp3

One of the better ones, but expensive. I think it speeds up when it hits a single or double quote

