What are the advantages of deep learning based speech synthesis/TTS systems compared to parametric/concatenative TTS?
Much less manual work to implement and refine before the results sound convincing? On the flip side: huge models that are comparatively much more computationally expensive to run.
The Bitter Lesson talks about speech recognition rather than synthesis, but I would guess that a similar dynamic applies.
Also posted over in !discuss@discuss.online here, since I was reminded of the essay
Uh, they sound much better