Many moons ago, I had a podcast that I recorded from my car during my commute. As a time-saving measure, instead of correcting mistakes or, even worse, re-recording all or part of the podcast, I would use festival and its text2wave program to have a computer-generated female voice interrupt me and read aloud any corrections that were needed. She more or less became part of the show and was personified as Lynn. It was silly and fun, but it solved a problem, worked for me, and in some way reminded me of the fun my brother, my sister, and I had as kids back in the 70s, using a cassette tape recorder and the heating ducts in our floor to broadcast our “radio station” from one room to the other.
Back then, around 2005 to 2010, getting festival to sound “good” required installing the CMU Arctic SLT voices or, better yet, the Nitech HTS voices, and it also required compiling and installing the latest festival from source. It seems to be much easier now, at least on Debian, to get a result almost as good as what I achieved in 2005, since the festvox-us-slt-hts voices are in the repo and get me very close to where I was then. Simply installing the festvox-us-slt-hts deb was enough to pull in everything I needed to get things working.
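For anyone wanting to try the same trick, here is a minimal sketch of how a spoken correction could be generated from a script. It assumes festival's text2wave is on the PATH, that the festvox-us-slt-hts package registers its voice under the name cmu_us_slt_arctic_hts (check your own install), and that text2wave reads the text from standard input when no file is given. The helper names build_text2wave_command and synthesize are made up for illustration.

```python
import subprocess

# Assumed voice name for Debian's festvox-us-slt-hts package; verify locally.
DEFAULT_VOICE = "cmu_us_slt_arctic_hts"


def build_text2wave_command(output_path, voice=DEFAULT_VOICE):
    """Return the argv list for text2wave; the text itself is fed on stdin."""
    return [
        "text2wave",
        "-o", output_path,            # where the WAV file is written
        "-eval", f"(voice_{voice})",  # select the voice before synthesis
    ]


def synthesize(text, output_path, voice=DEFAULT_VOICE):
    """Have festival read `text` aloud into a WAV file at `output_path`."""
    subprocess.run(
        build_text2wave_command(output_path, voice),
        input=text,   # passed to text2wave on standard input
        text=True,
        check=True,   # raise if text2wave exits with an error
    )
```

Calling something like synthesize("Correction: the episode number is twelve.", "correction.wav") should leave a WAV file that can be dropped into the recording wherever the fix is needed.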
Sadly, the Nitech HTS voices are still problematic; in fact, even more so. In 2005 you had to have the latest version of festival, and now in 2020 you have to have an old version, since the Nitech HTS voices are not compatible with versions of festival greater than 2.1. Ain’t that a kick in the head: the more things change, the more they stay the same.
None of this is probably new information, but it’s sort of news to me, since I am revisiting festival for the first time in 11 years. A lot has changed since then, and commercial text-to-speech solutions are scary good nowadays; but if you want to use FOSS, festival still fits the bill.