I've been working for a while on a node that utilises the excellent and very compact SVOX Pico Text-to-Speech library to turn a Node-RED string into a WAV file or a PCM buffer, or to play it back immediately on the host machine. To communicate with the Pico TTS engine, the node uses the nanotts C++ API wrapper written by Gregory Naughton, which is also what gives the node its name. The node can handle strings in US & UK English, German, French, Spanish and Italian, and the algorithm can be tweaked by supplying values for
Some examples of how this can be used:
The strings sent to the node can contain Pico TTS markup tags, including:
<pitch level="..."> ... </pitch>: Sets the pitch level for the enclosed block.
<speed level="..."> ... </speed>: Sets the speed level for the enclosed block.
<volume level="..."> ... </volume>: Sets the volume level for the enclosed block.
<break time="..."/>: Inserts a pause with the duration specified by the time parameter (e.g. "1s" or "1000ms").
<ignore> ... </ignore>: Completely ignores the enclosed block (it will not be read out).
<phoneme ph="..."/>: Provides a phonemic or phonetic pronunciation for a word to be inserted into the text in place of the markup. The value of ph should use the X-SAMPA phonetic alphabet to define the phoneme.
<play file="..."/> | <play file="..."> ... </play>: In the first form, this plays an audio file at the position where the tag appears. In the second form, the audio file plays instead of the enclosed block of text.
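
To make the tag syntax concrete, here is a minimal sketch of how a marked-up string could be assembled in JavaScript, for example inside a function node placed ahead of the TTS node. The helper names withLevel and pause are hypothetical and are not part of the node or of nanotts, and the numeric levels are illustrative placeholders only; check the Pico documentation for the ranges it actually accepts.

```javascript
// Hypothetical helpers for composing Pico TTS markup.
// They are NOT part of the node or of nanotts; they only build strings.

// Wrap text in a paired tag that takes a "level" attribute
// (pitch, speed or volume).
function withLevel(tag, level, text) {
    return `<${tag} level="${level}">${text}</${tag}>`;
}

// Build a self-closing break tag, e.g. pause("1s") or pause("1000ms").
function pause(time) {
    return `<break time="${time}"/>`;
}

// Example: a slower, lower-pitched greeting, then a one-second pause,
// then plain text, then a block that should not be read out.
// The levels 80 and 90 are placeholder values chosen for illustration.
const payload =
    withLevel("speed", 80, withLevel("pitch", 90, "Hello from Node-RED.")) +
    pause("1s") +
    "This part is read with the default settings. " +
    "<ignore>This part is skipped.</ignore>";

console.log(payload);
```

In a function node you would most likely assign the result to the outgoing message (for instance msg.payload = payload; return msg;), on the assumption that the TTS node reads its input text from msg.payload.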
Due to what looks like a bug in the nanotts library, this node is not yet functional with the nanotts master branch; you need to modify and build your own copy as discussed in the linked issue. I am posting this here in the hope that others will have enough interest in this node to collaborate on resolving this issue and help push it over the line.