I've been working for a while on a node that utilises the excellent and very compact SVOX Pico Text-to-Speech library to turn a Node-RED string into a WAV file or PCM buffer, or for immediate playback on the host machine. To communicate with the Pico TTS engine the node uses the nanotts C++ API wrapper written by Gregory Naughton, which is also what gives the node its name. The node can handle strings in US & UK English, German, French, Spanish and Italian, and the algorithm can be tweaked by supplying values for speed and pitch.
The strings sent to the node can include Pico TTS markup tags, including:
<pitch level="..."> ... </pitch>
Sets the pitch level for the enclosed block.
<speed level="..."> ... </speed>
Sets the speed level for the enclosed block.
<volume level="..."> ... </volume>
Sets the volume level for the enclosed block.
<break time="..."/>
Inserts a pause with the duration specified by the time parameter (e.g. "1s" or "1000ms").
<ignore> ... </ignore>
Completely ignores the enclosed block (it will not be read out).
<phoneme ph="..."/>
Provides a phonemic or phonetic pronunciation for a word to be inserted into the text in the place of the markup. The value of ph should use the X-SAMPA phonetic alphabet to define the phoneme.
In the first form, this will play an audio file at the position where the tag appears. In the second form the audio file will play instead of the enclosed block of text.
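As a rough illustration of how a marked-up string might be put together before it is sent to the node, here is a minimal sketch. The helper names (`wrap`, `pitch`, `speed`, `volume`, `pause`, `ignore`) are hypothetical and not part of the node or of nanotts; only the tag names come from the list above. The level values 80 and 120 are placeholders, since the accepted ranges are not stated here.

```javascript
// Hypothetical helpers for wrapping text in Pico TTS markup tags.
// Tag names are taken from the list above; helper names are made up.
function wrap(tag, level, text) {
  return `<${tag} level="${level}">${text}</${tag}>`;
}

const pitch = (level, text) => wrap('pitch', level, text);
const speed = (level, text) => wrap('speed', level, text);
const volume = (level, text) => wrap('volume', level, text);

// Insert a pause of the given length in milliseconds.
const pause = (ms) => `<break time="${ms}ms"/>`;

// Text inside <ignore> is not read out.
const ignore = (text) => `<ignore>${text}</ignore>`;

// Placeholder levels (80, 120): check the engine's accepted ranges.
const payload = [
  'The front door is open.',
  pause(500),
  speed(80, pitch(120, 'Please close it.')),
  ignore('debug: sensor 3'),
].join(' ');

console.log(payload);
```

In a Node-RED function node the resulting string would presumably be assigned to `msg.payload` before being passed on.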
Due to what looks like a bug in the nanotts library, this node is not yet functional with the nanotts master branch: you need to modify and build your own copy as discussed in the linked issue. I am posting this here in the hope that others will have enough interest in this node to collaborate on resolving this issue and help push it over the line.
Very cool. I recently built a flow using PicoTTS via the Exec node that can queue and play back messages as they come in... it uses the new 'Complete' node to manage the queue by watching the Exec node.
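The idea of playing queued messages one at a time can be sketched in plain JavaScript as well. This is not the flow described above, which uses the Exec and Complete nodes; it is a hypothetical promise-based equivalent. `SerialQueue` and `fakePlay` are made-up names, and `fakePlay` only simulates playback with a timer.

```javascript
// Hypothetical serial queue: each job starts only after the previous one settles.
class SerialQueue {
  constructor() {
    this.tail = Promise.resolve();
  }

  // job is a function returning a promise, e.g. one that plays a message.
  push(job) {
    const run = this.tail.then(() => job());
    // Keep the chain going even if a job rejects.
    this.tail = run.catch(() => {});
    return run;
  }
}

// Stand-in for real playback: records the message after a delay.
const order = [];
const fakePlay = (msg, ms) => () =>
  new Promise((resolve) => {
    setTimeout(() => {
      order.push(msg);
      resolve(msg);
    }, ms);
  });

const queue = new SerialQueue();
queue.push(fakePlay('first', 30));
queue.push(fakePlay('second', 10));
const done = queue.push(fakePlay('third', 0)).then(() => {
  console.log(order.join(', ')); // logs: first, second, third
  return order;
});
```

Although the later jobs have shorter delays, they complete in arrival order because each one waits for the previous job to finish.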
If you use Chrome, speech synthesis and speech recognition are also available within the browser (local, using speechSynthesis and webkitSpeechRecognition; see the documentation). There are some JS libraries that make them a bit easier to implement, including for TTS.
Oh absolutely. The Pico TTS engine (as the name indicates) has been designed with very modest hardware requirements and will build & run just fine on ARM systems. I haven't tried but my guess would be you could run multiple simultaneous instances on a Pi 1.
Edit: If you want to give it a quick try, just install the libttspico-utils package which includes the pico2wave binary. Pretty sure this will be in the Raspbian repositories.
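For anyone who would rather call pico2wave from Node.js than from a shell, a minimal sketch follows. It assumes the pico2wave binary is on the PATH and uses its `-l` (language) and `-w` (output WAV file) options. The function names `pico2waveArgs` and `speakToFile` are hypothetical, and the default language is an arbitrary choice for this example.

```javascript
const { execFile } = require('node:child_process');

// Build the argument list for pico2wave.
// lang is a voice such as 'en-GB', 'en-US', 'de-DE', 'es-ES', 'fr-FR' or 'it-IT'.
function pico2waveArgs(text, wavPath, lang = 'en-GB') {
  return ['-l', lang, '-w', wavPath, text];
}

// Run pico2wave and resolve with the path of the WAV file it wrote.
// execFile passes the arguments as an array, so the text needs no shell quoting.
function speakToFile(text, wavPath, lang) {
  return new Promise((resolve, reject) => {
    execFile('pico2wave', pico2waveArgs(text, wavPath, lang), (err) => {
      if (err) reject(err);
      else resolve(wavPath);
    });
  });
}

// Example call (assumes pico2wave is installed):
// speakToFile('Hello from Node-RED', '/tmp/hello.wav')
//   .then((file) => console.log('wrote', file))
//   .catch((err) => console.error('pico2wave failed:', err.message));
```

Note that the output file name should end in `.wav`.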
Hi, I finally managed to get my RPi 4 up and running and wanted to try NanoTTS.
It looks like the libttspico-utils package used to be included in the older 'stretch' release of Raspbian but is no longer in the latest and greatest 'buster' release.
For anyone else wanting to try PicoTTS on Buster, there is a solution here:
Thanks, I did not know it had been removed, or that it was considered "non-free". AFAICT the NanoTTS sources include all or at least a significant portion of the PicoTTS code, and this appears to have an Apache v2 license attached. I looked at this a little bit because I'm considering ditching NanoTTS and doing a direct implementation of the PicoTTS C API in Node.js. Not only would that get around the problem with command line argument parsing in PicoTTS, but it would also remove the need to use child_process.exec. It should also make it easier to bundle everything so that npm can install it in one go (e.g. using node-gyp). I am slightly daunted by the task though...