[Announce] node-red-contrib-coqui-stt (initial release)

Hello,
As I mentioned in another thread, Mozilla has ceased any real funding or development of its DeepSpeech speech-to-text project.
Fortunately, most of the original core development team has stepped up to this challenge and founded their own venture to continue development of a fork under a new name:

Today I published the corresponding nodes for this:

In this initial release they act as a slightly improved drop-in replacement for the deepspeech nodes.
Coqui is currently compatible with the language models and scorers used in DeepSpeech, but this will change in the future as Coqui is in active development.
This also means that, as of now, I will no longer maintain the deepspeech nodes and will focus solely on the new coqui nodes. At some point they may therefore stop working as a drop-in replacement for the current deepspeech nodes.
So I recommend that anybody using the deepspeech nodes right now switch over to the coqui nodes soon.
You can find all the available languages for Coqui on their website:

As always, I'm open to feedback, requests and bug reports on the nodes' repository:

Johannes

Hi Johannes,
Thanks for explaining the relation between DeepSpeech and Coqui!
I did a quick search for the supported languages but couldn't find a list. Do you know where I can find it? Most probably it won't contain Dutch...

Have a look at the second-to-last link I posted:

I think there is a Dutch model, but judging from the reported word error rate it's not very good. It looks like it's more of a proof of concept than anything else.

Thank you! Works like a charm!
Do you have a solution for a wake word?

I have two right now that require different amounts of effort:

  • 1st is node-red-contrib-personal-wake-word:

This allows for very quick creation of a wake word that will work for one person in most circumstances. Just record 3-5 samples of yourself saying the wake word, create a configuration from those files, and you're ready to go.

  • 2nd would be node-red-contrib-precise-wakeword:

Here you can either use a pre-trained model like hey mycroft or train your own universal wake-word model, which will be much more robust and noise resilient than the first option. Unfortunately this process is quite involved: as with all machine learning models, you will need a lot of samples to train on (hundreds), and the precise training tools are a little buggy, not that user friendly and not very well documented. There are helpful links in the documentation, though, should you want to embark on this endeavor.

Johannes

OK, thanks again for your effort!
The wake word works fine to start Coqui listening. To make everything usable for flow interaction we need something like silence detection to stop the listener (or another solution). The word parser (I don't remember the name of the one you did for voice2json) could be helpful if it works with the result / text coming from Coqui. Does that work?

Have a look here:

https://flows.nodered.org/collection/Qn4a6AEtnjAw

These are all the nodes and subflows I wrote or contributed to that you would need to build a voice assistant completely in Node-RED, which is what I have done:

  • jsgf permutation subflow to create a corpus both for training a language model for coqui and a tagged version for intent matching with fuzzywuzzy

  • sox for audio input and output

  • precise or personal-wake-word for wake-word spotting

  • node-red-contrib-vad for silence detection when a command is spoken

  • coqui stt for speech to text

  • node-red-contrib-fuzzywuzzy to match the coqui output with an intent based on the tagged corpus from the jsgf parser

  • node-red-contrib-pico2wave for text to speech output
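To illustrate the corpus and intent-matching steps in the list above, here is a minimal Python sketch, not the actual subflow or node code. It makes several assumptions: the templates and intent names are invented for the example, the slot lists stand in for what a JSGF grammar would expand to, and Python's standard-library difflib is used as a stand-in for fuzzywuzzy so the example has no dependencies.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical intent templates: each slot lists its alternatives.
# This stands in for the sentences a JSGF grammar would expand to.
TEMPLATES = {
    "light_on": [["turn on", "switch on"], ["the"], ["kitchen", "living room"], ["light"]],
    "light_off": [["turn off", "switch off"], ["the"], ["kitchen", "living room"], ["light"]],
    "get_time": [["what time is it", "tell me the time"]],
}


def build_corpus(templates):
    """Expand every template into all of its sentences, each tagged with its intent."""
    corpus = []
    for intent, slots in templates.items():
        for words in product(*slots):
            corpus.append((" ".join(words), intent))
    return corpus


def match_intent(transcript, corpus, threshold=0.6):
    """Return (intent, sentence, score) for the closest corpus sentence, or None."""
    best = None
    for sentence, intent in corpus:
        # Similarity ratio between 0.0 (nothing in common) and 1.0 (identical).
        score = SequenceMatcher(None, transcript.lower(), sentence).ratio()
        if best is None or score > best[2]:
            best = (intent, sentence, score)
    if best is None or best[2] < threshold:
        return None
    return best


CORPUS = build_corpus(TEMPLATES)
```

With this corpus, a slightly misrecognized transcript such as "turn on the kitchen lite" still resolves to the light_on intent, while text that resembles no corpus sentence returns None. The untagged sentences from build_corpus are also the kind of text you would feed into language model training; the threshold of 0.6 is an arbitrary starting point, not a recommended value.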

Once I find some time this winter, I plan on writing a tutorial, based on a simple example, on how to build one.
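Regarding the question about stopping the listener: the idea behind silence detection can be sketched in a few lines of Python. This is not how node-red-contrib-vad works internally; it is a plain energy threshold used as a stand-in, assuming 16-bit little-endian mono PCM frames. The function names, the threshold and the frame count are all made up for the illustration.

```python
import math
import struct


def rms(frame: bytes) -> float:
    """Root-mean-square level of one frame of 16-bit little-endian mono PCM."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, frame[:count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def collect_command(frames, threshold=500.0, max_silent_frames=25):
    """Collect audio until speech has started and is followed by a run of quiet frames.

    frames is any iterable of PCM byte chunks, for example from a microphone stream.
    """
    collected = []
    silent_run = 0
    speech_started = False
    for frame in frames:
        collected.append(frame)
        if rms(frame) >= threshold:
            # Loud frame: speech is ongoing, so reset the silence counter.
            speech_started = True
            silent_run = 0
        elif speech_started:
            # Quiet frame after speech: count towards the end of the command.
            silent_run += 1
            if silent_run >= max_silent_frames:
                break
    return b"".join(collected)
```

The returned bytes would then be passed on to the speech-to-text step. Leading silence before the first loud frame is kept on purpose here, so the start of the command is not cut off.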
