Hello,
As I have mentioned in another thread, Mozilla has ceased any real funding or development of its deepspeech speech-to-text project.
Fortunately, most of the original core development team has stepped up to the challenge and founded their own venture to continue development of a fork under a new name:
Today I published the corresponding nodes for this:
In this initial release they act as a slightly improved drop-in replacement for the deepspeech nodes.
Coqui is currently compatible with the language models and scorers used in deepspeech, but this will change in the future as coqui is under active development.
This also means that as of now I will no longer maintain the deepspeech nodes and will focus solely on the new coqui nodes, which may also mean that at some point they will no longer work as a drop-in replacement for the current deepspeech nodes.
So I recommend that anybody using the deepspeech nodes right now switches over to the coqui nodes soon.
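For the curious: the reason the swap is painless right now is that the coqui bindings started out as an API-compatible fork of the deepspeech ones. Here is a rough sketch of what that looks like at the library level, assuming coqui's npm package is called `stt` and keeps the deepspeech `Model` API (all paths are placeholders):

```javascript
// Rough sketch: transcribing a buffer of headerless 16 kHz,
// 16-bit mono PCM audio. At this point only the require() needs
// to change compared to the deepspeech bindings.

// const Lib = require('deepspeech'); // before
const Lib = require('stt');           // after (assumed package name)

const fs = require('fs');

// Placeholder paths: the acoustic model and scorer that worked
// with deepspeech are still loadable for now.
const model = new Lib.Model('/path/to/model.pbmm');
model.enableExternalScorer('/path/to/kenlm.scorer');

// audio.raw: headerless 16 kHz, 16-bit mono PCM, e.g. recorded with sox.
const audio = fs.readFileSync('/path/to/audio.raw');
console.log(model.stt(audio));
```

As said above, this compatibility is not guaranteed to last, which is exactly why the nodes will eventually diverge too.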
You can find all the available languages for coqui on their website:
As always, I'm open to feedback, requests, and bug reports on the nodes' repository:
Hi Johannes,
Thanks for explaining the relation between Deepspeech and Coqui!
I did a quick search for which languages are supported but couldn't find anything. Do you know where I can find the language list? It most probably won't contain Dutch...
I think there is a Dutch model but it's not very good judging from the reported word error rate. It looks like it's more of a proof of concept than anything else.
I have two wake-word nodes right now that require different amounts of effort:
The 1st is node-red-contrib-personal-wake-word:
This allows for very quick creation of a wake word that will work for one person in most circumstances. Just record 3-5 samples of yourself saying the wake word, create a configuration from those files, and you're ready to go.
The 2nd would be node-red-contrib-precise-wakeword,
where you can either use a pre-trained model like "hey mycroft" or train your own universal wake-word model, which will be much more robust and noise-resilient than the first option. Unfortunately this process is quite involved: as with all machine learning models you will need a lot of samples to train on (hundreds), and the precise training tools are a little buggy, not that user friendly, and not very well documented. There are helpful links in the documentation though, should you want to embark on this endeavor.
OK, thanks again for your effort!
The wake word works OK to start coqui listening. To make everything usable for flow interaction we need something like silence detection to stop the listener (or another solution). The word parser (I don't remember the name of the one you did for voice2json) could be helpful if it works with the result/text coming from coqui. Does that work?
These are all the nodes and subflows I wrote or contributed to that you would need to build a voice assistant completely in Node-RED, which is what I have done (two rough sketches of how some of the pieces fit together follow after the list):
a jsgf permutation subflow to create both a corpus for training a language model for coqui and a tagged version for intent matching with fuzzywuzzy
sox for audio input and output
precise or personal-wake-word for wake-word spotting
node-red-contrib-vad for silence detection when a command is spoken
coqui stt for speech to text
node-red-contrib-fuzzywuzzy to match the coqui output with an intent based on the tagged corpus from the jsgf parser
node-red-contrib-pico2wave for text to speech output
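To give a rough idea of how some of these pieces work, here are two self-contained sketches. Neither is the real implementation of the respective node; the thresholds, names, and sentences are made up for illustration.

First, the "stop the listener after silence" idea from the question above, as a naive energy threshold over raw PCM (node-red-contrib-vad uses a proper voice activity detector instead of this):

```javascript
// Naive silence detection over headerless 16 kHz, 16-bit mono PCM.
// Threshold and timings below are made-up illustration values.

const FRAME_MS = 30;       // analysis window length
const SILENCE_RMS = 500;   // made-up energy threshold
const STOP_AFTER_MS = 800; // trailing silence needed to stop

let silentMs = 0;

// Feed PCM frames in; returns true once enough trailing silence has
// accumulated, i.e. the spoken command is over and STT can run.
function frameEndsCommand(frame /* Buffer holding one FRAME_MS window */) {
  let sum = 0;
  for (let i = 0; i + 1 < frame.length; i += 2) {
    const s = frame.readInt16LE(i); // one 16-bit sample
    sum += s * s;
  }
  const rms = Math.sqrt(sum / (frame.length / 2)); // frame energy
  silentMs = rms < SILENCE_RMS ? silentMs + FRAME_MS : 0;
  return silentMs >= STOP_AFTER_MS;
}
```

Second, the intent matching step: a tiny tagged corpus (standing in for what the jsgf permutation subflow generates) and a simple Levenshtein ratio (standing in for node-red-contrib-fuzzywuzzy) are enough to turn a slightly garbled coqui result into an intent plus slots:

```javascript
// Rough sketch of matching STT output against a tagged corpus.
// The corpus entries and the scoring are made up for illustration.

const corpus = [
  { text: 'turn the light on',  intent: 'light', slots: { state: 'on' } },
  { text: 'turn the light off', intent: 'light', slots: { state: 'off' } },
  { text: 'what time is it',    intent: 'time',  slots: {} },
];

// Plain Levenshtein edit distance between two strings.
function levenshtein(a, b) {
  const d = Array.from({ length: a.length + 1 },
    (_, i) => [i, ...Array(b.length).fill(0)]);
  for (let j = 1; j <= b.length; j++) d[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      d[i][j] = Math.min(
        d[i - 1][j] + 1,                                   // deletion
        d[i][j - 1] + 1,                                   // insertion
        d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
    }
  }
  return d[a.length][b.length];
}

// Similarity in percent, comparable to a simple fuzzywuzzy ratio.
function ratio(a, b) {
  const max = Math.max(a.length, b.length) || 1;
  return Math.round(100 * (1 - levenshtein(a, b) / max));
}

// Pick the best-scoring corpus sentence for the STT result.
function matchIntent(sttText, minScore = 70) {
  let best = null;
  for (const entry of corpus) {
    const score = ratio(sttText.toLowerCase(), entry.text);
    if (!best || score > best.score) best = { ...entry, score };
  }
  return best && best.score >= minScore ? best : null;
}

// A slightly garbled recognition result still lands on the right intent:
console.log(matchIntent('turn a light on'));
// -> { text: 'turn the light on', intent: 'light', slots: { state: 'on' }, score: 82 }
```

In the actual flow this logic lives inside the nodes, so you only wire them together; the sketches are just meant to show that there is no magic involved.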
Once I find some time this winter I plan on writing a tutorial on how to build one, based on a simple example.