and deepspeech will be automatically installed as a dependency.
The node uses deepspeech 0.9.3 or later. To do speech-to-text inference you need to download a model (tflite) and a scorer file. For example, the official English or Chinese models can be found on the release page.
You need to enter the path to both the model and the scorer in the node's config.
To do inference, send a WAV buffer (16,000 Hz, 16-bit, mono) to the node's input in the configured msg input property.
You will receive the transcription, input length and inference time as an object in msg.payload or in your configured output property.
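If your flow delivers a complete WAV file (e.g. from a file-in node) but you want to pass on only the raw PCM samples, a function node along these lines could strip the header first. This is just a minimal sketch: `stripWavHeader` is a hypothetical helper name, it assumes a canonical 44-byte PCM WAV header (files with extra chunks need real parsing), and whether the node wants a full WAV or raw PCM depends on its implementation.

```javascript
// Hypothetical helper (not part of the node): strip the standard 44-byte
// RIFF/WAVE header so only the raw 16 kHz, 16-bit, mono PCM samples are
// passed downstream, e.g. in a function node placed before the stt node.
function stripWavHeader(wav) {
    // A canonical PCM WAV header is 44 bytes: RIFF chunk descriptor,
    // "fmt " sub-chunk, and the "data" sub-chunk id/size. WAV files with
    // additional chunks would need proper chunk parsing instead.
    if (!Buffer.isBuffer(wav) || wav.length < 44) {
        throw new Error("expected a complete WAV file buffer");
    }
    return wav.subarray(44);
}

// Example: a fake 44-byte header followed by four PCM bytes.
const fake = Buffer.concat([Buffer.alloc(44), Buffer.from([1, 2, 3, 4])]);
console.log(stripWavHeader(fake).length); // 4
```

Inside an actual function node you would apply this to `msg.payload` and `return msg`.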
If you want quicker, more accurate transcriptions of a limited vocabulary and sentence set, you will need to train your own scorer file. Documentation on how to do this can be found in the DeepSpeech readme.
There is a good German model for DeepSpeech available here:
Well, the sky is the limit really.
This is just one component you would need in a Node-RED voice assistant, compared to voice2json, which offers everything in one package.
The thing is, I wanted more flexibility, and I wanted everything to be more native and more tightly integrated with Node-RED than voice2json could ever be. (Not to say that voice2json isn't a great piece of software, as it's totally awesome, but I just like doing it the hard way.)
So I looked at my awesome voice assistant pipeline flow chart that I made for voice2json and decided to develop the individual components needed as Node-RED nodes and/or subflows, or to contribute to existing nodes (for example node-red-contrib-fuzzywuzzy) to make them fit into my grand scheme of building a nearly native voice assistant with Node-RED / Node.js.
You can see all I have done in that direction here:
So now I have a completely Node-RED-based voice assistant toolkit.
If I find the time, I will actually write up a tutorial, based on a simple example, on how to use all those tools to build one.
DeepSpeech fills the STT/ASR part of that toolkit. I like DeepSpeech because it's much simpler to train a domain-specific language model and add new vocabulary to it than it is to do the same for Kaldi/Vosk. (External scorer scripts — DeepSpeech 0.9.3 documentation)
It also offers a native Node.js API with streaming support. So no more Python hacks, as I really don't like Python.
But keeping all this in mind, I will actually cease development on the deepspeech node in the future, as Mozilla has pretty much shelved the project and the outlook for future development is bleak.
Fortunately, that is not the end of the story, as most of the original developers forked DeepSpeech and are continuing development on the fork.
This fork is called Coqui and can be found here:
and I already have a node ready that will work as a drop-in replacement for the deepspeech node:
It's not published yet, as npm support for arm64 and armhf (so Raspberry Pis) is missing right now, but as soon as that arrives, which should be soon, I will publish the Coqui nodes and they will take the place of deepspeech.
The models used for deepspeech and any scorers you train are compatible between the two.
I hope this sheds some light on my motivations, Johannes
Many thanks for your detailed report. And glad to see someone else who dislikes Python like me.
If everything with Coqui and your great work comes together as planned, I am getting really excited about the new upcoming possibilities, Jens.
You should experiment with using Dutch in your own home automation. I'm pretty sure your wife and children would very kindly ask you to reactivate German as soon as possible.
But now we are getting too far off-topic...
DeepSpeech already has Pi support; it's only Coqui that's missing it, and the deepspeech and Coqui nodes and models are interchangeable at this point in development.
So if you want to play, go ahead and install the deepspeech nodes and try them on a Pi, because as soon as I release the Coqui nodes, they will work as a nearly identical drop-in replacement.