Ways to control Node-RED with speech commands

DeepSpeech is quite slow on a Pi right now compared to Kaldi, as it will only run on a single core. Another problem is that many of the models available for languages other than English are not compatible with a Pi, because DeepSpeech uses different models for different TensorFlow builds. On top of that, DeepSpeech is at such an early stage of development that every new release breaks compatibility with models generated for the previous version.
I'm sure DeepSpeech will improve hugely as it progresses towards a 1.0 release, but right now it's just not the best choice on hardware like a Pi.
Depending on your language, either Kaldi or PocketSphinx is still your best bet.
But all of those solutions, including DeepSpeech, work best and fastest on hardware like SBCs if you limit their vocabulary and language model to domain-specific transcription. Unfortunately, that is where the pain starts: you have to compile tools like KenLM, write your own phonetic dictionaries, build the language models, and so on. I've been down this path and it's very frustrating at times.
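For illustration, here is roughly what that manual route looks like with KenLM and a PocketSphinx-style dictionary (a minimal sketch; corpus.txt and the other file names are made up, and it assumes you have already compiled KenLM):

```
# Train a small 3-gram language model from a corpus with one
# in-domain sentence per line, e.g. "turn on the kitchen light"
lmplz -o 3 < corpus.txt > domain.arpa
build_binary domain.arpa domain.binary   # compile for fast loading
```

And every word in the vocabulary needs a phonetic dictionary entry (CMUdict-style format):

```
kitchen  K IH CH AH N
light    L AY T
```

Multiply that by every word and sentence in your domain and you can see why it gets tedious fast.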
This is why I recommend using something like voice2json or Rhasspy, as they do all of that for you. They even give you the choice of using any of the above systems as the backend, including DeepSpeech. Their greatest feature is that they abstract away most of the low-level pain associated with open source STT systems, most of which were developed by scientists for scientists and never intended for noobs like us.
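With voice2json, for example, you just write the sentences you want recognized as templates and it generates the language model and dictionary for you. A minimal sketch (the intent name and wording here are made up):

```
# sentences.ini in your voice2json profile
[LightState]
turn (on | off){state} the (kitchen | living room){room} light
```

```
voice2json train-profile                 # builds the LM + dictionary
voice2json transcribe-wav < turn-on.wav | voice2json recognize-intent
```

The output is a JSON object with the recognized intent and tagged slots, which is much easier to route on than raw text.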
Unfortunately, there is no one true, great solution out there, especially for open-ended transcription on the kind of limited hardware most people run Node-RED on.
For me, Kaldi currently strikes the best balance between speed, accuracy, available languages and, with tools like voice2json, finally being easy to use.
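To actually get the results into Node-RED, one simple route is to pipe the recognized intents to MQTT and pick them up with an mqtt-in node (a sketch assuming a local Mosquitto broker; the topic name is made up):

```
voice2json transcribe-stream | \
  voice2json recognize-intent | \
  mosquitto_pub -t voice2json/intent -l   # one MQTT message per intent
```

In Node-RED, an mqtt-in node subscribed to voice2json/intent followed by a JSON node then gives you msg.payload.intent.name and the slots to route your flows on.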

Johannes
