Ways to control Node-RED with speech commands

Can't you run the official Docker image for voice2json in parallel? Have a look at the part of the voice2json documentation about using it with MQTT: http://voice2json.org/recipes.html#create-an-mqtt-transcription-service
Maybe you can use something like this to communicate between the two Docker containers.
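The recipe boils down to something like this (a rough sketch; the topic names are illustrative, see the linked docs for the real ones):

```bash
# Sketch of an MQTT transcription service: receive WAV file paths on one
# topic, transcribe each file with voice2json, and publish the JSON result
# on another topic. Topic names are illustrative.
mosquitto_sub -t 'voice2json/wav-path' | while read -r wav_path; do
    voice2json transcribe-wav "$wav_path" |
        mosquitto_pub -l -t 'voice2json/transcription'
done
```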
I have no experience with running Node-RED in Docker, unfortunately.

I thought about that, and I think I'll go with that approach for now. I liked the idea of using exec nodes to control the voice2json commands, though, and that is not going to work with voice2json in a separate container.
Still, it might be the easiest solution for now.
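For the record, the exec node idea would have amounted to running something like this (the profile path and file names are just examples):

```bash
# One-shot command for a Node-RED exec node: transcribe a recorded WAV
# and turn the transcription into an intent (paths are illustrative).
voice2json --profile /data/voice2json/en-us transcribe-wav /tmp/command.wav |
    voice2json --profile /data/voice2json/en-us recognize-intent
```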
But as soon as your nodes are ready I'll have to find a solution. Probably building a custom Ubuntu Docker image for Node-RED will be the way to go...
Thank you for the input!


Just as a follow-up to the point above: having voice2json outside of my Node-RED Docker container was not worth the effort in the end. Having it at my fingertips in the flows is actually part of the appeal as far as I'm concerned. It makes life much easier.
I therefore built a custom Docker image based on Debian rather than Alpine, which makes it easy to use pre-compiled .deb files etc.
In case it helps anybody, you can find my first draft here (including the voice2json installation, which is a little rough; maybe I'll split that into its own Dockerfile later, but first I want to get a voice assistant up and running :wink: ):
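The gist of it is something like the following (a minimal sketch; the base image, versions and the .deb URL are placeholders, check the voice2json releases page for the current file name):

```dockerfile
# Minimal sketch of a Debian-based Node-RED image with voice2json baked in.
# Base image, versions and the .deb URL are placeholders.
FROM node:12-buster-slim

# apt resolves the dependencies of a local .deb file by itself
RUN apt-get update \
    && apt-get install -y --no-install-recommends wget ca-certificates \
    && wget -O /tmp/voice2json.deb \
        "https://github.com/synesthesiam/voice2json/releases/download/v2.0.0/voice2json_2.0.0_amd64.deb" \
    && apt-get install -y /tmp/voice2json.deb \
    && rm /tmp/voice2json.deb \
    && rm -rf /var/lib/apt/lists/*

# install Node-RED itself
RUN npm install -g --unsafe-perm node-red

EXPOSE 1880
CMD ["node-red"]
```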


FYI
voice2json now defaults to using mycroft-precise as its wakeword engine. I had good success training my own model with the instructions on their GitHub and a bit of trial and error.
I'd be happy to help anybody here if they have questions about the process.
Right now this works best with the .deb package of voice2json.
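The rough outline of the process, in case it helps (command names are from the mycroft-precise README; file names are examples):

```bash
# Rough outline of training a custom wakeword with mycroft-precise.
# Command names come from the mycroft-precise repo; file names are examples.

precise-collect                       # record a dozen or so samples of your wakeword
precise-train my-wakeword.net data/   # train an initial model on the recordings
precise-listen my-wakeword.net        # test the model live against your microphone
precise-convert my-wakeword.net       # convert it to .pb format for production use
```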
And @Steve-Mcl, I will implement a hotword node for this.
Just thought I'd let you all know, now that Snowboy is shutting down at the end of the year.
Johannes


I have been using a Node-RED skill for Mycroft and it works great for me.

I love the development work they are doing, and I'm a big fan of Precise, the wakeword component they have been building, but right now, to have a good experience, you have to use it with Google's API for STT, which is a big no-no for me.
I know they are working on making DeepSpeech their primary STT engine, but right now DeepSpeech is slow on a Pi, as it runs single-threaded, and there just aren't many good models for languages other than English.
I don't want to take away from the great project it is, I just want to point out that it's not a completely offline solution for now.
Johannes


Hi Johannes ( @JGKK ),
thank you for the instructions. I am unsure whether I should wait for the solution you are working on or start getting into voice2json myself. Is there an ETA for the wrapper you are working on?
Greetings from Niedersachsen :wink:

Hello,
I can't give an ETA right now, unfortunately. voice2json itself is in the transition to version 2.0.0, which is in beta right now. Version 2.0 brings many breaking changes, not just to the underlying libraries but also to core functionality. I'm in active talks with Mike and am testing the beta release right now, but there are still some things that are broken atm. For example, from 2.0.0 onwards the wakeword system used is Precise and not Porcupine anymore, but this is not working right now in some scenarios involving Docker, and there are some issues with piping from and writing to stdin from Node.js, which, until fixed, unfortunately also impacts using voice2json with the exec node.
Mike is adding quite a bit of functionality that we will use in our wrapper nodes, so I'm kind of waiting for those things to work properly and for the fixes to arrive.
All this said, definitely start playing with it now, and especially report bugs if you find them. Just be aware it's in a state of flux right now, but a lot of work is being done towards the final release. I would recommend using the beta, as the docs already reflect the 2.0 functionality and 1.x profiles won't work in version 2.
And all the profiles and commands you build now for use with the command line will work with the wrapper nodes once they are done.
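To give an idea, a typical command-line round trip in 2.0 looks something like this (a sketch, after editing sentences.ini in your profile):

```bash
# Typical voice2json 2.0 command-line round trip (a sketch).

# compile sentences.ini into the speech and intent models
voice2json train-profile

# listen for the wakeword (prints a JSON event when it is detected)
voice2json wait-wake

# record until silence, transcribe the audio, and extract the intent
voice2json record-command | voice2json transcribe-wav | voice2json recognize-intent
```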

Johannes


Thanks for the quick and detailed answer. If I find the time to try things out, I will follow your advice and use the beta version (and report bugs). However... I hope you will keep us informed about future news in this thread.

Have a look here:

This node I made has some features designed to work especially well with tools like voice2json and our upcoming wrappers for it. But it should in general be a good tool for anybody who wants to record speech commands on Linux and use them in Node-RED.
I'm happy for any feedback, as this is still very much a beta version and a work in progress.

Johannes


I was browsing GitHub and came across "Olivia". It looks really nice; I am not entirely sure how the data is being processed, but it uses a neural network. Could it potentially be useful?


I'm wondering what they built their STT on, or what they use for that component. Can't find anything about this part at first glance. All the info is about the neural network they use for NLU, but not the STT part.

Edit: on further inspection, they just do the NLU part themselves, so it's really more of a chatbot and not a complete pipeline for voice processing.

Yeah, you are right, the STT part would still have to be handled somewhere else. I tried to install the DeepSpeech project from Mozilla, which works offline, but what I found along the way is that all these audio-to-text projects are quite hard to deal with :')

Someone created a tutorial for a Pi to measure performance.

DeepSpeech is quite slow on a Pi right now compared to Kaldi, as it will only run on a single core. The other problem is that many of the models available for languages other than English are not compatible with a Pi, as DeepSpeech uses different models for different TensorFlow implementations. There is also the problem that DeepSpeech is in such a state of development that right now every new version breaks compatibility with models generated for the previous version.
I'm sure DeepSpeech will improve hugely as it progresses towards a 1.0 release, but right now it's just not the best on hardware like a Pi.
Depending on your language, either Kaldi or PocketSphinx is still your best bet.
But all those solutions, including DeepSpeech, work best and fastest on hardware like SBCs if you limit their vocabulary and language model to do domain-specific transcription. Unfortunately that is where the pain starts, as you have to start compiling tools like KenLM and writing your own phonetic dictionaries to create the language models, and so on. I've been down this path and it's very frustrating at times.
This is why I recommend using things like voice2json or Rhasspy, as they do all that for you. They even give you the choice to use any of the above systems in the backend, including DeepSpeech. That's their greatest feature: they abstract away most of the low-level pain associated with open source STT systems, as most of those were developed by scientists for scientists and never intended for noobs like us.
Unfortunately there is not one true, great solution out there, especially for open-ended transcription on the kind of limited hardware most people run Node-RED on.
For me, right now Kaldi strikes the best balance between speed, accuracy, available languages, and finally being easy to use with tools like voice2json.
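To illustrate that last point: with voice2json you only describe your commands in a template file and retrain, and the restricted language model and phonetic dictionary are generated for you (the intent and slot names below are made up; the profile path may differ on your install):

```bash
# Domain-specific transcription with voice2json: add your commands to
# sentences.ini (intent/slot names are made up; profile path may differ).
cat >> ~/.config/voice2json/sentences.ini <<'EOF'
[ChangeLightState]
turn (on | off){state} the (living room | kitchen){room} light
EOF

# this regenerates the restricted language model and dictionary for you
voice2json train-profile
```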

Johannes


@WhiteLion, @SymanK83, @mudwalker and everybody else: it is finally time, and you can try the beta of the voice2json nodes @BartButenaers and I have been building :raised_hands:
You can find all the information in this post:

or directly in the node-red-contrib-voice2json repository
I hope you like what we have so far.

Johannes


Has anyone tried DeepSpeech on a Pi 4 yet? Do the faster processor and extra memory improve the performance?

I'm going to try to get a Pi Zero W to feed Google Home STT into GPT-3 and have it feed the response back.
Thinking it could be cool to have a tuneable 'smart(er) assistant'.

Some time ago there was a 'VoxCommando' solution for older Pi devices. Try Google and see if it still exists? Might be something you could leverage.

I recently published DeepSpeech nodes for Node-RED:

I find the performance quite good on a Pi 4, especially when using it in streaming mode, where you do the inference as the audio is coming in, and if you have trained a domain-specific scorer.
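If you just want a feel for the raw speed first, the plain DeepSpeech CLI from the pip package makes for an easy benchmark (the model and scorer file names depend on the release you downloaded):

```bash
# Quick CLI benchmark of DeepSpeech on a Pi 4; the model and scorer
# file names depend on the release you downloaded.
deepspeech --model deepspeech-0.9.3-models.tflite \
           --scorer deepspeech-0.9.3-models.scorer \
           --audio test.wav
```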

Hello,
If you want to add any type of wakeword, for example with node-personal-wakeword, I would really recommend looking at a Pi 3A+ at minimum performance-wise, as the single-core Pi Zero will give you headaches with those.
If you have a Pi 4 you could also look at offline solutions like DeepSpeech or voice2json.

Johannes