Ways to control Node-RED with speech commands

I got to do some testing of verbal range this morning, my first real tests. I used Node-RED with MQTT-in nodes and a debug node connected to hotword detection, as well as ASR and natural language understanding. It was a noisy environment: network cables were being installed in the room, and the room itself is still mostly empty, so there was a lot of echoing too. I got hotword detection working up to 5.5-6 metres at a softer-than-conversational volume, and correct ASR and NLU processing with a badly trained model up to 4 metres this way. In several of my tests only a couple of key words of the text were transcribed; moving closer or speaking a bit louder gave better results. The microphone I used is benchmarked for voice detection up to 5 metres, so this was a pleasant surprise.
Processing time for ASR did increase with the distance; I'll do some benchmarking for that over the next couple of days.
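The pipeline described above (hotword detection, then ASR, then NLU, each publishing to MQTT) can be sketched in plain Python too. This is a minimal, hypothetical dispatcher: the topic names and payload shapes are my own assumptions, and in a real setup an MQTT client such as paho-mqtt would deliver the messages instead of calling the function by hand.

```python
import json

# Hypothetical topic layout; real hotword/ASR/NLU services will differ.
def handle_message(topic, payload, state):
    """Route one MQTT-style message through a hotword -> ASR -> NLU pipeline."""
    msg = json.loads(payload)
    if topic == "voice/hotword/detected":
        state["listening"] = True          # wake word heard: start listening
    elif topic == "voice/asr/text" and state.get("listening"):
        state["transcript"] = msg["text"]  # keep the raw transcription
    elif topic == "voice/nlu/intent":
        state["listening"] = False         # one command handled, back to idle
        return msg["intent"]               # hand the recognised intent to a flow
    return None

# Simulate one voice interaction end to end.
state = {}
handle_message("voice/hotword/detected", '{}', state)
handle_message("voice/asr/text", '{"text": "turn on the light"}', state)
intent = handle_message("voice/nlu/intent", '{"intent": "LightOn"}', state)
```

In Node-RED the same routing would be three MQTT-in nodes feeding a function or switch node, but the state machine is the same idea.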

5 Likes

Any news?

They're presently not considering diversifying their business model for non-commercial DIY use cases.

Hardly surprising I guess. Snips it is then.

Did you end up doing any more testing?

No, sadly I did not, because of my usual excuse: I got sick again... I started to set up a benchmarking flow, but had to quit halfway through the training (writing the sentences) to do the benchmarking with. I haven't gone back to it yet.

Hi Lena,
I was recently investigating integrating Snips into Node-RED, to have offline voice recognition as you suggested. However, I read today that Sonos has bought Snips, and it seems they will shut down the Snips Console in about two months from now...

The Snips community is absolutely not amused by this announcement, since that console is required to train their local home system with custom sentences and intents. Which means they have all now been screwed over by Sonos. So I'm very glad at the moment that I haven't put too much effort into this integration.

But it was a nice suggestion from you anyway...
All other proposals are very welcome!!

4 Likes

Big yikes, hadn't heard that news yet :frowning:

Thank you, Bart. I've been hopping in and out of sickness, so I hadn't heard anything yet. I'm quite mad right now, partially for trusting the half-open-source project and feeling betrayed like that again; naive me, getting tricked again. As I've mentioned in a few topics before, I had hoped to use Snips to take back some control over my health again: getting as many years of voice control as I have left before my voice slurs so much that speech recognition stops working entirely. I'm not comfortable writing Google Assistant or Alexa skills with medical data inside, so Snips sounded like a perfect solution with injectable data.

Since hearing the news I'm rapidly looking for alternatives. I saw Rhasspy linked on the HA forums, https://community.home-assistant.io/t/rhasspy-offline-voice-assistant-toolkit/60862, and I'm going to test it for my purposes ASAP. spaCy (a Python NLP library that does high-speed processing) is actually really good, and I loved working with it last year. The Dutch corpus inside is good too, but I don't know how well it would work for my use cases.

Meanwhile Mozilla have been steadily working on DeepSpeech, now moving it to TensorFlow Lite (https://discourse.mozilla.org/t/tensorflow-lite-inference/31903),
so it should be within range of a Pi / Android device soon.

4 Likes

Following this article: indeed, Lite is currently supported by DeepSpeech. But I cannot find anything about support for Dutch, so that would be a no-go for me personally :worried:

But anybody English-speaking that wants to do a pull-request on my node-red-contrib-deepspeech node, be my guest!

Not sure if this is of interest, but I installed the Rhasspy Toolkit on an RPi 3B+.

I have not fully tested it yet, as I am awaiting the delivery of a PS3 Eye camera (using its microphone). It works well on the TTS side (using picoTTS) and seems to be very configurable. Once I get the microphone, I can look at what the JSON string looks like and start developing Node-RED commands to drive my Opto22 PAC R1 controller.

You can integrate your own preferred 'engines' using the custom command feature should you wish.

I may be misunderstanding some stuff, as this is all so new to me (including using Docker!!). But I reckon it might be worth a few moments of your time to explore its possibilities, as PocketSphinx appears to have Dutch language capabilities. (There are some nice tables showing compatibilities.)

A new forum has been set up in the last few days, and some of the people from the Home Assistant site seem to be very keen on the project.

1 Like

Hi @mudwalker,
Do you know whether you can use this via a smartphone? I mean having wake-word detection (e.g. "hey rhasspy" or something like that) on a smartphone, and then all the other recognition on your Raspberry Pi, similar to how Google and Amazon do it...
Thanks!
Bart

Just don't ask it to tell a joke, it may be German. :rofl:

(in reference to a particular German car manufacturer).

1 Like

Hi @BartButenaers,
Looks like it can be used in this mode. One option is Client/Server (with an installation example using Docker), the other uses the Hermes Audio Server, which was written to do exactly such a thing.

The first link shows both methods.

I need to learn to walk before I start trying to run with this. :sweat_smile: Two weeks ago I hadn't even considered voice control. I rely on the contributions made to the various forums by people with far more knowledge about these things than me to enable me to explore options! To these people I say a BIG thank you.

1 Like

Hello,
I actually built a whole voice assistant using just Node-RED and two very simple Python scripts: one for hotword detection and one for ASR.
I use a small Python script utilizing pocketsphinx-python with a custom dictionary and language model. This works very well for German in my case. You also need an acoustic model, but they have one here:
https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/
As long as you have a way to get the audio from the phone to your Node-RED server, your approach could work.
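For context, the "custom dictionary" pocketsphinx uses is just a plain text file mapping each word to its phonemes. A common step when building a domain-specific model is trimming a full pronunciation dictionary down to your command vocabulary. The sketch below is my own illustration of that step; the file format is CMUdict-style, and the phoneme strings are made up for the example, not taken from the real German model.

```python
def build_custom_dict(full_dict_lines, vocabulary):
    """Keep only pronunciations for words in our command vocabulary.

    full_dict_lines: lines in CMUdict-style format, e.g. "licht l ih c t"
    (word first, then space-separated phonemes; pronunciation variants
    look like "licht(2) ...").
    """
    vocab = {w.lower() for w in vocabulary}
    kept = []
    for line in full_dict_lines:
        if not line.strip():
            continue
        word = line.split()[0].lower()
        base = word.split("(")[0]  # strip pronunciation-variant suffix
        if base in vocab:
            kept.append(line)
    return kept

# Illustrative entries only; real phoneme sets differ per acoustic model.
full_dict = [
    "licht l ih c t",
    "an ah n",
    "aus aw s",
    "banane b ah n aa n ax",
]
custom = build_custom_dict(full_dict, ["Licht", "an", "aus"])
```

The resulting file, together with a matching language model, is what the decoder gets pointed at.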
Johannes

Hi @JGKK,
Thanks for joining this discussion, and sharing your experiences!!

Last week I announced a beta version of my node-red-contrib-ui-media-recorder. I also want to add microphone capture to that node (based on the MediaRecorder API), which could help us for this purpose...

Very interesting! But unfortunately this is way out of my comfort zone. :woozy_face:
If you could find some spare time to explain, in dummy language, how you have accomplished this, that would be appreciated! This forum has a "share your project" category that would be ideal :wink:

Such a wake-word detection baked into my node would be an awesome feature...

please @JGKK give me a tutorial :wink:

I have been thinking about one for a while. Once I have some time I will try to write one, but unfortunately I can't say when, as I'm not really good at writing tutorials and it would take quite some effort.

2 Likes

@JGKK even if you just provided the flow and scripts that would be sharing your work with others and helping them along :slightly_smiling_face:

3 Likes

A lot has changed, and I think I have found the perfect speech companion for Node-RED:


This is an open-source project by the same person who also makes the Rhasspy assistant project (https://rhasspy.readthedocs.io/en/latest/).
It includes most of the core features of the latter, but in a stripped-down version as a command-line tool.
Installation couldn't be any easier, as there are prebuilt deb packages for download. There is already support for many languages, and all you have to do is download a profile to get one of them.
Many of the languages support Kaldi models by the great https://zamia.org/ project. Normally it is a huge pain to adapt a Kaldi model to your own domain-specific language model, but voice2json takes all this away and makes it a really easy and straightforward process.
As Kaldi can achieve sub-realtime performance on a Raspberry Pi 4, especially if it is a little bit overclocked, and is way better accuracy-wise than pocketsphinx, this is awesome news.
You can create your intents by writing them in a very easy to understand template language that is based on the JSGF grammar format.
It took me less than two hours to move all the intents I had in my pocketsphinx language model over to this template language.
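To give a rough idea of the style of that template language, here is my own illustrative example (an intent section with alternatives, an optional word, a named rule, and tagged slots); check the project's documentation for the exact syntax:

```ini
[ChangeLightState]
light_name = (living room | bedroom) {name}
turn (on | off) {state} [the] <light_name> light
```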
I'm amazed how well this tool works out of the box.
The best part: it does not only do speech-to-text, but also includes a tool for intent recognition that will parse the intent out of your command. Because it is all command-line based, you can easily integrate it using the exec node, and as the name suggests it outputs all results in JSON format, so it is very easy to work with in Node-RED.
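As an illustration of what handling that JSON could look like, here is a small sketch. The payload shape below is my own assumption of a text/intent/slots style result, not copied from the tool's actual output, so verify the real field names against the documentation.

```python
import json

# Hypothetical intent-recognition result; real field names may differ.
raw = '''{
  "text": "turn on the living room light",
  "intent": {"name": "ChangeLightState", "confidence": 0.92},
  "slots": {"name": "living room", "state": "on"}
}'''

def to_node_red_msg(raw_json):
    """Convert a recognised intent into a Node-RED-style msg object."""
    result = json.loads(raw_json)
    return {
        "topic": result["intent"]["name"],  # route flows on the intent name
        "payload": result["slots"],         # slot values drive the action
    }

msg = to_node_red_msg(raw)
```

In a flow, the exec node's stdout would feed a JSON node (or a function node like this), and a switch node on `msg.topic` would fan out to the individual device actions.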
The documentation is great.
So to round it up: I would say this is the easiest to use and install fully offline Linux speech-to-text/intent solution I have used, and I recommend everybody go try it.

Stay healthy, Johannes

6 Likes