Ways to control Node-RED with speech commands

Yep, now you see I was right! I have been investigating further, and it seems to me there are two options:

  • Google Home device with a female voice, which is always listening (even when you don't want)
  • Google Home device with a male voice, which is never listening (even when you need it)

Yes, but perhaps Google's search engine doesn't show us the negative articles about their own devices. Now I'm really getting paranoid :rofl:

2 Likes

The real issue I have with cloud based speech recognition is that my Internet connection is essentially a bit of wet string strung between some trees. It's not unusual to be without any service for a week due to storms, water damage, lightning strikes, fallen trees, etc. It's pretty annoying not being able to turn the lights off because the Internet is down.

Hey Dean, that was also one of the reasons why I wanted to have DeepSpeech integrated in Node-RED. However, when you have a look at this recent issue, you will see that it is currently still too heavy to run on a Raspberry Pi 3 ...

Well, if I had that poor an Internet connection, I would definitely be setting up something a bit more meaty than just a Pi. For many years, before I got a NAS and then a Pi or two, I ran a deskside AMD-based system that was on 24x7. With modern processors being what they are, there is no reason not to run a PC as a home server.

That is true ...
For completeness: I found one device that will run DeepSpeech out-of-the-box, for users that don't want to go through the installation process. The device is called the Mycroft Mark 2. It will be released this month, so I don't know yet whether it works well. And it has a rather high price of $189. This is what it will look like:

image

Well, I do run a 4-core 3.7GHz Xeon E3 with 16GB RAM and a 6TB HDD as my home server :slight_smile:

1 Like

I'm not a hardware specialist, but now I'm a little confused after reading this reply from Mozilla:

As documented, if you use our prebuilt binaries, you need some CPU with at least AVX instructions set. Also, as documented, we have binaries for Linux/AMD64, OSX/AMD64, and some ARM (strictly RPi3B) and ARM64 systems (should run on any Debian Stretch aarch64 distro, tested on Le Potato board).

Since they talk about RPi3B, I would expect it to run on a Raspberry Pi 3 Model B? So I executed npm install deepspeech, which ran fine:

image

Just what I needed for a new contribution, since my users then don't have to worry about building binaries themselves. So I started a basic node-red-contrib-deepspeech node, which includes this code:

module.exports = function(RED) {
    var settings = RED.settings;

    // The prebuilt DeepSpeech Node.js bindings (installed via 'npm install deepspeech')
    var deepSpeech = require('deepspeech');

    function DeepSpeechNode(config) {
        RED.nodes.createNode(this, config);

        var node = this;

        // For now, just prove that the native library loads by printing its version numbers
        deepSpeech.printVersions();

        node.on("input", function(msg) {
            // TODO: run speech-to-text on the incoming audio and send the result
            //node.send({payload:value});
        });
    }

    RED.nodes.registerType("deep-speech", DeepSpeechNode);
}
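
For what it's worth, here is a rough sketch of where I think the actual recognition could go later on. It is modelled on the Node.js example client that ships with DeepSpeech 0.3; the model/alphabet paths, the hyperparameter values and especially the exact buffer handling are assumptions copied from that client, so they still need to be verified on the Pi:

    // Hedged sketch only, based on the DeepSpeech 0.3 Node.js example client - not tested yet!
    var Ds = require('deepspeech');

    // Hyperparameters as used by the example client (assumed, verify against your DeepSpeech version)
    var BEAM_WIDTH = 500;
    var N_FEATURES = 26;
    var N_CONTEXT  = 9;

    // Paths to the pre-trained model files (downloaded separately) - adjust to your own setup
    var model = new Ds.Model('models/output_graph.pb', N_FEATURES, N_CONTEXT,
                             'models/alphabet.txt', BEAM_WIDTH);

    node.on("input", function(msg) {
        // msg.payload is assumed to be a Buffer of 16 kHz, 16-bit mono PCM audio.
        // The slice(0, length / 2) mirrors the buffer handling of the 0.3 example client;
        // double-check this against client.js before relying on it.
        msg.payload = model.stt(msg.payload.slice(0, msg.payload.length / 2), 16000);
        node.send(msg);
    });

If I remember the example client correctly, a language model (lm.binary + trie) can be added afterwards via enableDecoderWithLM to improve accuracy, but plain stt should be enough to get a first transcript.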

But when I start Node-RED afterwards, the require('deepspeech') results in a warning:

8 Dec 21:09:56 - [info] Node-RED version: v0.19.4
8 Dec 21:09:56 - [info] Node.js version: v7.9.0
8 Dec 21:09:56 - [info] Linux 4.14.71-v7+ arm LE
8 Dec 21:09:57 - [info] Loading palette nodes
meSpeak: Not in browser Only able to do raw
8 Dec 21:10:17 - [info] Dashboard version 2.10.2-beta started at /ui
8 Dec 21:10:17 - [warn] ------------------------------------------------------
8 Dec 21:10:17 - [warn] [node-red/deep_speech] Error: /lib/arm-linux-gnueabihf/libm.so.6: version `GLIBC_2.23' not found (required by /home/pi/.node-red/node_modules/deepspeech/lib/binding/v0.3.0/linux-arm/node-v51/../libdeepspeech.so)

I have no clue whether this is because I'm running Raspbian Jessie. Perhaps I should upgrade to Raspbian Stretch? No idea...

When they say

(should run on any Debian Stretch aarch64 distro, tested on Le Potato board).

That may be a clue to the Debian level they tested with

Heuh, I haven't seen that sentence. Thanks for supporting my old eyes ...
[EDIT] Seems I have to upgrade from Jessie to Stretch:

image

I highly recommend getting a new SD card and starting with a fresh image rather than upgrading

1 Like

BTW I have created a very basic repository, node-red-contrib-deepspeech, in case I can find some time somewhere to experiment with DeepSpeech. All proposals (and of course pull requests) are welcome, since my time is rather limited...

1 Like

@cymplecy, @dceejay,
That indeed did the trick... I have now installed the latest Raspbian Stretch Lite from scratch, and there are no more warnings at Node-RED startup. And when my node-red-contrib-deepspeech node is loaded, it nicely displays both version numbers:

TensorFlow: v1.11.0-9-g97d851f
DeepSpeech: v0.3.0-0-gef6b5bd

But I assume this was the easiest part, since I have no knowledge about AI ...

P.S. My PuTTY session was disconnected 3 times during the npm install deepspeech. The last time I executed the command, it seemed to be successful:

pi@raspberrypi-deepspeach:~ $ npm install deepspeech
> deepspeech@0.3.0 install /home/pi/node_modules/deepspeech
> node-pre-gyp install --fallback-to-build
[deepspeech] Success: "/home/pi/node_modules/deepspeech/lib/binding/v0.3.0/linux-arm/node-v64/deepspeech.node" already installed
Pass --update-binary to reinstall or --build-from-source to recompile
npm WARN saveError ENOENT: no such file or directory, open '/home/pi/package.json'
npm notice created a lockfile as package-lock.json. You should commit this file.
npm WARN enoent ENOENT: no such file or directory, open '/home/pi/package.json'
npm WARN pi No description
npm WARN pi No repository field.
npm WARN pi No README data
npm WARN pi No license field.
+ deepspeech@0.3.0
added 131 packages from 92 contributors and audited 285 packages in 69.309s
found 8 moderate severity vulnerabilities
  run `npm audit fix` to fix them, or `npm audit` for details

Don't know why it disconnected, nor why it went OK the last time. Could it be because of the large number (131) of packages being installed, or the total size, or does an installation continue where the previous one ended, or ...

But I don't like that, because I don't want my users to have the same issue...

If you are having problems maintaining a connection, you might want to check out byobu or at least screen. These will maintain sessions for you even if your client disconnects.

Julian, it should also work painlessly with 'npm install'. Because what happens if a user installs this node via the palette in the flow editor, and the whole process is interrupted over and over again ...

That's OK because the process is being run by Node-RED itself, on the remote end - so it would only be an issue if Node-RED crashed part way through, which is very unlikely.

When you run something from a remote SSH BASH shell, if the SSH connection drops part way through, the command it is running is terminated - otherwise, you'd end up with disconnected processes on the far end.

Byobu and Screen run a service on the far end that the shell actually runs in. So if the connection terminates, you can reconnect to the service later and the command will still be running or will have finished normally. The output is also saved in the service so you can see everything that went on. There are other useful features too such as split screen terminals though this is less useful these days since you can simply run as many SSH connections as you like.

1 Like

Hi guys.. For now (and because I already have it) I am using Google Home with the online IFTTT service. One can very easily create a simple phrase, and IFTTT can then trigger e.g. an HTTP request to my local Node-RED (a minimal sketch of the Node-RED side follows the lists below).

Pro:

  • Google Home is nice and cheap, IFTTT is very easy to configure
  • It works like "hey google blinds down" (my homemade Arduino blinds) or "hey google channel 15" (BlackBean IR blaster to my TV)
  • There is no need to use an additional phrase like "hey google talk to my nodered blinds down"
  • It can be used together with Google Assistant (node-red nora module) and commands like "hey google turn on lights in the kitchen" or "hey google turn off the fan in the bathroom"

Cons:

  • Only online
  • Security concerns (big brother)?
  • It is not possible to detect the Google hotwords and e.g. lower the volume of my TV when I say "hey google"
  • Google Assistant has the highest priority, so it is not possible to use all phrases
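
As mentioned above, the Node-RED side of such an IFTTT webhook can be very small: an http-in node, a Function node and an http-response node. Below is a minimal sketch of that Function node; the /voice endpoint and the "command" field in the JSON body are just my assumptions about how one would configure the IFTTT webhook, not something taken from this post:

    // Function node with 2 outputs, wired after an "http in" node (e.g. POST /voice).
    // Output 1 -> the node that does the actual work (e.g. an mqtt out node)
    // Output 2 -> an "http response" node, so IFTTT gets an answer
    // IFTTT's Webhooks service is assumed to POST a JSON body like {"command": "blinds down"}.

    var command = ((msg.payload && msg.payload.command) || "").toLowerCase();

    var action = null;
    if (command.indexOf("blinds down") !== -1) {
        action = { topic: "home/blinds", payload: "down" };
    } else if (command.indexOf("blinds up") !== -1) {
        action = { topic: "home/blinds", payload: "up" };
    }

    // Always answer IFTTT so the HTTP request completes (msg keeps the msg.res set by the http in node).
    msg.statusCode = action ? 200 : 400;
    msg.payload = action ? "ok" : "unknown command";

    return [action, msg];
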
1 Like

Bringing this topic back: I'm using a different setup, after looking at the same problems others mentioned regarding the level of trust in third parties. I have a Pi 3 B+ with a connected microphone running the snips.ai platform. Snips is set up similarly to something like Google Actions or Alexa skills, in that you define the type of recognition yourself. A model is then trained in their cloud, which you download to your device. Actions you can either pick ready-made from others or code from scratch. It supports JavaScript and Python out-of-the-box, and has libraries available for iOS and Android development. Once it is downloaded to your device, no internet connection is needed at all (unless your action needs it), and the speech analysis is done locally.

Best part? It uses MQTT internally, and requires a broker to work, either locally on the device or elsewhere in the network. To get access to the recognised commands, your action code has to subscribe to specific MQTT topics, and publish to others. It also has support for dialogue-style conversations until it has all the information it needs. To connect it with NR, all you need is for the action (once it has all the needed information) to publish to an MQTT topic, and have NR subscribe to that.
Now that I think of it, you can even use NR to write the actions by just subscribing/publishing to the Snips topics directly. Even the "talking back to the user" is done through publishing to MQTT.
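
To give an idea of what that "write the actions in NR" approach could look like: a flow of mqtt-in -> Function -> mqtt-out. Below is a rough sketch of such a Function node. The Hermes topic names (hermes/intent/<intentName> for recognised intents, hermes/dialogueManager/endSession to speak a reply and close the session) and the payload fields are written down from memory of the Snips documentation, so verify them against your own installation:

    // Function node wired between:
    //   mqtt in  -> subscribed to the intent topic, e.g. "hermes/intent/someUser:blindsDown"
    //               (the intent name is just an example for this sketch)
    //   mqtt out -> publishes whatever topic/payload this node sets
    // The incoming payload is the JSON intent message Snips publishes for a recognised command.

    var intent = JSON.parse(msg.payload);

    // Do the actual work here: forward a command to another MQTT topic, trigger a flow, etc.
    // ...

    // Talk back to the user and end the dialogue session via the dialogue manager topic.
    msg.topic = "hermes/dialogueManager/endSession";
    msg.payload = JSON.stringify({
        sessionId: intent.sessionId,     // ties the reply to the session that triggered it
        text: "OK, lowering the blinds"  // spoken back through the base station / satellite
    });
    return msg;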

I just realised I'm wearing woollen clothes today, so I'm not going to take risks in setting it up for pictures. But when I got the hardware a couple of weeks ago I recorded a demo video for myself while testing the range of the microphone; I'll upload a part of that later.

I picked a 70 euro microphone, not because it is the cheapest one capable of connecting, but because of its claimed 5 metre radius for voice commands. I have already verified 4+ metres, and another 3.5 metres through a badly insulated outer wall with the door open and a party next door.
My hardware setup for the smart speaker:

  • Raspberry Pi 3 B+
  • Seeed Studio's ReSpeaker Mic Array v2 (a 4-microphone array, connects to the Pi over USB), 70 euros at a local electronics store, or it can be ordered from China for 69 dollars if interested.
  • Class 10 micro SD card in the Pi
    For the speakers I'm currently using an old (20+ years and still working just fine :P) set of Trust computer speakers. The ReSpeaker has a 3.5mm headphone jack, and it comes without the static noise the Pi has. Microphone audio from metres away is as clear as if it was spoken next to the mic array.

But the Snips platform doesn't care about the hardware: if you want a 15 dollar microphone HAT on your Pi, it works just as well, or apparently even a 5 dollar PS3 Eye microphone (according to their own benchmarks).

On the topic of listening along... when inspecting with MQTT Explorer I found a topic to which it publishes the raw WAV streams of everything the microphone picks up. That might be useful for your own purposes; it is used internally for processing your voice, but it is not being sent elsewhere. Just remember to secure your broker.

1 Like

Hi Lena (@afelix),

Interesting! If you ever have time to write a tutorial, please don't hesitate :wink:
P.S. I have been looking for a small & cheap Raspberry Pi microphone, but haven't found one with acceptable quality...

I have the time (too much of it actually) and the ability to write tutorials and connecting documentation between NR and Snips. I was already planning to post about it later once I've got everything set up, but that's going to take a while :slight_smile:

In my setup, one of the questions was how to input sensitive (health/medical) data into a local database, where I can stay in control. Simple situations, like ordering refills of medication, are less simple when taking into account the sensitivity of the input data: names of medication, dosages and amounts are excellent for profiling users. Or keeping track of my pain levels during the day (right now that would be something like "stabbing pain in my left leg, as if metal pins are inserted, followed by flashes of cold throughout the foot, pain level about a 7") and getting that parsed and again stored locally. Not the kind of information I want to have parsed by Google or Amazon.

It's also why I'm making everything MQTT based, and my own hardware projects are Espressif based for the WiFi and thus MQTT publishing support. NR manages the flow of the communication, with MQTT as the main communication type.

Another plus for Snips: it can be trained to use another hotword, and it can have satellite stations in other rooms to listen for requests as well, then forward these (over MQTT) to the base station for processing. My choice for this expensive microphone array is based on the assumption that I'm likely going to end up in a small apartment, where the 5 metre radius means I can reach everywhere with it.

2 Likes

Is there any functionality that allows speech to text?