Although I have a very large todo list, I get distracted every time I see something interesting, so this forum is a real hell for me. I hate you guys with all your brilliant ideas and cool IoT stuff
I've seen multiple discussions lately about voice control, so I have read some basic articles on the topic. Below is a list of possible solutions I found. If anybody has other options, preferences, dislikes, or whatever 'constructive' feedback, please let me know ... It would be nice if interested users could find here a list of all possible setups.
Using dedicated hardware, which can (among other things) do both text-to-speech and speech-to-text. If I understand correctly, these are the two major competitors, with Google on top (e.g. due to the larger number of supported languages):
Via the Node-RED dashboard app combined with the browser's native speech recognition functionality. Currently the dashboard's Audio-out node supports TTS (text-to-speech), which is widely supported by all major browsers. However, STT (speech-to-text) is only supported by Chrome, and it needs a connection to Google's cloud platform (if I'm not mistaken). It would have been a simple solution: just talk into the microphone (of a wall-mounted tablet running the dashboard), let the browser convert the speech to text, and send the text to the Node-RED flow. But it doesn't seem to be a good idea ...
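To make the browser-based idea concrete, here is a minimal sketch of what the dashboard side could look like. It uses Chrome's webkit-prefixed Web Speech API; the websocket path `/ws/speech` is an invented example endpoint you would have to create with a websocket-in node in your own flow. The transcript-extraction helper is plain JavaScript, the wiring only runs in a browser:

```javascript
// Pure helper: pick the latest transcript out of a SpeechRecognition result event.
function bestTranscript(event) {
  const result = event.results[event.results.length - 1];
  return result[0].transcript.trim();
}

// Browser-only wiring (Chrome; skipped entirely outside a browser).
// The '/ws/speech' endpoint is a made-up example: it assumes you added
// a websocket-in node listening on that path in your Node-RED flow.
if (typeof window !== 'undefined' && 'webkitSpeechRecognition' in window) {
  const rec = new webkitSpeechRecognition();
  rec.lang = 'en-US';
  rec.continuous = true;
  const ws = new WebSocket('ws://' + location.host + '/ws/speech');
  rec.onresult = (e) => ws.send(bestTranscript(e)); // forward text to the flow
  rec.start();
}
```

As noted above, the recognition itself still goes through Google's cloud, so this sketch doesn't avoid the privacy concern, it only shows how little glue code the setup would need.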
Via the Node-RED dashboard app running some third-party speech recognition software locally in the browser. There seem to be a lot of open source projects available, but not all of them are well maintained. The following might (?) be worth looking at:
With speech recognition software running locally in the Node-RED flow. In this case an audio stream (e.g. from the dashboard ...) is sent to the Node-RED flow, where a speech recognition module is running. Again, there seem to be a lot of open source projects available:
- DeepSpeech from Mozilla, which is based on neural networks in TensorFlow. Here is a demo. They have also created a website which allows everybody to contribute speech fragments to train the system in their own language. The more training data they can collect, the better it will become. The disadvantage I see is the non-automatic installation procedure, so it isn't possible (I think) to install it simply from Node-RED's manage palette.
- PocketSphinx.js: since Node.js version 8 and above can execute WebAssembly files, this module could also be run in the Node-RED flow...
By calling a cloud service like Google, Amazon, ... It seems that Google is unbeatable in quality, since they use deep neural networks with huge (private) training databases in lots of languages. But you have to pay per 15 seconds of audio. That is the reason I personally prefer the local non-cloud solutions...
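Whichever of the above options delivers the transcript, the flow still has to turn that text into an action. A simple keyword matcher in a Function node is often enough for home commands; this is just an illustrative sketch, and the keyword table and topic names are invented for the example:

```javascript
// Hypothetical command table: every keyword must appear in the transcript.
const commands = [
  { keywords: ['light', 'on'],  topic: 'light',  payload: 'on'  },
  { keywords: ['light', 'off'], topic: 'light',  payload: 'off' },
  { keywords: ['temperature'],  topic: 'sensor', payload: 'read' }
];

// Map a free-form transcript to a Node-RED style message, or null if
// nothing matches (so the flow can simply ignore unrecognized speech).
function matchCommand(transcript) {
  const text = transcript.toLowerCase();
  for (const cmd of commands) {
    if (cmd.keywords.every(k => text.includes(k))) {
      return { topic: cmd.topic, payload: cmd.payload };
    }
  }
  return null;
}
```

For example, `matchCommand('turn the light on')` yields a message with topic `light` and payload `on`, which a switch node can route to the right device. A real setup would probably want fuzzier matching, but the point is that the STT back-end and the command logic stay cleanly separated.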
I'm very curious about your feedback ...