Had a nice discussion last week about voice control for Node-RED. Based on that discussion, I created a prototype of node-red-contrib-deepspeech on GitHub. It is not released on NPM, since it is just an experiment.
I just wanted to test two things:
- Can I use DeepSpeech with Node-RED, i.e. can I convert speech to text with decent quality without a cloud service (based on a trained deep-learning model, rather similar to Google's cloud service)? See the sketch after this list.
- And if that works, how fast or slow is it?
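For the curious: the heart of such a node is not much more than wiring the DeepSpeech Node.js bindings into a Node-RED input handler. Below is a minimal sketch, assuming the DeepSpeech 0.9.x bindings; the config property names (modelPath, scorerPath) are made up for the example, and the actual prototype on GitHub may differ:

```javascript
// Minimal sketch of a Node-RED node wrapping the DeepSpeech bindings.
const DeepSpeech = require('deepspeech');

module.exports = function (RED) {
    function DeepSpeechNode(config) {
        RED.nodes.createNode(this, config);
        const node = this;

        // Load the acoustic model once at deploy time; loading it for
        // every message would make things even slower.
        const model = new DeepSpeech.Model(config.modelPath);
        if (config.scorerPath) {
            // Optional external scorer (language model) improves accuracy.
            model.enableExternalScorer(config.scorerPath);
        }

        node.on('input', function (msg) {
            // msg.payload must be raw 16-bit, 16 kHz, mono PCM audio.
            msg.payload = model.stt(msg.payload);
            node.send(msg);
        });
    }
    RED.nodes.registerType('deepspeech-stt', DeepSpeechNode);
};
```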
I'm not going to repeat all the information, since I have described it on the readme page above. The summary:
Quality seems to be good, but converting an audio sample of only 1.975 seconds takes 50.17 seconds on a Raspberry Pi 3 Model B, i.e. roughly 25 times slower than real time. However, all calculations are being done on the CPU, while a neural network should really be executed at least on a GPU...
I don't know whether it is useful to continue with this node. All constructive feedback is again welcome.
It would be nice if users could do some testing on other hardware (corresponding to the DeepSpeech hardware recommendations), to see whether we can achieve real-time STT without needing a complete datacenter.
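To make such tests easy to compare, here is a minimal standalone timing sketch (again assuming the 0.9.x bindings; the file names are placeholders). It times a single stt() call and prints the real-time factor, where anything above 1 means slower than real time:

```javascript
const DeepSpeech = require('deepspeech');
const fs = require('fs');

const model = new DeepSpeech.Model('deepspeech-0.9.3-models.pbmm');
// Raw 16-bit, 16 kHz, mono PCM audio (no WAV header).
const audio = fs.readFileSync('sample.raw');

// 2 bytes per sample, so length / 2 samples in total.
const audioSeconds = audio.length / 2 / model.sampleRate();

const start = Date.now();
console.log(model.stt(audio));
const elapsedSeconds = (Date.now() - start) / 1000;

// On my Raspberry Pi 3: 50.17 s / 1.975 s gives an RTF of about 25.
console.log('RTF =', (elapsedSeconds / audioSeconds).toFixed(2));
```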