Not as of now. I might implement it at a later point. Voice2json only supports Porcupine right now. There might be Mycroft Precise support in a future version. Right now I think Snowboy with the approach I described above is still the best way to do hotword detection from Node-RED, at least until they shut down at the end of the year.
Right now we are a little way from the beta but we will definitely post here when there is something worth trying.
For now the documentation will be in English, but once that is done we can think about a German translation. Hello from Berlin
For sending WAVs over MQTT, just use a file node connected to an mqtt out node, with the file node set to send the file as a single buffer object.
On the other side, do it the other way around: an mqtt in node connected to a file node set to write the buffer from the received msg.payload to a file.
But I'd really recommend reading and writing the files on both sides from a folder mounted on tmpfs to minimize SD card wear.
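For anyone who wants to see the idea outside of Node-RED, here is a minimal Python sketch of the same round trip: read the whole WAV as one buffer, hand it over, and write it back out on the other side, with both files living in a RAM-backed folder. It is only an illustration under a few assumptions: the MQTT transport is replaced by a plain variable hand-off, `/dev/shm` is assumed to be a tmpfs mount (with a fallback to the normal temp folder, which may not be tmpfs), and all function and file names are made up for this example.

```python
import os
import tempfile
import wave
from pathlib import Path


def tmpfs_dir():
    """Return a RAM-backed folder if available (assumption: /dev/shm is tmpfs)."""
    shm = Path("/dev/shm")
    if shm.is_dir() and os.access(shm, os.W_OK):
        return shm
    # Fallback so the sketch still runs elsewhere; this may NOT be tmpfs.
    return Path(tempfile.gettempdir())


def write_test_wav(path, frames=1600, rate=16000):
    """Create a short silent 16-bit mono WAV so there is something to send."""
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * frames)


def file_to_payload(path):
    """Sender side: read the whole file as a single buffer (file node -> mqtt out)."""
    return Path(path).read_bytes()


def payload_to_file(payload, path):
    """Receiver side: write the received buffer to a file (mqtt in -> file node)."""
    Path(path).write_bytes(payload)


def roundtrip_demo():
    """Send a WAV 'through' the hand-off and check that it arrives unchanged."""
    work = Path(tempfile.mkdtemp(dir=str(tmpfs_dir())))
    src, dst = work / "command.wav", work / "received.wav"
    write_test_wav(src)
    payload = file_to_payload(src)  # this is what would be published over MQTT
    payload_to_file(payload, dst)   # this is what the subscriber would do
    same = src.read_bytes() == dst.read_bytes()
    for f in (src, dst):
        f.unlink()
    work.rmdir()
    return same


if __name__ == "__main__":
    print("round trip ok:", roundtrip_demo())
```

In a real setup the `payload` bytes would be the message body on the broker; the point is only that the file travels as one binary blob and never touches the SD card on either side.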
@JGKK Hi Johannes,
that works perfectly fine, thank you for the help. I had never used the file node before; it simply did not occur to me. Great stuff, this is very promising. Snowboy works pretty well so far, and I'm really excited to try out your wrapper nodes.
Maybe I'll try using voice2json "raw" until then. Let's see how easy this is...
Thank you for the inspiration. If I can be of some help, just let me know.
Voice2json is easy to use even without our wrappers.
The beta for 2.0 of voice2json is out now. Mike updated the docs to include the features of 2.0 and all the download links now point to 2.0 packages.
There are a lot of performance improvements and a few new features. So have fun exploring.
You are right @JGKK, it's a pretty good learning curve.
The only issue I'm running into is getting voice2json to run in my production environment, as I have Node-RED running with the official Docker image based on Alpine.
I have difficulties building a custom image on top of that.
Do you have your setup running in docker?
Can't you run the official Docker image for voice2json in parallel? Look at the part of the voice2json documentation about using it with MQTT: http://voice2json.org/recipes.html#create-an-mqtt-transcription-service
Maybe you can use something like this to communicate between the two docker containers.
Unfortunately I have no experience with running Node-RED in Docker.
I thought about that and I think I'll go for that approach for now. I liked the idea of using exec nodes to control the voice2json commands and that is not going to work that way.
Still, it might be the easiest solution for now.
But as soon as your nodes are ready I'll have to find a solution. Probably building a custom ubuntu docker image for node-red will be the way to go...
Thank you for the input!
Just as a follow-up to the point above: Having voice2json outside of my node-red docker was not worth the effort in the end. Having it at my fingertips in the flows is actually part of the appeal as far as I'm concerned. It makes life much easier.
I therefore built a custom docker image not based on alpine but Debian, which makes it easiest to use pre-compiled .deb-files etc.
I'm wondering whether it may help anybody. If so, you can find my first draft here (including the voice2json installation, which is a little rough; maybe I'll split that into its own Dockerfile later, but first I want to have a voice assistant up and running):
voice2json now defaults to using mycroft-precise as its wakeword engine. I had good success training my own model with the instructions on their GitHub and a bit of trial and error.
I’d be happy to help anybody here if they have questions about the process.
Right now this works best with the deb package of voice2json.
And @Steve-Mcl, I will implement a hotword node for this.
Just thought I'd let you all know now that Snowboy is closing down at the end of the year.
I have been using a node-red skill for Mycroft and it works great for me.
I love the development work they are doing and I'm a big fan of Precise, the wakeword component they have been building, but right now, to have a good experience, you have to use it with Google's API for STT, which is a big no-no for me.
I know they are working on making DeepSpeech their primary STT engine, but right now DeepSpeech is slow on a Pi as it runs single-threaded, and there just aren't many good models for languages other than English.
I don't want to take away from the great project it is; I just want to point out that it's not a completely offline solution for now.
Hi Johannes ( @JGKK ),
thank you for the instructions. I am unsure whether I should wait for the solution you are working on or start getting into voice2json myself. Is there an ETA for the wrapper you are working on?
Greetings from Niedersachsen
I can't give an ETA right now, unfortunately. Voice2json itself is in the transition to version 2.0.0, which is in beta right now. Version 2.0 brings many breaking changes, not just to the underlying libraries but also to core functionality. I'm in active talks with Mike and testing the beta release right now, but there are still some things that are broken at the moment. For example, from 2.0.0 onwards the wakeword system is Precise and not Porcupine anymore, but this is not working right now in some scenarios involving Docker, and there are some issues with piping from and writing to stdin from Node.js, which unfortunately, until fixed, also affect using it with the exec node.
Mike is adding quite a bit of functionality that we will use in our wrapper nodes, so I'm kind of waiting for those to work properly and for the fixes to arrive.
All this said, definitely start playing with it now, and especially report bugs if you find them. Just be aware it's in a state of flux right now, but a lot of work is being done towards the final release soon. I would recommend using the beta, as the docs already reflect the 2.0 functionality and 1.x profiles won't work in version 2.
And all the profiles and commands you build now to use with the command line will work with the wrapper nodes once they are done.
Thanks for the quick and detailed answer. If I find the time to try things out, I will follow the advice and use the beta version (and report bugs). However, I hope you will keep us / me informed about future news in this thread.
Have a look here:
This node I made has some features which are designed to work especially well with tools like voice2json and our upcoming wrappers for it. But in general it should be a good tool for anybody who wants to record speech commands on Linux and use them in Node-RED.
I’m happy for any feedback as this is still very much a beta version and work in progress.
I was browsing GitHub and came across "Olivia". It looks really nice. I am not entirely sure how the data is being processed, but it uses a neural network. Could it potentially be useful?
I'm wondering what they built their STT on, or what they use for that component. I can't find anything about this part at first glance. All the info is about the neural network they use for NLU, not the STT part.
Edit: on further inspection, they just do the NLU part themselves, so it's really more of a chatbot and not a complete pipeline for voice processing.
Yeah, you are right, the STT part would still have to be handled somewhere else. I tried to install the DeepSpeech project from Mozilla, which works offline, but what I found along the way is that all these audio-to-text projects are quite hard to deal with :')
Someone created a tutorial for a pi to measure performance.
DeepSpeech is quite slow on a Pi right now compared to Kaldi, as it will only run on a single core. The other problem is that many of the models available for languages other than English are not compatible with a Pi, as DeepSpeech works with different models for different TensorFlow implementations. There is also the problem that DeepSpeech is in such a state of development that right now every new version breaks compatibility with models generated for the previous version.
I'm sure DeepSpeech will improve hugely as it progresses towards a 1.0 release, but right now it's just not the best on hardware like a Pi.
Depending on your language, either Kaldi or PocketSphinx is still your best bet.
But all those solutions, including DeepSpeech, work best and fastest on hardware like SBCs if you limit their vocabulary and language model to do domain-specific transcription. Unfortunately that is where the pain starts, as you have to start compiling tools like KenLM and writing your own phonetic dictionaries to create the language models and so on. I've been down this path and it's very frustrating at times.
This is why I recommend using things like voice2json or Rhasspy, as they do all that for you. They even give you the choice of any of the above systems in the backend, including DeepSpeech. Their greatest feature is that they abstract away most of the low-level pain associated with open source STT systems, as most of those were developed by scientists for scientists and never intended for noobs like us.
Unfortunately there is no one true great solution out there, especially for open-ended transcription on hardware as limited as what most people run Node-RED on.
For me, right now Kaldi strikes the best balance between speed, accuracy, available languages and finally being easy to use with tools like voice2json.
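To make the "limit the vocabulary" idea above a bit more concrete, here is a small Python sketch that expands a handful of command templates into the closed set of sentences a domain-specific model would have to recognize, plus the word list you would need pronunciations for. This is purely an illustration: the template format and the function names are made up, and it is not how voice2json, Rhasspy or KenLM work internally.

```python
from itertools import product

# Hypothetical command templates: each template is a list of slots,
# and each slot is a list of alternative words for that position.
TEMPLATES = [
    [["turn"], ["on", "off"], ["the"], ["light", "fan"]],
    [["what"], ["time"], ["is"], ["it"]],
]


def expand(template):
    """Expand one template into every sentence it can produce."""
    return [" ".join(words) for words in product(*template)]


def build_domain(templates):
    """Return (sentences, vocabulary) for a closed set of voice commands."""
    sentences = [s for t in templates for s in expand(t)]
    vocabulary = sorted({w for s in sentences for w in s.split()})
    return sentences, vocabulary


if __name__ == "__main__":
    sentences, vocabulary = build_domain(TEMPLATES)
    for s in sentences:
        print(s)
    print(len(vocabulary), "words:", vocabulary)
```

With only a few sentences and about ten words, the recognizer has very little to choose from, which is roughly why restricted setups are faster and more accurate on small boards than open-ended transcription.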
@WhiteLion, @SymanK83, @mudwalker and everybody else: it is finally time, and you can try the beta of the voice2json nodes that @BartButenaers and I have been building.
You can find all the information in this post:
or directly in the node-red-contrib-voice2json repository
I hope you like what we have so far.