Voice Assistant

I'm currently looking into self-hosted voice assistants to complement our existing Node-RED smart home automation.
The more I read about the topic, the less I really know where to start.

I have an existing Jabra 510 conference speaker that is pretty much unused and a spare RPi 3b and/or a Tinkerboard 2S lying around that I'd like to put to use.

Frameworks I have considered so far:

  1. Rhasspy
  • Seems pretty feature-rich but also very convoluted
  • Loose collection of different software packages
  • Actively developed
  • Node-RED integration
  2. SEPIA
  3. Project Alice
  • Seems straightforward to set up
  • Actively developed
  • Node-RED integration: to be clarified
  4. node-red-contrib-voice2json
  • Easy to set up
  • Perfect Node-RED integration
  • No longer actively developed?
  5. Mycroft
  • Seems quite straightforward to set up
  • Limited to RPis? To be clarified / tested
  • Actively developed
  • Node-RED integration seems lacking

I'd be happy to hear further recommendations or remarks on this (draft) assessment.
Is anyone running one of these and has some experience to share?

Ping @sepia-assistant @JGKK (hope you don't mind)

Perhaps OpenAI's Whisper makes it all easier: it is easy to use with Python and it is multilingual. I don't know how well it performs, though.

Hi @Sineos ,

about SEPIA :slight_smile: . Obviously I'm biased here ^^, but I don't think it is very complicated to get started. You can simply pull the Docker container for SEPIA-Home (or download the release + Java 11), run the setup and you can already start to experiment with SEPIA. The client is hosted within the same container and can be opened in any browser. The only tricky part is that you have to run the client via localhost or SSL (https); otherwise the browser will not allow you to use the microphone. If you have an Android phone or tablet, you can use the Play Store app (or the release APK) and ignore any SSL restrictions.
Later you can "upgrade" the setup by installing the STT Server as well to be completely independent of any OS-based STT (Web Speech API, local OS engines etc.). There are DIY clients for smart-speakers/displays etc. too.
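The localhost-or-https rule mentioned above can be checked before opening the client. The following is only a minimal sketch, assuming the usual browser behaviour that microphone access needs a secure context (https, or plain http on a loopback host). The function name and the example URLs are hypothetical and not part of SEPIA.

```python
from urllib.parse import urlparse

# Loopback hosts that browsers generally treat as trustworthy over plain http.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def microphone_allowed(client_url: str) -> bool:
    """Return True if a browser would likely allow microphone access here."""
    parts = urlparse(client_url)
    host = parts.hostname or ""
    if parts.scheme == "https":
        return True
    # Plain http is only acceptable when the page is served from this machine.
    return parts.scheme == "http" and (
        host in LOCAL_HOSTS or host.endswith(".localhost")
    )


if __name__ == "__main__":
    # Hypothetical URLs; replace host, port and path with your own setup.
    for url in (
        "http://localhost:8080/app/",
        "http://192.168.1.20:8080/app/",
        "https://192.168.1.20/app/",
    ):
        print(url, "->", microphone_allowed(url))
```

In other words, opening the client from another machine via a plain `http://<LAN-IP>` address is the case that fails, which is why a self-signed certificate or a reverse proxy with SSL is usually needed for remote browsers.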

Node-RED can be controlled via 2 options. The "official" one is explained here. This will use MQTT to send messages to Node-RED and works by setting up "virtual" smart-home devices via SEPIA's web-based Control-HUB.
The 2nd method is still experimental: I started writing a Node-RED plugin for SEPIA a while ago but never really found the time to finish it. It is already pretty advanced though :slight_smile: .

About the other assistants: Rhasspy and voice2json were both created by Michael Hansen, who's working for Mycroft now (another great open-source assistant), but I think he's still working on Rhasspy as well. We are exchanging ideas and helping each other out from time to time, and my goal has always been to build more interfaces between Rhasspy and SEPIA (e.g. to use the same TTS etc.). I'm not sure how deeply integrated Node-RED is in Rhasspy and Mycroft, but I'm pretty sure they both support MQTT as well. Project Alice is something I haven't heard of yet.

Whisper is a very interesting speech recognition system and I've even managed to run the smallest model on a Raspberry Pi 4, but it is not a voice assistant and it can only transcribe pre-recorded audio. It is also optimized for 30s audio chunks, so expect accuracy to drop a bit for shorter audio.
I've considered integrating it into the SEPIA STT-Server as well (so far it supports Coqui and Vosk), but decided to wait until they release a new version that supports streaming audio input.
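To give a feeling for what the 30s chunk size means for a typical voice command, here is a small sketch in plain Python. It assumes that clips are brought to a fixed 30 s window at 16 kHz by zero-padding or trimming, which is how I understand Whisper's reference preprocessing. It does not call Whisper itself, and it only shows how much of the window is padding, not why accuracy changes.

```python
SAMPLE_RATE = 16_000   # samples per second (assumed model input rate)
WINDOW_SECONDS = 30    # fixed window length per chunk
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS


def pad_or_trim(samples: list[float]) -> list[float]:
    """Zero-pad or cut a clip so it is exactly one 30 s window long."""
    if len(samples) >= WINDOW_SAMPLES:
        return samples[:WINDOW_SAMPLES]
    return samples + [0.0] * (WINDOW_SAMPLES - len(samples))


def padding_ratio(clip_seconds: float) -> float:
    """Fraction of the window that is padding for a clip of this length."""
    used = min(clip_seconds, WINDOW_SECONDS)
    return 1.0 - used / WINDOW_SECONDS


if __name__ == "__main__":
    command = [0.1] * (3 * SAMPLE_RATE)  # stand-in for a 3 s voice command
    window = pad_or_trim(command)
    print(len(window), "samples in the window")
    print(f"{padding_ratio(3):.0%} of the window is padding")
```

A 3 s command such as "turn on the kitchen light" therefore fills only a tenth of the window, while the model still processes the full 30 s.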

Many thanks for these first replies. Added Mycroft and some further information to the OP list. Alas, this does not make the decision easier :sweat_smile:

Maybe something to add regarding SEPIA. I usually interact with the assistant via my mobile phone (read news in the morning, check weather, check my reminders or lists etc.) and only occasionally use one DIY client as an Echo replacement to control lights or play radio music. What I want to point out is that there are a lot more things you can do with SEPIA if you have a display, especially when you create custom voice commands and add them as buttons to the GUI as well (here are some screenshots of the client).
Due to the focus on a nice and stable GUI I had to make some compromises, and even the smallest headless DIY client is still based on the same code. This means it is a little more resource-intensive and won't run with all features (wake-word for example) on very small devices like a Raspberry Pi Zero (1). I'm planning to change that in the future and write a new tiny-client ... as soon as I have time :sweat_smile: .

I have been working with Rhasspy as my Alexa replacement. It integrates with Node-RED via MQTT, so there are no issues there.
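As a rough illustration of what arrives over MQTT, here is a sketch that unpacks a recognized intent. It assumes Rhasspy's Hermes-style messages, where an intent is published as JSON on a topic like `hermes/intent/<intentName>`. The intent name, slot names and site ID in the sample payload are made up, and the broker connection is left out so the snippet runs without extra packages.

```python
import json

# Sample payload shaped like a Hermes intent message.
# Intent name, slots and siteId are hypothetical examples.
SAMPLE_PAYLOAD = json.dumps({
    "input": "turn on the kitchen light",
    "siteId": "satellite-kitchen",
    "intent": {"intentName": "ChangeLightState", "confidenceScore": 0.97},
    "slots": [
        {"slotName": "room", "value": {"kind": "Unknown", "value": "kitchen"}},
        {"slotName": "state", "value": {"kind": "Unknown", "value": "on"}},
    ],
})


def parse_intent(payload: str) -> dict:
    """Reduce an intent message to the fields an automation flow needs."""
    msg = json.loads(payload)
    # Flatten the slot list into a simple name -> value mapping.
    slots = {s["slotName"]: s["value"]["value"] for s in msg.get("slots", [])}
    return {
        "intent": msg["intent"]["intentName"],
        "site": msg.get("siteId", "default"),
        "slots": slots,
    }


if __name__ == "__main__":
    print(parse_intent(SAMPLE_PAYLOAD))
```

In Node-RED the same unpacking would typically happen in a function node behind an "mqtt in" node. The `site` field is what tells you which satellite the command came from.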

Rhasspy also has a hub/satellite model so the computer that does all of the hard work can be beefier with the satellites just passing the data.

My long-term goal is to have Rhasspy (or something similar) perform all of my home automation tasks. Commands that are not recognized would be passed off to Alexa for analysis, and Rhasspy would then intercept Alexa's response and send it back to the correct Rhasspy satellite. I do have this working in a test environment.

I have not heard of SEPIA so I too will be researching that.

I guess I'll start this journey with SEPIA, also because of your kind replies @sepia-assistant and because you are obviously an ardent supporter of your product.

Do you think I'll have some success in setting up the entire framework on a Tinkerboard 2S?

This is similar for SEPIA, where you have the SEPIA-Home server running the NLU, TTS, database, smart-home connections etc. The STT-Server can run next to it, and then you can put clients all over your home (any desktop or mobile browser, headless Raspberry Pis, mobile apps). The Rhasspy satellites will be less resource hungry though :slight_smile: .

:smiley: let me know how it goes, I'm always interested in feedback to see what parts need further improvements!

2GB or 4GB? I have several systems running on Raspberry Pi 4s, and I think the Tinkerboard 2S is probably similar or even more powerful. With 2GB you should be able to run SEPIA-Home (the central server) and the STT-Server at the same time. With 4GB you definitely have enough juice left for the client and Node-RED. Roughly, I'd say calculate about 1GB RAM for each component (server, STT, client). More advanced TTS and larger STT models can require a bit more RAM though.
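The rule of thumb above can be written down as a quick budget check. This is only a sketch of the "about 1GB per component" estimate. Counting Node-RED as a full 1GB component and the optional `os_overhead_gb` parameter are my own assumptions, not measured values.

```python
GB_PER_COMPONENT = 1.0  # rough estimate per component from the post above


def ram_needed(components: list[str], gb_each: float = GB_PER_COMPONENT) -> float:
    """Estimated RAM in GB for the given list of components."""
    return len(components) * gb_each


def fits(components: list[str], board_ram_gb: float,
         os_overhead_gb: float = 0.0) -> bool:
    """True if the estimate (plus assumed OS overhead) fits on the board."""
    return ram_needed(components) + os_overhead_gb <= board_ram_gb


if __name__ == "__main__":
    core = ["SEPIA-Home", "STT-Server"]
    # Node-RED counted as one more 1GB component (assumption).
    full = core + ["client", "Node-RED"]
    print("2GB board, server + STT:", fits(core, 2))
    print("4GB board, everything:  ", fits(full, 4))
    print("2GB board, everything:  ", fits(full, 2))
```

Larger STT models or more advanced TTS voices would push the per-component figure above 1GB, so the result is a lower bound rather than a guarantee.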

Ok, some progress but still some challenges:

After some woes with ALSA / PulseAudio, the Jabra 510 is now working nicely for both playback and recording.

Where I'm currently stuck: the DIY installation does not seem to work correctly. I can connect but it does not seem to respond:

Consequently, I guess, there is also no reaction to any wake words etc.

Any hints on how to debug this would be appreciated.

I installed the home server yesterday and had two issues. First, when I chose option 1b in the install menu, it errored out and aborted because config.yaml was not present. I figured out what to do after a little searching. The second issue was very frustrating: on my machine, the admin user is not admin. It is admin@sepia-localdomain or something like that. Is there a way to make it just admin?

Hopefully I will get to installing a client this weekend.