SEPIA Open Assistant - Privacy respecting, self-hosted voice-control

Hello everybody,

I'd like to introduce you to a project of mine called SEPIA Open Assistant.

SEPIA is an open-source, self-hosted digital voice assistant that works on your mobile phone (similar to Google Assistant) and as a smart speaker/display in your living room (similar to Amazon Echo), while respecting your privacy and giving you back full control over your personal data :slight_smile:
The project is hosted on GitHub under the MIT license, meaning all source code is freely available and everybody can participate and improve it.

If you want to learn more about the architecture of SEPIA please visit the official documentation.
In short: SEPIA uses a server-client setup with the server running on e.g. a Raspberry Pi while client apps are available for Android, browser, iOS (not yet in Apple Store) and as "headless" version for Raspberry Pi as well. :robot: :iphone: :studio_microphone: :computer:

A few days ago I released SEPIA Home version 2.5.0 with official support for Node-RED, or to be more precise, with support for MQTT and a tutorial describing the interaction with Node-RED.
The tutorial can be found in the smart home section of the SEPIA Wiki.
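To give a feel for the MQTT side before you read the tutorial: a voice command arriving over MQTT can be handled in a Node-RED function node with a few lines of JavaScript. Note that the payload fields used below ("device", "action", "room") are assumptions for illustration, not SEPIA's documented schema.

```javascript
// Sketch of a Node-RED function-node body handling a voice command that
// arrives via MQTT. The payload fields ("device", "action", "room") are
// assumed for illustration, NOT SEPIA's documented schema.
function handleSepiaMessage(msg) {
  const cmd = JSON.parse(msg.payload);
  if (cmd.device === "light") {
    // forward a simplified message to a downstream node (e.g. a Zigbee node)
    return { topic: "home/" + cmd.room + "/light", payload: cmd.action };
  }
  return null; // ignore other devices in this sketch
}
```

In a real flow this body would sit between an `mqtt in` node and whatever node controls the device.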

It'd be great to find some people who would like to test the integration and give me some feedback, either here or in the issues section of the SEPIA docs :slight_smile:. (See the documentation link above for installation instructions.)

Hope you like it!
Florian

3 Likes

Hello @sepia-assistant and welcome to the node-red forum. I think we've met before on the openHAB forum :wave:.
A good place to post your news would be this thread:
https://discourse.nodered.org/t/ways-to-control-node-red-with-speech-commands/

A few questions i have :see_no_evil::

  • What do you use as wake-word (Precise, Porcupine, Snowboy?) and STT (Kaldi, DeepSpeech, PocketSphinx?) components? I think that's one of the most interesting things for this community to know. I see it's Kaldi and Porcupine, or are there options? Which languages do you offer?

  • Do you offer custom wake-words?

  • Do you train your own domain specific language model for the framework you use to improve offline performance and accuracy?

  • does it work completely offline?

  • Do you offer built-in slots for easy creation of things like timer intents or weather intents, and what kind of NLU do you use for intent parsing?

  • Do you include an offline TTS component that can be triggered via MQTT?

  • Do you have a setup for things like hotword coalescing, for setups with multiple satellites and a base throughout a house that overlap?

Welcome to the node-red community and good to see you here, Johannes

1 Like

Hi Johannes, nice to meet you again :wink:

Thanks! That looks like a lot of info I should read :smiley:

That's correct, ASR is Kaldi with the Zamia acoustic models for English and German. The language model can be customized, though this feature is still not properly documented. Besides that, I support the native ASR engines integrated into Android (e.g. Google Cloud/Offline, Samsung, etc.), Chrome/Chromium (Google Cloud), iOS (Apple, currently broken) and Firefox (Google in latest Nightly, DeepSpeech, highly experimental).
Wake-word engine is Porcupine, correct.

Default wake-word is "Hey SEPIA" and I support all wake-words they've ever released under an open-source license (~40, e.g. raspberry, blueberry, porcupine, grasshopper, terminator, etc.). If you managed to obtain a custom wake-word from Picovoice you can use that as well.
Currently I don't officially support engines like Snowboy or Precise, for two reasons: 1) they don't work cross-platform and 2) their setup is not trivial.
That said, it is still possible to run whatever wake-word engine you want and use SEPIA's remote-action endpoint to trigger your client.

Yes, if you use the SEPIA STT Server (the Kaldi system combined with my own Python server) and TTS via the SEPIA server (currently eSpeak, Pico, MaryTTS).

There are around two to three dozen built-in "slots" like Date/Time, Location, Smart Home Device Type, Room, Color, Temperature, etc. Most of them are custom made, based on statistics and regular expressions, but the SEPIA NLU is a chain of modules that can be customized. That means you can, for example, use the Python bridge or web-API module to call your own code from other sources (e.g. if you run your own Rasa server or similar). Simple "slots" can be defined inside a new service as well (simple = fixed names/regular expressions). I do NOT include large (GB-size), pre-trained language models, but I'm currently thinking about implementing a lightweight ML NLU module that I wrote a while ago for a different project.
Btw. Timer/Alarm and Weather services are already integrated :slight_smile: (same for news, smart home, navigation, Wikipedia, radio and much more).
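A "simple" slot of the fixed-names/regular-expressions kind described above can be sketched in a few lines of JavaScript. The room list and function name below are made up for illustration, not taken from SEPIA's actual code.

```javascript
// Minimal sketch of a regex-based "simple slot": match a room name in an
// utterance. The room list is an invented example, not SEPIA's real data.
const ROOMS = ["kitchen", "living room", "bedroom", "bathroom"];
const roomPattern = new RegExp("\\b(" + ROOMS.join("|") + ")\\b", "i");

function extractRoom(utterance) {
  const m = utterance.match(roomPattern);
  return m ? m[1].toLowerCase() : null; // normalized slot value, or null
}
```

A real slot would also handle synonyms and localization, but the fixed-names/regex idea is the same.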

Offline TTS yes (see above); via MQTT ... not directly. SEPIA usually works in the other direction (sending stuff to Node-RED). That said, SEPIA has an HTTP endpoint for TTS, but it requires proper, token-based authentication. Basically the necessary components to implement TTS requests via MQTT are there, but in the context of Node-RED it probably makes more sense to build a node that communicates with the SEPIA "answer" endpoint in a proper way. SEPIA has an input command called "saythis" that can be used to trigger TTS with any output you like.
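To make the "saythis" idea concrete, here is a sketch of the kind of request a flow might POST to the SEPIA server. Only the "saythis" command name comes from the post above; the endpoint path, port and field names are assumptions, so check the SEPIA docs for the real API.

```javascript
// Build a hypothetical request for SEPIA's "saythis" input command.
// Endpoint path, port and field names are illustrative assumptions;
// token-based authentication is required in practice, as noted above.
function buildSayThis(text, token) {
  return {
    url: "http://sepia-server:20721/assist/answer", // assumed endpoint
    body: {
      cmd: "saythis",
      text: text,
      KEY: token, // placeholder for the auth token
    },
  };
}
```

A Node-RED flow could feed this object into an `http request` node to trigger TTS on a connected client.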

I haven't had that use-case yet, but in theory all the necessary information is available. Since every client sends its request to the SEPIA server, one could implement a procedure that blocks inputs from the same account in quick, consecutive calls.
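The blocking idea could be sketched as a small per-account rate limiter on the server side. This is purely illustrative (SEPIA does not currently ship this), and all names are hypothetical:

```javascript
// Sketch: drop inputs from the same account that arrive within a short
// window, so overlapping satellites don't trigger the same command twice.
// Hypothetical helper, not part of SEPIA.
const lastSeen = new Map(); // accountId -> timestamp in ms

function shouldAccept(accountId, nowMs, windowMs = 2000) {
  const prev = lastSeen.get(accountId);
  lastSeen.set(accountId, nowMs);
  return prev === undefined || nowMs - prev > windowMs;
}
```

A production version would likely prefer the satellite with the best wake-word confidence instead of simply keeping the first request.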

Thank you :slight_smile:

2 Likes

Thanks for the answer.

I think Porcupine removed compatibility for some models, including custom-trained ones, to run on the Raspberry Pi in the latest version, and I don't quite like their terms of service :weary:

A proper suite of SEPIA nodes would be a great addition to the community, so if you feel motivated it would be awesome if you made them :wink: I think you will find many people in the community who would be willing to help you get started.

Yes, unfortunately since the dawn of time (~2018) they've only ever supported free custom wake-words for x86_64 systems (SEPIA requires the WASM version btw :face_with_head_bandage:) as a kind of teaser for their embedded systems. They generously created "Hey SEPIA" to support the project at the end of 2018, but it was a one-time "deal" and I won't receive updates for the new engine versions :frowning: . They are getting more and more commercial; it's sad, but I understand they have to make money somehow. People probably ask them all day about free wake-words for "Computer" or "Jarvis" :grin:. Currently they have a unique position regarding the number of supported platforms, ease of use and precision, but I'm sure new options will emerge. Actually I would even be willing to pay a few bucks for custom wake-words, but it's probably hard to protect against illegal copies.

It looks like a very interesting task and it's probably not too hard. Let's see :smiley:

2 Likes

Creating Nodes : Node-RED is a great place to get started

That's actually why I quite like Precise. Although not as lightweight and cross-platform, it's completely open source and you can train models completely offline, even on modest hardware. I myself had quite good success training models that work better than the Snowboy ones.

1 Like

Mycroft is doing some great stuff and I have had a close eye on Precise for some time now though I admit I never trained my own wake-word with it.

Actually it should be pretty easy to feed the Precise results into the SEPIA DIY Client if they run on the same machine. The DIY Client runs on a WebSocket server called CLEXI that is used for the so-called "remote terminal" (an interface to communicate with a headless SEPIA client), and that already has a command to trigger the client. The only thing that might break it is microphone management. I'm not sure if the Pi can handle two parallel audio streams from the same mic properly. If not, one probably needs to handle some ON/OFF events, which increases complexity drastically :grimacing:
... I'm just thinking out loud right now :sweat_smile:
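To sketch that bridge idea: CLEXI speaks WebSocket, so a wake-word detector could send a trigger message whenever it fires. The message shape below is a guess for illustration, not CLEXI's documented protocol.

```javascript
// Hypothetical wake-word bridge: when Precise fires, build a CLEXI-style
// trigger message. The message shape is an assumption for illustration,
// NOT the documented CLEXI protocol.
function buildWakeTrigger(deviceId) {
  return JSON.stringify({
    type: "remote-action",        // assumed message type
    action: "trigger-microphone", // assumed action name
    deviceId: deviceId,
    ts: Date.now(),
  });
}
// In a real bridge this string would be sent over a WebSocket connection
// to the CLEXI server, e.g. using the 'ws' npm package.
```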

I find the documentation quite complex.

What is the reason for installing elasticsearch?
I see a Dockerfile; can the "server" be installed with Docker? Are there any specific instructions?

Well, you could use PulseAudio for that, but the user would have to set it up themselves, which is non-trivial unfortunately.
Alternatively, does SEPIA accept an audio stream of raw buffers over MQTT or a WebSocket as an audio source, like Snips did or Rhasspy does now?

Are you referring to the Node-RED part or the docs in general? It can be a bit confusing at the beginning, I admit, since the project has become quite large over time and the docs are not very "polished", but I promise the installation is actually super easy, almost as easy as download->extract->run :wink:

Elasticsearch is the one and only database SEPIA uses. Due to its advanced text-search features it's a natural fit, and it is fundamentally based on JSON objects, which is also true for SEPIA.
[EDIT]
Maybe I should add that SEPIA stores a lot of things, from user accounts to personal commands, custom services, smart home interfaces, etc.
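As an illustration of why JSON-based storage is a natural fit here: a stored personal command could be looked up with a standard Elasticsearch bool/match query like the one built below. The index and field names are invented for the example; SEPIA's real mappings may differ.

```javascript
// Build an Elasticsearch query as a plain JSON object. Index and field
// names ("personal-commands", "user", "sentence") are invented examples.
function buildCommandQuery(userId, text) {
  return {
    index: "personal-commands", // hypothetical index name
    body: {
      query: {
        bool: {
          must: [
            { term: { user: userId } },      // exact match on the account
            { match: { sentence: text } },   // full-text match on the utterance
          ],
        },
      },
    },
  };
}
```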

Yes. Here is the Docker repo with instructions: Docker
If you already use Docker it's probably the easiest way to get started, BUT if you already have Java 8 or 11 on your system it may even be more complicated than the actual installation :grin:

I have a little private war with PulseAudio; it tends to break my audio setups on a regular basis :stuck_out_tongue_closed_eyes:

The communication between SEPIA clients and the STT server works this way, but only in one direction: client -> server.
What's your idea here? Decoupling the microphone from the client? :thinking:
For the "custom" ASR engines (the Kaldi server) this could be interesting; unfortunately it will not work with the native engines (which is maybe OK if you decide you don't want to use them anyway).

Are you referring to the Node-RED part or the docs in general?

Docs in general

but I promise the installation is actually super easy, almost as easy as download->extract->run

I don't see this in the documentation though. I would like to try it, but honestly have no idea where to start. I have a Raspberry Pi, but it is not available for this project. Then I have a Debian server which needs to keep running; how much is this project going to mess with my system? Ngrok, Elasticsearch... I am not too keen to install it other than via Docker, but if that makes it harder I will pass.

Look for the sections that say "quick start" :grinning:
There is also a small set of blog articles. This one is a good place to start I guess:

I think I gave a wrong impression of Docker ^^. It is quite easy, but it requires three steps: define a shared folder, start the container for setup, restart the container for regular work. The setup step cannot be skipped because you need to create the administrator account and define a password before you can use SEPIA.

Actually there is only one basic requirement for SEPIA: Java. And even this can be downloaded directly into SEPIA's base folder. Elasticsearch runs on Java as well, so overall this setup is very non-invasive for your system. Close the SEPIA server, delete the folder, done :wink:
The only non-trivial thing is that the SEPIA Client is a web application and thus requires a secure browser context for microphone access, meaning either you work via 'localhost' or you need SSL certificates. That's why I'm mentioning Ngrok as a quick workaround for HTTPS.

Look for the sections that say "quick start"

That is often an issue for me: too much text. When I click the link in the opening post, I end up on this. There is a somewhat concise quick-start section, but it takes me to this, where there is a long story in which first the NTP settings are updated to sync with some random Ubuntu NTP server (?), plus complexly written sentences just to extract a zip.

download->extract->run

In the end, this was it.

i.e., on Linux:

install Java 8

cd ~
mkdir SEPIA
wget https://github.com/SEPIA-Framework/sepia-installation-and-setup/releases/download/v2.5.0/SEPIA-Home.zip
unzip SEPIA-Home.zip -d SEPIA/
cd SEPIA
./setup.sh

choose:
option 4
option 1
option 0

./run-sepia.sh

But I quickly found this all goes way over my head, so everything removed again :wink:

I agree this part needs some rework! Redirects and optional steps can be an unnecessary distraction, and some things need to be updated; they sound more complicated than they actually are.

You were almost there :cold_sweat: Did you check the blog article I linked in the previous post?

Just a quick update:

I've been working on some Node.js and Node-RED integrations for SEPIA. The idea is to have the server connection as a configuration node and then create a user node (with authentication) that can be used to make calls to several of the SEPIA server endpoints, like the NLU module (returns a JSON object with intent and parameters aka slots, etc.) or TTS (returns a link to a sound file).

It's still early stage, but the nodes you see in the screenshot are already working. The next step will be the integration of the remote terminal for the client (CLEXI server), which can be used to read client events like STT results, animation states, service events (e.g. alarm clock/timer triggers), etc., and finally the chat server socket connection. After that I'll move all the code to the master branch and write a little tutorial to get started :slight_smile: .
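For a flow consuming such an NLU node, routing on the returned intent might look like the sketch below. The result shape ({ intent, parameters }) is inferred from the description above and should be treated as an assumption, as are the intent and topic names.

```javascript
// Sketch of a Node-RED function-node body routing on a SEPIA NLU result.
// The result shape { intent, parameters } and the intent names are
// assumptions inferred from the post, not a documented format.
function routeIntent(nluResult) {
  switch (nluResult.intent) {
    case "timer":
      return { topic: "timer/set", payload: nluResult.parameters };
    case "weather":
      return { topic: "weather/get", payload: nluResult.parameters };
    default:
      return { topic: "fallback", payload: nluResult };
  }
}
```

Downstream nodes can then subscribe to the respective topics, e.g. a clock node for "timer/set".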

Hope you like it, more updates coming soon :sunglasses: (probably after the SEPIA v2.5.1 release).

1 Like

Just in case someone wants to try it, I found out the installation steps :wink:

cd ~
git clone -b dev https://github.com/SEPIA-Framework/sepia-node-js-client.git
cd ~/.node-red
npm install ~/sepia-node-js-client
1 Like

This topic was automatically closed after 60 days. New replies are no longer allowed.