Hi Johannes, nice to meet you again
Thanks! That looks like a lot of info I should read
That's correct, ASR is Kaldi with the Zamia acoustic models for English and German. The language model can be customized, though this feature is still not properly documented. Besides that I support the native ASR engines integrated into Android (e.g. Google Cloud/Offline, Samsung, etc.), Chrome/Chromium (Google Cloud), iOS (Apple, currently broken) and Firefox (Google in latest Nightly, DeepSpeech, highly experimental).
Wake-word engine is Porcupine, correct.
Default wake-word is "Hey SEPIA" and I support all wake-words they've ever released under an open-source license (~40, e.g. raspberry, blueberry, porcupine, grasshopper, terminator, etc.). If you manage to obtain a custom wake-word from Picovoice you can use that as well.
Currently I don't officially support engines like Snowboy or Precise for 2 reasons: 1) they don't work cross-platform and 2) their setup is not trivial.
That said, it is still possible to run whatever wake-word engine you want and use SEPIA's remote-action endpoint to trigger your client.
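Roughly, a third-party wake-word engine could ping the client through that endpoint like this. This is only a sketch: the endpoint path, payload field names and the auth field are my assumptions here and need to be checked against the actual SEPIA server API:

```python
import json
import urllib.request

# Hypothetical sketch: trigger a SEPIA client after a third-party
# wake-word engine fires, via the server's remote-action endpoint.
# Host/port, endpoint path, field names and auth are ASSUMPTIONS.
SEPIA_SERVER = "http://localhost:20721"  # assumed default host/port

def build_remote_action(user_token: str, device_id: str) -> dict:
    """Build a payload asking one specific client to open its mic."""
    return {
        "type": "hotkey",           # assumed action type
        "action": "mic",            # assumed: toggle microphone
        "targetDeviceId": device_id,
        "KEY": user_token,          # assumed token-based auth field
    }

def send_remote_action(payload: dict) -> None:
    """POST the action to the (assumed) remote-action endpoint."""
    req = urllib.request.Request(
        SEPIA_SERVER + "/assist/remote-action",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # fire-and-forget for this sketch

payload = build_remote_action("my-token", "speaker1")
# send_remote_action(payload)  # requires a running SEPIA server
```

The point is just that the wake-word detection and the client trigger are decoupled, so any engine that can make an HTTP call will do.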
Yes, if you use the SEPIA STT Server (the Kaldi system combined with my own Python server) and TTS via the SEPIA server (currently eSpeak, Pico, MaryTTS).
There are around two to three dozen built-in "slots" like Date/Time, Location, Smart Home Device Type, Room, Color, Temperature, etc. Most of them are custom-made, based on statistics and regular expressions, but the SEPIA NLU is a chain of modules that can be customized. That means you can, for example, use the Python bridge or the web-API module to call your own code from other sources (e.g. if you run your own Rasa server or something similar). Simple "slots" can be defined inside a new service as well (simple = fixed names/regular expressions). I do NOT include large (GB-size), pre-trained language models, but I'm currently thinking about integrating a lightweight ML NLU module that I wrote a while ago for a different project.
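To illustrate the "simple slot" idea: such a slot is essentially a named list of fixed values or a regular expression that extracts a parameter from the sentence. This is a generic Python sketch of the concept, not SEPIA's actual service code:

```python
import re

# Generic sketch of "simple slots": each slot is a fixed set of names
# compiled into a regular expression that extracts one parameter.
# Slot names and values here are illustration only, not SEPIA's API.
SIMPLE_SLOTS = {
    "room": re.compile(r"\b(living room|kitchen|bedroom|bath)\b", re.I),
    "color": re.compile(r"\b(red|green|blue|white)\b", re.I),
}

def extract_slots(text: str) -> dict:
    """Return the first match of each slot found in the input text."""
    found = {}
    for name, pattern in SIMPLE_SLOTS.items():
        match = pattern.search(text)
        if match:
            found[name] = match.group(1).lower()
    return found

print(extract_slots("Turn the lights in the living room blue"))
# {'room': 'living room', 'color': 'blue'}
```

A real NLU module would add scoring and normalization on top, but for many smart-home commands this fixed-pattern approach already covers a lot.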
Btw. the Timer/Alarm and Weather services are already integrated (same for news, smart home, navigation, Wikipedia, radio and much more).
Offline TTS yes (see above); via MQTT ... not directly. SEPIA usually works in the other direction (sending stuff to Node-RED). That said, SEPIA has an HTTP endpoint for TTS, but it requires proper, token-based authentication. Basically the necessary components to implement TTS requests via MQTT are there, but in the context of Node-RED it probably makes more sense to build a node that communicates with the SEPIA "answer" endpoint in a proper way. SEPIA has an input command called "saythis" that can be used to trigger TTS with any output you like.
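As a rough illustration, such a node would basically build a "saythis" message and POST it with the user's auth token. The field names below are assumptions for the sketch; only the "saythis" command name comes from the actual system:

```python
import json

# Hypothetical sketch: build a "saythis" message that a Node-RED node
# could send to the SEPIA server to trigger TTS on a client.
# Field names and auth details are ASSUMPTIONS, not the real API.
def build_saythis(text: str, user_token: str, language: str = "en") -> dict:
    return {
        "cmd": "saythis",   # the input command mentioned above
        "text": text,       # what the client should speak
        "lang": language,
        "KEY": user_token,  # assumed token-based authentication
    }

msg = build_saythis("Dinner is ready", "my-token")
print(json.dumps(msg))
```

The actual transport (HTTP POST to the "answer" endpoint, or MQTT once wired up) is then just a delivery detail around this message.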
I haven't had that use-case yet, but in theory all the necessary information is available. Since every client sends its request to the SEPIA server, one could implement a procedure that blocks inputs from the same account during quick, consecutive calls.
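Such a guard could be as simple as remembering the last accepted request timestamp per account, so only the first of several clients that heard the wake-word gets an answer. A generic sketch (not existing SEPIA code; the 2-second window is an arbitrary example value):

```python
import time

# Generic sketch: block near-simultaneous requests from the same
# account so duplicate wake-word triggers are answered only once.
BLOCK_WINDOW_S = 2.0  # arbitrary example value
_last_request = {}    # account id -> timestamp of last accepted request

def accept_request(account_id: str, now: float = None) -> bool:
    """Return True if the request should be handled, False if blocked."""
    now = time.monotonic() if now is None else now
    last = _last_request.get(account_id)
    if last is not None and (now - last) < BLOCK_WINDOW_S:
        return False  # a sibling client already got this one
    _last_request[account_id] = now
    return True
```

On the real server this would sit in front of the NLU, and one might prefer to keep the audio from the client with the loudest signal instead of simply the first one.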
Thank you