Node-red-contrib-voice2json

Hi, I tried to install this node on my RPi 3B with the latest Node-RED and the latest Raspberry Pi OS.

I installed

  1. npm install johanneskropf/node-red-contrib-voice2json
  2. sudo apt install ./voice2json_2.0_armhf.deb
  3. tar -zxvf en-us_kaldi-zamia-2.0.tar.gz
  4. drivers for my seeed 2mic card

I don't see the nodes, and
I don't see the demo flow via the Node-RED import.

sudo npm audit fix
sudo npm audit fix --force
and rebooted the RPi...
It works, don't know why...

I suspect it was the reboot, not the audit fix. Audit fix can break things, as it forces module updates that may not be desirable. If all is well though, don't worry about it.


If you have any questions or further problems feel free to hit me up, as I'm happy to troubleshoot and help.
Should you find any bugs concerning the nodes, please open an issue on GitHub so that I can keep track of them, as the nodes are still in beta.

Johannes

Hi @JGKK ,

I have a node-red-contrib-voice2json flow working a little bit with my seeed 2mic audio.
I installed the samples and tried it with the preferred English profile /home/pi/en-us_kaldi-zamia-2.0. That works OK!

After that I made my own config with
/home/pi/nl_kaldi-cgn-1.1

The default Dutch sentences do work, but I still have to use the English hot word (hey mycroft)!

How can I change the English hotword into a Dutch word? I see that there are different techniques for making a hotword. Which technique do I have to use with my Dutch config?

Do I have to use the kaldi hotword technique (because I have a kaldi profile / acoustic model type), or can I use other techniques? Here is my current flow:

[
    {
        "id": "753d8ebd.d4e2a",
        "type": "tab",
        "label": "Flow 2",
        "disabled": false,
        "info": ""
    },
    {
        "id": "4406bef1.4010f8",
        "type": "voice2json-stt",
        "z": "753d8ebd.d4e2a",
        "name": "stt",
        "voice2JsonConfig": "82191045.5e541",
        "inputField": "payload",
        "controlField": "control",
        "outputField": "payload",
        "autoStart": true,
        "x": 910,
        "y": 420,
        "wires": [
            [
                "803d64ac.6bbee8",
                "3377e4ea.0b3bec"
            ]
        ]
    },
    {
        "id": "803d64ac.6bbee8",
        "type": "debug",
        "z": "753d8ebd.d4e2a",
        "name": "Show stt transcription text",
        "active": true,
        "tosidebar": true,
        "console": false,
        "tostatus": false,
        "complete": "payload",
        "targetType": "msg",
        "statusVal": "",
        "statusType": "auto",
        "x": 1190,
        "y": 300,
        "wires": []
    },
    {
        "id": "77d09bd5.298a44",
        "type": "voice2json-training",
        "z": "753d8ebd.d4e2a",
        "name": "nlKaldi",
        "voice2JsonConfig": "82191045.5e541",
        "inputField": "payload",
        "outputField": "payload",
        "loadedProfile": "",
        "x": 350,
        "y": 180,
        "wires": [
            [
                "d6b9d7df.44f5e8"
            ]
        ]
    },
    {
        "id": "6942354d.5578ac",
        "type": "inject",
        "z": "753d8ebd.d4e2a",
        "name": "Start training",
        "props": [
            {
                "p": "payload"
            },
            {
                "p": "topic",
                "vt": "str"
            }
        ],
        "repeat": "",
        "crontab": "",
        "once": false,
        "onceDelay": 0.1,
        "topic": "",
        "payload": "train",
        "payloadType": "str",
        "x": 170,
        "y": 180,
        "wires": [
            [
                "77d09bd5.298a44"
            ]
        ]
    },
    {
        "id": "d6b9d7df.44f5e8",
        "type": "debug",
        "z": "753d8ebd.d4e2a",
        "name": "Training result",
        "active": true,
        "tosidebar": true,
        "console": false,
        "tostatus": false,
        "complete": "true",
        "targetType": "full",
        "statusVal": "",
        "statusType": "auto",
        "x": 660,
        "y": 180,
        "wires": []
    },
    {
        "id": "3377e4ea.0b3bec",
        "type": "voice2json-tti",
        "z": "753d8ebd.d4e2a",
        "name": "tti",
        "voice2JsonConfig": "82191045.5e541",
        "inputField": "payload.text",
        "controlField": "control",
        "outputField": "payload",
        "autoStart": false,
        "x": 1130,
        "y": 420,
        "wires": [
            [
                "33639506.781d1a"
            ]
        ]
    },
    {
        "id": "33639506.781d1a",
        "type": "debug",
        "z": "753d8ebd.d4e2a",
        "name": "show intent recognition result",
        "active": true,
        "tosidebar": true,
        "console": false,
        "tostatus": false,
        "complete": "payload",
        "targetType": "msg",
        "statusVal": "",
        "statusType": "auto",
        "x": 1520,
        "y": 300,
        "wires": []
    },
    {
        "id": "732f01fc.c743a8",
        "type": "comment",
        "z": "753d8ebd.d4e2a",
        "name": "Readme for this voice2json example flow",
        "info": "This example flow shows the use of the voice2json train, stt and tti node.\nTo try everything please make sure to first download the english kaldi profile to your home folder from:\nhttps://github.com/synesthesiam/en-us_kaldi-zamia/archive/v2.0.tar.gz\nIf you should save the profile folder to a different location please edit the profile path in the config accordingly.\nBefore trying the stt and tti node you have to train the profile.\nFor detailed instructions please read the complete documentation at: https://github.com/johanneskropf/node-red-contrib-voice2json/blob/master/README.md & http://voice2json.org/",
        "x": 260,
        "y": 100,
        "wires": []
    },
    {
        "id": "a9ecce69.dc2d58",
        "type": "sox-record",
        "z": "753d8ebd.d4e2a",
        "name": "",
        "buttonStart": "msg",
        "inputs": 1,
        "inputSource": "1,0",
        "manualSource": "",
        "inputChannels": "",
        "inputRate": "",
        "inputBits": "",
        "byteOrder": "-L",
        "encoding": "signed-integer",
        "channels": 1,
        "rate": 16000,
        "bits": 16,
        "gain": "0",
        "lowpass": 8000,
        "showDuration": false,
        "durationType": "forever",
        "durationLength": 0,
        "silenceDetection": "nothing",
        "silenceDuration": "2.0",
        "silenceThreshold": "2.0",
        "outputFormat": "stream",
        "manualPath": "color",
        "debugOutput": false,
        "x": 170,
        "y": 380,
        "wires": [
            [
                "f66171fb.010238"
            ],
            []
        ]
    },
    {
        "id": "f66171fb.010238",
        "type": "voice2json-wait-wake",
        "z": "753d8ebd.d4e2a",
        "name": "",
        "voice2JsonConfig": "82191045.5e541",
        "inputField": "payload",
        "controlField": "control",
        "outputField": "payload",
        "nonContinousListen": true,
        "x": 380,
        "y": 380,
        "wires": [
            [
                "115731c4.96b7a6",
                "e0d849c1.b7df"
            ],
            [
                "162067ed.6f5898",
                "e0d849c1.b7df"
            ]
        ]
    },
    {
        "id": "a4c2b1f.02510d",
        "type": "debug",
        "z": "753d8ebd.d4e2a",
        "name": "",
        "active": false,
        "tosidebar": true,
        "console": false,
        "tostatus": false,
        "complete": "payload",
        "targetType": "msg",
        "statusVal": "",
        "statusType": "auto",
        "x": 870,
        "y": 300,
        "wires": []
    },
    {
        "id": "115731c4.96b7a6",
        "type": "trigger",
        "z": "753d8ebd.d4e2a",
        "name": "3s than listen",
        "op1": "",
        "op2": "listen",
        "op1type": "nul",
        "op2type": "str",
        "duration": "3",
        "extend": false,
        "overrideDelay": false,
        "units": "s",
        "reset": "",
        "bytopic": "all",
        "topic": "topic",
        "outputs": 1,
        "x": 390,
        "y": 280,
        "wires": [
            [
                "f66171fb.010238"
            ]
        ]
    },
    {
        "id": "146c7ce5.b957b3",
        "type": "comment",
        "z": "753d8ebd.d4e2a",
        "name": "wait-wake example hey mycraft what time is it",
        "info": "Prerequisites for this example flow are that you must have a [node-red-sox-utils](https://github.com/johanneskropf/node-red-contrib-sox-utils) installed and a microphone connected to your raspberry or other device. Choose your input device in the mic nodes config and click the button to start recording. After a brief start up period the wait-wake node can be triggered by speaking the standard wake word of *hey mycroft* if no custom wake word has been configured in the selected profiles profile.yml. The wait-wake node will than forward the audio from the mic for three seconds on its second output and ignore wake words until told to listen again. You can restart (stop) the node by injecting start to the control topic.",
        "x": 230,
        "y": 460,
        "wires": []
    },
    {
        "id": "9c9fc89d.e5a858",
        "type": "comment",
        "z": "753d8ebd.d4e2a",
        "name": "stop listening",
        "info": "Prerequisites for this example flow are that you must have a [node-red-sox-utils](https://github.com/johanneskropf/node-red-contrib-sox-utils) installed and a microphone connected to your raspberry or other device. Choose your input device in the mic nodes config and click the button to start recording. After a brief start up period the wait-wake node can be triggered by speaking the standard wake word of *hey mycroft* if no custom wake word has been configured in the selected profiles profile.yml. The wait-wake node will than forward the audio from the mic for three seconds on its second output and ignore wake words until told to listen again. You can restart (stop) the node by injecting start to the control topic.",
        "x": 390,
        "y": 540,
        "wires": []
    },
    {
        "id": "162067ed.6f5898",
        "type": "voice2json-record-command",
        "z": "753d8ebd.d4e2a",
        "name": "record",
        "voice2JsonConfig": "82191045.5e541",
        "inputField": "payload",
        "outputField": "payload",
        "x": 590,
        "y": 420,
        "wires": [
            [
                "4406bef1.4010f8",
                "a4c2b1f.02510d"
            ]
        ]
    },
    {
        "id": "f02fa400.3daca8",
        "type": "inject",
        "z": "753d8ebd.d4e2a",
        "name": "stop",
        "props": [
            {
                "p": "control",
                "v": "stop",
                "vt": "str"
            },
            {
                "p": "topic",
                "vt": "str"
            },
            {
                "p": "payload"
            }
        ],
        "repeat": "",
        "crontab": "",
        "once": false,
        "onceDelay": 0.1,
        "topic": "",
        "payload": "stop",
        "payloadType": "str",
        "x": 150,
        "y": 540,
        "wires": [
            [
                "6b5ba463.053334"
            ]
        ]
    },
    {
        "id": "8fec2a16.63125",
        "type": "inject",
        "z": "753d8ebd.d4e2a",
        "name": "start",
        "props": [
            {
                "p": "control",
                "v": "start",
                "vt": "str"
            },
            {
                "p": "topic",
                "vt": "str"
            },
            {
                "p": "payload"
            }
        ],
        "repeat": "",
        "crontab": "",
        "once": false,
        "onceDelay": 0.1,
        "topic": "",
        "payload": "start",
        "payloadType": "str",
        "x": 150,
        "y": 580,
        "wires": [
            [
                "9a25dec3.ca75b"
            ]
        ]
    },
    {
        "id": "90c3936d.42fa1",
        "type": "comment",
        "z": "753d8ebd.d4e2a",
        "name": "start listening",
        "info": "Prerequisites for this example flow are that you must have a [node-red-sox-utils](https://github.com/johanneskropf/node-red-contrib-sox-utils) installed and a microphone connected to your raspberry or other device. Choose your input device in the mic nodes config and click the button to start recording. After a brief start up period the wait-wake node can be triggered by speaking the standard wake word of *hey mycroft* if no custom wake word has been configured in the selected profiles profile.yml. The wait-wake node will than forward the audio from the mic for three seconds on its second output and ignore wake words until told to listen again. You can restart (stop) the node by injecting start to the control topic.",
        "x": 390,
        "y": 580,
        "wires": []
    },
    {
        "id": "9a25dec3.ca75b",
        "type": "link out",
        "z": "753d8ebd.d4e2a",
        "name": "",
        "links": [
            "12d6de01.b06ad2"
        ],
        "x": 275,
        "y": 580,
        "wires": []
    },
    {
        "id": "6b5ba463.053334",
        "type": "link out",
        "z": "753d8ebd.d4e2a",
        "name": "",
        "links": [
            "12d6de01.b06ad2"
        ],
        "x": 275,
        "y": 540,
        "wires": []
    },
    {
        "id": "12d6de01.b06ad2",
        "type": "link in",
        "z": "753d8ebd.d4e2a",
        "name": "",
        "links": [
            "6b5ba463.053334",
            "9a25dec3.ca75b"
        ],
        "x": 375,
        "y": 480,
        "wires": [
            [
                "4406bef1.4010f8",
                "3377e4ea.0b3bec",
                "162067ed.6f5898",
                "f66171fb.010238",
                "a9ecce69.dc2d58"
            ]
        ]
    },
    {
        "id": "e0d849c1.b7df",
        "type": "debug",
        "z": "753d8ebd.d4e2a",
        "name": "",
        "active": false,
        "tosidebar": true,
        "console": false,
        "tostatus": false,
        "complete": "payload",
        "targetType": "msg",
        "statusVal": "",
        "statusType": "auto",
        "x": 650,
        "y": 300,
        "wires": []
    },
    {
        "id": "82191045.5e541",
        "type": "voice2json-config",
        "profilePath": "/home/pi/nl_kaldi-cgn-1.1",
        "name": "nlkaldi",
        "sentences": "[ChangeLightState]\nzet de (woonkamerverlichting | woonkamerlamp | garageverlichting){name} (aan | uit){state}\n\n[GetTime]\nhoe laat is het\n\n[GetTemperature]\nwat is de temperatuur\nhoe (warm | koud) is het\n\n[GetGarageState]\nis de garagepoort (open | gesloten)",
        "slots": [
            {
                "fileName": "rhasspy/number",
                "managedBy": "external",
                "fileContent": null,
                "executable": true
            }
        ],
        "removeSlots": false,
        "profile": "---\nname: \"nl_kaldi-cgn\"\nversion: \"1.1\"\n\nlanguage:\n  name: \"dutch\"\n  code: \"nl\"\n\ntext-to-speech:\n  espeak:\n    voice: \"nl\"\n\nspeech-to-text:\n  acoustic-model-type: \"kaldi\"\n  kaldi:\n    model-type: \"nnet3\"\n\ntraining:\n  acoustic-model-type: \"kaldi\"\n  kaldi:\n    model-type: \"nnet3\"\n  large-files:\n    - !env \"${profile_dir}/base_dictionary.txt\"\n    - !env \"${profile_dir}/base_language_model.txt\"\n    - !env \"${profile_dir}/base_language_model.fst\"\n    - !env \"${profile_dir}/g2p.fst\"\n    - !env \"${profile_dir}/acoustic_model/model/final.mdl\"\n    - !env \"${profile_dir}/acoustic_model/model/graph/HCLG.fst\"\n"
    }
]

Hello,
As you can read in the voice2json documentation, it uses the Precise engine for wake word detection. This is independent of which profile you are using.
If you want to train your own wake word, have a look at:

But I must warn you that the process is a little bit involved, though quite doable.
You will need:

  • a source install of Precise
  • voice samples from the people that will be using the wake word, saying it both with and without background noise
    • the more the better
    • I recommend at least 50
  • a lot of pieces of random audio that do not include the wake word, to train against
    • those shouldn't be longer than 5-10 minutes apiece
    • 5 hours or more of this
      • for example, split the audio of some of those coffee shop noise videos on YouTube
      • record some from your day-to-day life in your home
  • follow the tutorial I linked above. Training can take a few hours.

I hope this gives you some pointers to start.
You can also look at the communal models from precise here and try those.

Johannes

Hi @JGKK ,
OK, I followed the manual, but this manual is not dummy proof...

  1. storing the wake word 12 times does work
    precise-collect
    I stored the recordings in the folders as described, but the example shows "/" as part of each folder name; I did not include that because WinSCP does not accept it...

  2. training does not work:
    (.venv) pi@raspberrypi:~/mycroft-precise $ precise-train -e 60 hey-bram.net hey-bram/

Using TensorFlow backend.
WARNING:tensorflow:From /home/pi/mycroft-precise/.venv/lib/python3.7/site-packages/tensorflow/__init__.py:98: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.

WARNING:tensorflow:From /home/pi/mycroft-precise/.venv/lib/python3.7/site-packages/tensorflow/__init__.py:98: The name tf.AttrValue is deprecated. Please use tf.compat.v1.AttrValue instead.

WARNING:tensorflow:From /home/pi/mycroft-precise/.venv/lib/python3.7/site-packages/tensorflow/__init__.py:98: The name tf.COMPILER_VERSION is deprecated. Please use tf.version.COMPILER_VERSION instead.

WARNING:tensorflow:From /home/pi/mycroft-precise/.venv/lib/python3.7/site-packages/tensorflow/__init__.py:98: The name tf.CXX11_ABI_FLAG is deprecated. Please use tf.sysconfig.CXX11_ABI_FLAG instead.

WARNING:tensorflow:From /home/pi/mycroft-precise/.venv/lib/python3.7/site-packages/tensorflow/__init__.py:98: The name tf.ConditionalAccumulator is deprecated. Please use tf.compat.v1.ConditionalAccumulator instead.

Loading from hey-bram.net...
Traceback (most recent call last):
  File "/home/pi/mycroft-precise/.venv/bin/precise-train", line 33, in <module>
    sys.exit(load_entry_point('mycroft-precise', 'console_scripts', 'precise-train')())
  File "/home/pi/mycroft-precise/precise/scripts/base_script.py", line 43, in run_main
    script = cls(args)
  File "/home/pi/mycroft-precise/precise/scripts/train.py", line 87, in __init__
    self.model = create_model(args.model, params)
  File "/home/pi/mycroft-precise/precise/model.py", line 70, in create_model
    model = load_precise_model(model_name)
  File "/home/pi/mycroft-precise/precise/model.py", line 54, in load_precise_model
    return load_keras().models.load_model(model_name)
  File "/home/pi/mycroft-precise/.venv/lib/python3.7/site-packages/keras/models.py", line 242, in load_model
    model_config = json.loads(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'
(.venv) pi@raspberrypi:~/mycroft-precise $

  3. the second thing that did not work was

(.venv) pi@raspberrypi:~/mycroft-precise $
precise-listen hey-bram.net (no xxx detection marks appear !!! etc.)

and I also tried
precise-listen hey-bram.net -d hey-bram/not-wake-word

It does not record wav files in the not-wake-word folder...

WARNING:tensorflow:From /home/pi/mycroft-precise/.venv/lib/python3.7/site-packages/tensorflow/__init__.py:98: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.

WARNING:tensorflow:From /home/pi/mycroft-precise/.venv/lib/python3.7/site-packages/tensorflow/__init__.py:98: The name tf.AttrValue is deprecated. Please use tf.compat.v1.AttrValue instead.

WARNING:tensorflow:From /home/pi/mycroft-precise/.venv/lib/python3.7/site-packages/tensorflow/__init__.py:98: The name tf.COMPILER_VERSION is deprecated. Please use tf.version.COMPILER_VERSION instead.

WARNING:tensorflow:From /home/pi/mycroft-precise/.venv/lib/python3.7/site-packages/tensorflow/__init__.py:98: The name tf.CXX11_ABI_FLAG is deprecated. Please use tf.sysconfig.CXX11_ABI_FLAG instead.

WARNING:tensorflow:From /home/pi/mycroft-precise/.venv/lib/python3.7/site-packages/tensorflow/__init__.py:98: The name tf.ConditionalAccumulator is deprecated. Please use tf.compat.v1.ConditionalAccumulator instead.

Using TensorFlow backend.
Traceback (most recent call last):
  File "/home/pi/mycroft-precise/.venv/bin/precise-listen", line 33, in <module>
    sys.exit(load_entry_point('mycroft-precise', 'console_scripts', 'precise-listen')())
  File "/home/pi/mycroft-precise/precise/scripts/base_script.py", line 43, in run_main
    script = cls(args)
  File "/home/pi/mycroft-precise/precise/scripts/listen.py", line 58, in __init__
    self.listener = Listener(args.model, args.chunk_size)
  File "/home/pi/mycroft-precise/precise/network_runner.py", line 107, in __init__
    self.runner = runner_cls(model_name)
  File "/home/pi/mycroft-precise/precise/network_runner.py", line 85, in __init__
    self.model = load_precise_model(model_name)
  File "/home/pi/mycroft-precise/precise/model.py", line 54, in load_precise_model
    return load_keras().models.load_model(model_name)
  File "/home/pi/mycroft-precise/.venv/lib/python3.7/site-packages/keras/models.py", line 242, in load_model
    model_config = json.loads(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'

  4. another thing not described was how to install and extract the 7z file.
    I used:
    sudo apt-get install p7zip-full for installing, and
    7za x pdsounds_march2009.7z for extracting.

  5. for converting the sound you have to write a script and execute it.
    I did not know how, so I made a convert.sh file with:

SOURCE_DIR=data/random/mp3
DEST_DIR=data/random

for i in "$SOURCE_DIR"/*.mp3; do echo "Converting $i..."; fn=${i##*/}; ffmpeg -i "$i" -acodec pcm_s16le -ar 16000 -ac 1 -f wav "$DEST_DIR/${fn%.*}.wav"; done

and tried to execute it with sudo ./convert.sh,
but that also did not work.
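As an aside, a dry-run variant of such a conversion loop (a sketch, using the data/random paths from the post) only prints the ffmpeg commands, which makes it easy to spot quoting mistakes before converting anything:

```shell
#!/bin/bash
# Dry-run sketch: print the ffmpeg command for every mp3 in $1, writing
# wavs to $2, without actually converting anything.
print_convert_cmds() {
    local src=$1 dest=$2 i fn
    for i in "$src"/*.mp3; do
        [ -e "$i" ] || continue      # no matching files: glob stays literal, skip
        fn=${i##*/}                  # basename of the mp3
        echo ffmpeg -i "$i" -acodec pcm_s16le -ar 16000 -ac 1 -f wav "$dest/${fn%.*}.wav"
    done
}

# example (paths from the post):
print_convert_cmds data/random/mp3 data/random
```

Once the printed commands look right, dropping the echo (or piping the output to bash) performs the actual conversion.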

(Update)
I am a little step further...
the training (1) now works by using
(.venv) pi@raspberrypi:~/mycroft-precise $ precise-train -e 60 hey-bram.net hey-bram (without the trailing / that the manual shows)

but the commands described in (2) still do not work...
It looks as if this software does not work on my Pi with the latest Raspberry Pi OS...

These nodes work wonderfully!!!
I have speech recognition working, my devices go on and off, and I get a confirmation via speech that the speech task is executing...
The last and most difficult part is my wake word. Still having problems getting this to work...


Oh nice! Please share your flow when you have it working!

I will, but I can't get the last part working...

I think these work:
precise-collect and
precise-train -e 60 hey-bram.net hey-bram/

but these don't:
precise-listen hey-bram.net -d hey-bram/not-wake-word
another precise command also does not work: precise-listen hey-bram.net

WARNING:tensorflow:From /home/pi/mycroft-precise/.venv/lib/python3.7/site-packages/tensorflow/__init__.py:98: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.

WARNING:tensorflow:From /home/pi/mycroft-precise/.venv/lib/python3.7/site-packages/tensorflow/__init__.py:98: The name tf.AttrValue is deprecated. Please use tf.compat.v1.AttrValue instead.

WARNING:tensorflow:From /home/pi/mycroft-precise/.venv/lib/python3.7/site-packages/tensorflow/__init__.py:98: The name tf.COMPILER_VERSION is deprecated. Please use tf.version.COMPILER_VERSION instead.

WARNING:tensorflow:From /home/pi/mycroft-precise/.venv/lib/python3.7/site-packages/tensorflow/__init__.py:98: The name tf.CXX11_ABI_FLAG is deprecated. Please use tf.sysconfig.CXX11_ABI_FLAG instead.

WARNING:tensorflow:From /home/pi/mycroft-precise/.venv/lib/python3.7/site-packages/tensorflow/__init__.py:98: The name tf.ConditionalAccumulator is deprecated. Please use tf.compat.v1.ConditionalAccumulator instead.

Using TensorFlow backend.
Traceback (most recent call last):
  File "/home/pi/mycroft-precise/.venv/bin/precise-listen", line 33, in <module>
    sys.exit(load_entry_point('mycroft-precise', 'console_scripts', 'precise-listen')())
  File "/home/pi/mycroft-precise/precise/scripts/base_script.py", line 43, in run_main
    script = cls(args)
  File "/home/pi/mycroft-precise/precise/scripts/listen.py", line 58, in __init__
    self.listener = Listener(args.model, args.chunk_size)
  File "/home/pi/mycroft-precise/precise/network_runner.py", line 107, in __init__
    self.runner = runner_cls(model_name)
  File "/home/pi/mycroft-precise/precise/network_runner.py", line 85, in __init__
    self.model = load_precise_model(model_name)
  File "/home/pi/mycroft-precise/precise/model.py", line 54, in load_precise_model
    return load_keras().models.load_model(model_name)
  File "/home/pi/mycroft-precise/.venv/lib/python3.7/site-packages/keras/models.py", line 242, in load_model
    model_config = json.loads(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'
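For what it's worth, this exact traceback (model_config.decode('utf-8') failing with 'str' object has no attribute 'decode' in keras/models.py) is a known clash between older Keras releases and h5py 3.x, which returns str where the old code expects bytes; pip install 'h5py<3.0' inside the mycroft-precise venv is the commonly reported workaround. A small helper (a sketch, not from the thread) to check whether an installed h5py version is affected:

```shell
#!/bin/bash
# Print "yes" if the given h5py version is new enough (>= 3.0) to trigger
# the "'str' object has no attribute 'decode'" error in old Keras.
needs_h5py_downgrade() {
    case "$1" in
        3.*|[4-9].*) echo yes ;;
        *)           echo no  ;;
    esac
}

# usage inside the mycroft-precise venv:
#   needs_h5py_downgrade "$(python3 -c 'import h5py; print(h5py.__version__)')"
#   # if yes: pip install 'h5py<3.0'
```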

Hello,
Sorry for the late answer, but I took a little Christmas break.
I’ll write something about the training process tomorrow.
I don’t use precise-listen or collect.
For recording samples I actually use a bash script that I will post too tomorrow.
Just two quick notes today:

  • using just precise-train will give you a starting point but will not give you any great models
  • you will have to use precise-train-incremental after that with something like 50 epochs per increment of 10
  • this is where it becomes very important that you have lots and lots of short pieces of random not wake word audio (really like 10-20 hours)
  • this will take a long time on something like a pi 4, like several hours/half a day depending on how much input data there is
  • I use precise-test to test finished models or just test them with voice2json

Isn't there a ready .pb file with something like the current hey mycroft solution?

If there is a .pb file for hey google or hey computer, that's good enough for me...
Hey mycroft works great but my kids can't remember it...

I have had it on my backlog for a while now to train a new robust model for hey pips, which is what my girlfriend and I are using.
If you would like to contribute some samples, I could include your data in my dataset, try to train a model, and share it with you.
I would need:

  • 10 samples of each person
  • some five-minute recordings of the typical household sounds to be expected at the spots where you will have your microphone

To record the samples you can use this bash script if you have sox installed:

#!/bin/bash

declare -i n=$1
declare -i c=1

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"

bold=$(tput bold)
normal=$(tput sgr0)

printf "\nThis script can record wake word samples for training of a wake word model.
       \nThe recording starts when you press return and ends automatically each time.
       \nOnce you have started a recording by pressing return say the wake word naturally
       like you would in daily use.\nLets start:"

for i in {0..9}
do
    printf "${normal}\n\nPress Enter to record number ${c} of 10 wake word recordings.\n"
    read -s -n 1 key
    if [[ $key = "" ]]; then
        sox -t alsa default -r 16000 -c 1 -b 16 -e signed-integer -L ${DIR}/hotword.${n}.wav \
        trim 0 4 vad -p 0.2 reverse vad -p 0.6 reverse
    fi
    ((n=n+1))
    ((c=c+1))
    printf "\nRecorded successfully.\n\n"
done

printf "\n\nFinished, thank you.\n"

Just save this as, for example, record.sh into the folder you want to record the wake words in.
Then use it like this:

bash record.sh 1

The first argument is the number at which to start enumerating the 10 files recorded on that run. So use 1 on the first run, then 11 on the next, then 21, and so on, or you will overwrite your previous files.
It's important to record these samples in a quiet environment.
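To take the guesswork out of picking the next start number, it can be derived from the files already on disk. A small sketch, assuming the hotword.N.wav naming used by the recording script above:

```shell
#!/bin/bash
# Print the start number to pass to record.sh next, based on how many
# hotword.*.wav files already exist in the given folder.
next_start_number() {
    local dir=${1:-.} count
    count=$(find "$dir" -maxdepth 1 -name 'hotword.*.wav' | wc -l)
    echo $((count + 1))
}

# example: after two runs of 10 recordings each, this prints 21
# next_start_number /path/to/samples
```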

For recording the 5 minute pieces of random audio you can use the sox record node set to record directly to a wav file and set to stop after 300 seconds.
Make sure the recorded random audio does not contain the wake word.

Tell me if you would like to do that.

Johannes

Edit & PS: if you should want to try training yourself, here is what I do:

  • I split all the random noise audio into one minute chunks by running this command in the random folder:

for f in *.wav; do sox "$f" "split.$f" trim 0 60 : newfile : restart ; done
  • Then delete all the original long files and only keep the new split noise files.
  • I duplicate each wake word file with added random noise from those files with this script:
#!/bin/bash

NOISEDIR=$1

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"

for f in *.wav
do 
    NOISEFILE=$(find ${NOISEDIR} -type f | shuf -n 1)
    
    sox -m $f ${NOISEFILE} noise.$f trim 0 `soxi -D $f`
done
  • the script needs to be run in the folder with the wake words, and has one argument, which is the path to the noise files (delete the script from the folder when you're done)
  • copy about 10-20% of the wake-word files to test/wake-word
  • copy the whole wake-word folder to the precise folder (always work with a copy so that you can start over at any point)
  • do a baseline training with:
precise-train your-wake-word.net wake-word-folder/ -e 100 -s 0.5
  • this will give you a start that will listen to pretty much anything
  • the real training now happens with (this Part can take a while):
precise-train-incremental your-wake-word.net wake-word-folder/ -r path/to/noise-folder -e 50 -th 0.4 -s 0.5
  • once finished you can optionally copy the generated test not-wake-words to the generated not-wake-words and retrain with the first command:
cp wake-word-folder/test/not-wake-word/generated/* wake-word-folder/not-wake-word/generated
  • and again:
precise-train your-wake-word.net wake-word-folder/ -e 100 -s 0.5
  • now convert to pb:
precise-convert your-wake-word.net
  • copy your-wake-word.pb, your-wake-word.pbparams and your-wake-word.pbtxt to a new folder:
mkdir your-wake-word
cp your-wake-word.pb* your-wake-word/
  • and you're done and can now try the result
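The sequence above can be condensed into a dry-run script that only echoes each step; the hey-pips.net, wake-word/ and noise/ names below are placeholders to substitute with your own:

```shell
#!/bin/bash
# Dry run of the training pipeline described above: each step is echoed
# instead of executed, so the sequence can be reviewed first.
dry_run_pipeline() {
    local run="echo +"          # drop the "echo +" prefix to really execute
    $run precise-train hey-pips.net wake-word/ -e 100 -s 0.5
    $run precise-train-incremental hey-pips.net wake-word/ -r noise/ -e 50 -th 0.4 -s 0.5
    $run cp wake-word/test/not-wake-word/generated/* wake-word/not-wake-word/generated
    $run precise-train hey-pips.net wake-word/ -e 100 -s 0.5
    $run precise-convert hey-pips.net
    $run mkdir hey-pips
    $run cp hey-pips.pb* hey-pips/
}

dry_run_pipeline
```

Changing the run variable to execute for real only makes sense once the wake-word folder layout from the steps above is actually in place.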

OK, I have the first 20 pips samples,
and a 5-minute file,
and noise chunks,
and combined wav-and-noise chunks.

Before I go further...

Some questions

  1. Do I have to remove all the .sh scripts from the wake-word folder,
    so that only hotword*.wav and noise.hotword*.wav remain in this folder?

  2. You say: copy about 10-20% of the wake-word files to test/wake-word.
    Are these only the plain wake-word files (hotword*.wav), or the noise combinations (noise.hotword*.wav) too?

  3. Does precise-train your-wake-word.net wake-word-folder/ -e 100 -s 0.5
    only use the generated wake-words (no noise file combinations) ?

  4. You said: "copy the whole wake-word folder to the precise folder (always work with a copy so that you can start over at any point)"

    I have /home/pi/mycroft-precise/wake-word. It contains
    hotword*.wav and noise.hotword*.wav; is that OK?

  5. You said: "precise-train-incremental your-wake-word.net wake-word-folder/ -r path/to/noise-folder -e 50 -th 0.4 -s 0.5"

What files go in the noise folder?
a. the 5-minute file
b. the chunks of the 5-minute file
c. the noise.hotword files (per your description these are in the wake-word folder)

  6. I have 40 wake-word wav files and 40 noise/wake-word files
    and did a precise-train as you described:

(.venv) pi@raspberrypi:~/mycroft-precise $ precise-train hey-pips.net /home/pi/wake-word/ -e 100 -s 0.5

I get the following warnings (see output dump below):
a. no tags
b. not enough data to train

(.venv) pi@raspberrypi:~/mycroft-precise $ precise-train hey-pips.net /home/pi/wake-word/ -e 100 -s 0.5
Using TensorFlow backend.
WARNING:tensorflow:From /home/pi/mycroft-precise/.venv/lib/python3.7/site-packages/tensorflow/__init__.py:98: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.

WARNING:tensorflow:From /home/pi/mycroft-precise/.venv/lib/python3.7/site-packages/tensorflow/__init__.py:98: The name tf.AttrValue is deprecated. Please use tf.compat.v1.AttrValue instead.

WARNING:tensorflow:From /home/pi/mycroft-precise/.venv/lib/python3.7/site-packages/tensorflow/__init__.py:98: The name tf.COMPILER_VERSION is deprecated. Please use tf.version.COMPILER_VERSION instead.

WARNING:tensorflow:From /home/pi/mycroft-precise/.venv/lib/python3.7/site-packages/tensorflow/__init__.py:98: The name tf.CXX11_ABI_FLAG is deprecated. Please use tf.sysconfig.CXX11_ABI_FLAG instead.

WARNING:tensorflow:From /home/pi/mycroft-precise/.venv/lib/python3.7/site-packages/tensorflow/__init__.py:98: The name tf.ConditionalAccumulator is deprecated. Please use tf.compat.v1.ConditionalAccumulator instead.

WARNING:tensorflow:From /home/pi/mycroft-precise/.venv/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:3138: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
WARNING:tensorflow:From /home/pi/mycroft-precise/.venv/lib/python3.7/site-packages/keras/optimizers.py:757: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

WARNING: Found 80 wavs but no tags file specified!
Data: <TrainData wake_words=0 not_wake_words=0 test_wake_words=0 test_not_wake_words=0>
Loading wake-word...
Loading not-wake-word...
Loading wake-word...
Loading not-wake-word...
Inputs shape: (0, 29, 13)
Outputs shape: (0, 1)
Test inputs shape: (0, 29, 13)
Test outputs shape: (0, 1)
Not enough data to train

  1. Now I wait for my wife to come home... for more wake-word samples (for you).
    When they are ready, I will transfer the zip to you.

I found a part of the problem!

All my recorded files from bash record.sh are only 1 kB.

When I record directly with arecord a.wav, the file is much larger!

(.venv) pi@raspberrypi:~/sndpvk $ arecord -l
**** List of CAPTURE Hardware Devices ****
card 1: seeed2micvoicec [seeed-2mic-voicecard], device 0: bcm2835-i2s-wm8960-hifi wm8960-hifi-0 [bcm2835-i2s-wm8960-hifi wm8960-hifi-0]
  Subdevices: 0/1
  Subdevice #0: subdevice #0

Perhaps I have to adjust record.sh with the right settings for my seeed 2mic sound card?


(.venv) pi@raspberrypi:~/sndpvk $ arecord -L
null
    Discard all samples (playback) or generate zero samples (capture)
jack
    JACK Audio Connection Kit
pulse
    PulseAudio Sound Server
default
    Playback/recording through the PulseAudio sound server
playback
capture
dmixed
array
usbstream:CARD=Headphones
    bcm2835 Headphones
    USB Stream Output
sysdefault:CARD=seeed2micvoicec
    seeed-2mic-voicecard, bcm2835-i2s-wm8960-hifi wm8960-hifi-0
    Default Audio Device
dmix:CARD=seeed2micvoicec,DEV=0
    seeed-2mic-voicecard, bcm2835-i2s-wm8960-hifi wm8960-hifi-0
    Direct sample mixing device
dsnoop:CARD=seeed2micvoicec,DEV=0
    seeed-2mic-voicecard, bcm2835-i2s-wm8960-hifi wm8960-hifi-0
    Direct sample snooping device
hw:CARD=seeed2micvoicec,DEV=0
    seeed-2mic-voicecard, bcm2835-i2s-wm8960-hifi wm8960-hifi-0
    Direct hardware device without any conversions
plughw:CARD=seeed2micvoicec,DEV=0
    seeed-2mic-voicecard, bcm2835-i2s-wm8960-hifi wm8960-hifi-0
    Hardware device with all software conversions
usbstream:CARD=seeed2micvoicec
    seeed-2mic-voicecard
    USB Stream Output

When i do:
arecord -f cd -Dhw:1 pvk.wav
aplay -Dhw:1 pvk.wav

I do get recording and playback!

Use arecord -l instead of arecord -L, then adapt the sox command in record.sh to something like:

sox -t alsa plughw:1,0 -r 16000 -c 1 -b 16 -e signed-integer -L ${DIR}/hotword.${n}.wav \
        trim 0 4 vad -p 0.2 reverse vad -p 0.6 reverse

where plughw:1,0 corresponds to the card and device number of your 2 mic hat.
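As a side note, the card number in that plughw string can be pulled out of the arecord -l output programmatically. A small sketch, using the output pasted above as a hardcoded sample (on a real system you would pipe arecord -l in instead):

```shell
# Derive the ALSA device string for record.sh from `arecord -l` output.
# The sample line is hardcoded from the listing above; replace it with
# the real command output on your machine.
sample='card 1: seeed2micvoicec [seeed-2mic-voicecard], device 0: bcm2835-i2s-wm8960-hifi wm8960-hifi-0'

# Pull out the card number of the seeed card (here: 1)
card=$(printf '%s\n' "$sample" | sed -n 's/^card \([0-9]*\): seeed2micvoicec.*/\1/p')

# Build the device string sox expects, e.g. plughw:1,0
device="plughw:${card},0"
echo "$device"   # plughw:1,0
```

Using plughw rather than hw lets ALSA handle the sample-rate and format conversions, which is why the sox command above works with -r 16000 even though the card may not support that rate natively.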

Are you using a desktop installation of Raspberry Pi OS?
This would also explain why precise-listen isn’t working properly as the desktop version comes with pulseaudio installed and that tends to give problems with pyaudio.

1. Yes

2. Some of both

3. It uses everything in your-wake-word/wake-word and the files in your-wake-word/test/wake-word for validation.

4. I would keep a copy outside of the folder and copy it fresh to the mycroft-precise folder before a new training round. When using incremental training, precise runs the model it has at that point against the wav files in the random noise folder. Every time it triggers a false positive this way, it takes the chunk it triggered on and saves it to either your-wake-word/not-wake-word/generated or, for validation, to your-wake-word/test/not-wake-word/generated.
Every ten false positives it triggers a retraining of the model with this new data. As this data is saved in the wake-word folder, you don't want it there when you start a new session. So once you have finished a training session and have converted and saved a pb model to an external folder as described above, delete all the files concerning the wake word from the precise folder before you start again.
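The session reset described here boils down to a few file operations. A sketch under assumed folder names from this thread (real data paths will differ), with dummy wav files standing in for recordings:

```shell
# Sketch of resetting between training sessions. Folder names follow the
# thread's layout; dummy files stand in for real recordings.
set -e
work=$(mktemp -d)
cd "$work"

# Pristine data kept outside the precise folder
mkdir -p backup/wake-word backup/test/wake-word
touch backup/wake-word/hotword.1.wav backup/test/wake-word/hotword.2.wav
mkdir -p mycroft-precise

# 1. Start each session from a fresh copy
cp -r backup mycroft-precise/your-wake-word

# 2. (training runs; incremental training drops false positives into
#    not-wake-word/generated, simulated here with touch)
mkdir -p mycroft-precise/your-wake-word/not-wake-word/generated
touch mycroft-precise/your-wake-word/not-wake-word/generated/chunk-00.wav

# 3. After saving the converted .pb model elsewhere, delete the working
#    copy so generated negatives don't leak into the next session
rm -r mycroft-precise/your-wake-word
```

The point of the backup folder is that step 3 is destructive: the generated negatives are deleted together with everything else, and the next session starts clean from the pristine copy.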

5. The 1 minute chunks of random audio.

Hope this helps, Johannes

Hi @Johannes
I am a step further, I think.
Yes, I installed an image with the desktop.
I changed my record.sh and added plughw:1,0. Now my recordings have a proper file size!

I created 20 wake words, the 5 min file, the chunks (the script created 5 pieces), and
after the combine step I had 20 extra chunk/wake-word files.
The last is strange; I expected every wake word to get 5 chunk combinations?

Then I did a test to check if everything works OK,
so I did a:
(.venv) pi@raspberrypi:~/mycroft-precise $ precise-train hey-bram.net /home/pi/mycroft-precise/hey-bram/wake-word/ -e 100 -s 0.5

This is the outcome, still no result...
Perhaps the not-wake-word folder must be filled for precise-train?

Using TensorFlow backend.
[... same TensorFlow deprecation warnings as in the first run, elided ...]

WARNING: Found 40 wavs but no tags file specified!
Data: <TrainData wake_words=0 not_wake_words=0 test_wake_words=0 test_not_wake_words=0>
Loading wake-word...
Loading not-wake-word...
Loading wake-word...
Loading not-wake-word...
Inputs shape: (0, 29, 13)
Outputs shape: (0, 1)
Test inputs shape: (0, 29, 13)
Test outputs shape: (0, 1)
Not enough data to train

This is wrong; you only need to use the path to the top level folder for the wake word. So it should be:

(.venv) pi@raspberrypi:~/mycroft-precise $ precise-train hey-bram.net /home/pi/mycroft-precise/hey-bram/ -e 100 -s 0.5

No, you get one duplicate with added random noise for each wake-word wav. This 1:1 ratio of clean to noisy wake-word samples works best for me.

You will need about 40 different 5 minute files with different random talking / household noises / random noises / TV noises (everything you expect in your household except the wake word), so that you end up with about 200 one-minute files in your noise folder.
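The numbers here are straightforward arithmetic. As a quick sanity check, assuming 5 minute recordings cut into 1 minute chunks:

```shell
# Sanity check of the noise corpus size: 40 five-minute recordings,
# each split into one-minute chunks.
recordings=40
minutes_each=5
chunk_minutes=1
chunks=$(( recordings * minutes_each / chunk_minutes ))
echo "$chunks"   # 200 one-minute noise files
```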

Yes, it works!!!
And again a step further :slight_smile:

The next command is precise-train-incremental...
My noise files are in the folder /home/pi/mycroft-precise/ruis-chunks, so I did a:

precise-train-incremental hey-bram.net /home/pi/mycroft-precise/hey-bram/ -r /home/pi/mycroft-precise/ruis-chunks/ -e 50 -th 0.4 -s 0.5

and got strange feedback...

(.venv) pi@raspberrypi:~/mycroft-precise $ precise-train-incremental hey-bram.net /home/pi/mycroft-precise/hey-bram/ -r /home/pi/mycroft-precise/ruis-chunks -e 50 -th 0.4 -s 0.5
Using TensorFlow backend.
[... same TensorFlow deprecation warnings as above, elided ...]

Loading from hey-bram.net...
Traceback (most recent call last):
  File "/home/pi/mycroft-precise/.venv/bin/precise-train-incremental", line 33, in <module>
    sys.exit(load_entry_point('mycroft-precise', 'console_scripts', 'precise-train-incremental')())
  File "/home/pi/mycroft-precise/precise/scripts/base_script.py", line 43, in run_main
    script = cls(args)
  File "/home/pi/mycroft-precise/precise/scripts/train_incremental.py", line 70, in __init__
    super().__init__(args)
  File "/home/pi/mycroft-precise/precise/scripts/train.py", line 87, in __init__
    self.model = create_model(args.model, params)
  File "/home/pi/mycroft-precise/precise/model.py", line 70, in create_model
    model = load_precise_model(model_name)
  File "/home/pi/mycroft-precise/precise/model.py", line 54, in load_precise_model
    return load_keras().models.load_model(model_name)
  File "/home/pi/mycroft-precise/.venv/lib/python3.7/site-packages/keras/models.py", line 242, in load_model
    model_config = json.loads(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'

Well, I just tried a fresh install and it gives me the same error unfortunately :weary:
Somebody already reported it on github.
My old install still works, so it's a newly introduced bug :frowning:
Fortunately I still have the old version; I hope they fix it soon.

edit

For reference here is the issue: