High Availability for Node-RED (and more) with Node-RED

Hello all,

I have been working with Node-RED for years now, for hobby and work, but I am still sometimes surprised by how much it can do for automation.

It all started with a Raspberry Pi running Node-RED on an SD card. As the automations grew, it became obvious how important this Node-RED instance had become: for a lot of lights and primary household appliances it was a single point of failure. So when the first SD card crashed I had to do some damage control for the WAF (Wife-Acceptance-Factor). This led me on a great (and long) journey of learning new things, and I am finally happy with the current result: a highly available solution for Node-RED at home.

The main part of the automations at that time was based on MQTT, so just running 2 Pis with the same flows would be problematic, because every MQTT event would be executed twice. For instance, with a toggle command one instance would switch the light on while the other would switch it straight off again.
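A minimal sketch of that double-execution problem in plain JavaScript. It is an illustration only, not part of the flows in this post, and all function names are hypothetical:

```javascript
// Hypothetical illustration: several Node-RED instances subscribed to the
// same MQTT topic, each running the same "toggle" handler on the same light.
function makeLight() {
  return { on: false };
}

// What a single instance does when a toggle event arrives.
function handleToggle(light) {
  light.on = !light.on;
}

// Deliver one MQTT toggle event to every subscribed instance.
function deliverToggle(light, instanceCount) {
  for (let i = 0; i < instanceCount; i++) {
    handleToggle(light);
  }
  return light.on;
}

console.log(deliverToggle(makeLight(), 1)); // one instance: the light ends up on
console.log(deliverToggle(makeLight(), 2)); // two instances: the light ends up off again
```

Commands that set an absolute state (on/off) would survive being executed twice; it is the relative ones like toggle that break.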

My first approach was a Docker cluster with 2 Pis, aiming for an automated failover: if Node-RED or a Pi dies, the other Pi starts a new instance of the NR container. So I had to learn Docker Swarm and came to the conclusion that I needed a third Pi with Docker to have a consensus. So the Pi cloud grows.
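Why a third Pi helps can be shown with the majority rule that consensus-based clusters use. The sketch below assumes every Pi is a Swarm manager (the post does not say so explicitly); the function names are made up for illustration:

```javascript
// Majority quorum for a cluster of N manager nodes (majority voting).
function quorum(managers) {
  return Math.floor(managers / 2) + 1;
}

// How many managers may fail while a majority is still reachable.
function tolerableFailures(managers) {
  return managers - quorum(managers);
}

for (const n of [1, 2, 3]) {
  console.log(`${n} managers: quorum ${quorum(n)}, tolerates ${tolerableFailures(n)} failure(s)`);
}
```

With 2 managers the quorum is 2, so losing either one stalls the cluster; with 3 the quorum is still 2 and one node may fail.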

Part of this solution was shared storage for the flow files. In this first attempt it was an NFS share on a Synology NAS that was mounted on all Pis.

After that it quickly became clear that stability had improved. This was mostly because most write actions (influx/mqtt) were now done on the network share and no longer on the SD card.

The cluster was reachable through a round robin DNS address. (One name, all IP addresses).

But there were still some things I wanted to tackle:
-The NAS as SPOF (single point of failure)
-The amount of time docker needed to start a new instance of node red.
-If a node was down the cluster was not reachable on that specific IP address.

I solved them by:

-For the NAS SPOF I swapped the SD cards for SSDs connected to the Pis through USB enclosures, and installed GlusterFS on all three to have a storage cluster divided over the Pi cluster.

-To accelerate the failover of the Node-RED instance I had to take into consideration what kind of flows were running on the NR instance. For HTTP it was no problem to run an active-active solution, but for MQTT this is a challenge, as mentioned above.

I got inspired by this YouTube video from Kurt Braun and built myself a heartbeat system in Node-RED.

[{"id":"93518d52.ef69f","type":"tab","label":"Heartbeat","disabled":false,"info":""},{"id":"f1fa8f7.92a347","type":"switch","z":"93518d52.ef69f","name":"host?","property":"host","propertyType":"global","rules":[{"t":"empty"},{"t":"null"},{"t":"else"}],"checkall":"true","repair":false,"outputs":3,"x":410,"y":140,"wires":[["b480b625.b12d68"],["b480b625.b12d68"],["b8a02164.a8e01"]]},{"id":"299156be.857e5a","type":"change","z":"93518d52.ef69f","name":"","rules":[{"t":"set","p":"host","pt":"global","to":"payload","tot":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":1000,"y":120,"wires":[["f9cfb3cb.04622"]]},{"id":"b8a02164.a8e01","type":"change","z":"93518d52.ef69f","name":"set host","rules":[{"t":"set","p":"payload","pt":"msg","to":"host","tot":"global"}],"action":"","property":"","from":"","to":"","reg":false,"x":600,"y":160,"wires":[["f9cfb3cb.04622","9a3077b4.2867e8"]]},{"id":"b480b625.b12d68","type":"file in","z":"93518d52.ef69f","name":"","filename":"/hostetc/hostname","format":"utf8","chunk":false,"sendError":false,"encoding":"none","x":630,"y":120,"wires":[["e3bf9515.6a9b78"]]},{"id":"c007a121.f126","type":"trigger","z":"93518d52.ef69f","name":"","op1":"true","op2":"false","op1type":"bool","op2type":"bool","duration":"1000","extend":true,"overrideDelay":false,"units":"ms","reset":"","bytopic":"topic","topic":"topic","outputs":1,"x":580,"y":360,"wires":[["eafd9045.c9b8d"]]},{"id":"f9cfb3cb.04622","type":"template","z":"93518d52.ef69f","name":"set topic","field":"topic","fieldType":"msg","format":"handlebars","syntax":"mustache","template":"noderedha/{{global.host}}/pulse","output":"str","x":1160,"y":160,"wires":[["d5df44d3.6fae48"]]},{"id":"d5df44d3.6fae48","type":"mqtt out","z":"93518d52.ef69f","name":"","topic":"","qos":"","retain":"","broker":"8dc8d818.347f98","x":1310,"y":160,"wires":[]},{"id":"799d202c.1651c","type":"mqtt 
in","z":"93518d52.ef69f","name":"","topic":"noderedha/+/pulse","qos":"2","datatype":"auto","broker":"8dc8d818.347f98","x":210,"y":360,"wires":[["904831f8.130ca"]]},{"id":"e3bf9515.6a9b78","type":"string","z":"93518d52.ef69f","name":"","methods":[{"name":"strip","params":[{"type":"str","value":"\\n"}]},{"name":"decodeHTMLEntities","params":[]}],"prop":"payload","propout":"payload","object":"msg","objectout":"msg","x":810,"y":120,"wires":[["299156be.857e5a","9a3077b4.2867e8"]]},{"id":"904831f8.130ca","type":"change","z":"93518d52.ef69f","name":"","rules":[{"t":"set","p":"host","pt":"msg","to":"payload","tot":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":410,"y":360,"wires":[["c007a121.f126"]]},{"id":"eafd9045.c9b8d","type":"function","z":"93518d52.ef69f","name":"set state node","func":"var context = \"noderedha.\"+msg.host+\".state\";\nglobal.set(context, msg.payload, \"memory\");\nreturn msg;","outputs":1,"noerr":0,"initialize":"","finalize":"","x":760,"y":360,"wires":[["310df807.b704f8"]]},{"id":"a068d750.751d58","type":"inject","z":"93518d52.ef69f","name":"500 ms","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"0.5","crontab":"","once":true,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":260,"y":140,"wires":[["f1fa8f7.92a347"]]},{"id":"2ed01163.5c54ee","type":"change","z":"93518d52.ef69f","name":"2raspdock master","rules":[{"t":"set","p":"master","pt":"msg","to":"2raspdock","tot":"str"},{"t":"set","p":"noderedha.0raspdockman.master","pt":"global","to":"false","tot":"bool"},{"t":"set","p":"noderedha.1raspdock.master","pt":"global","to":"false","tot":"bool"},{"t":"set","p":"noderedha.2raspdock.master","pt":"global","to":"true","tot":"bool"}],"action":"","property":"","from":"","to":"","reg":false,"x":1430,"y":340,"wires":[["69cc4908.994ac8"]]},{"id":"310df807.b704f8","type":"switch","z":"93518d52.ef69f","name":"2raspdock 
online?","property":"noderedha.2raspdock.state","propertyType":"global","rules":[{"t":"true"},{"t":"else"}],"checkall":"true","repair":false,"outputs":2,"x":990,"y":360,"wires":[["2ed01163.5c54ee"],["bc33ad1d.bf35f"]]},{"id":"f9e691a9.86d01","type":"change","z":"93518d52.ef69f","name":"1raspdock master","rules":[{"t":"set","p":"master","pt":"msg","to":"1raspdock","tot":"str"},{"t":"set","p":"0raspdockman.master","pt":"global","to":"false","tot":"bool"},{"t":"set","p":"1raspdock.master","pt":"global","to":"true","tot":"bool"},{"t":"set","p":"2raspdock.master","pt":"global","to":"false","tot":"bool"}],"action":"","property":"","from":"","to":"","reg":false,"x":1430,"y":380,"wires":[["69cc4908.994ac8"]]},{"id":"bc33ad1d.bf35f","type":"switch","z":"93518d52.ef69f","name":"1raspdock online?","property":"noderedha.1raspdock.state","propertyType":"global","rules":[{"t":"true"},{"t":"else"}],"checkall":"true","repair":false,"outputs":2,"x":1210,"y":400,"wires":[["f9e691a9.86d01"],["ac6076fa.dfbfd8"]]},{"id":"ac6076fa.dfbfd8","type":"change","z":"93518d52.ef69f","name":"0raspdockman master","rules":[{"t":"set","p":"master","pt":"msg","to":"0raspdockman","tot":"str"},{"t":"set","p":"0raspdockman.master","pt":"global","to":"true","tot":"bool"},{"t":"set","p":"1raspdock.master","pt":"global","to":"false","tot":"bool"},{"t":"set","p":"2raspdock.master","pt":"global","to":"false","tot":"bool"}],"action":"","property":"","from":"","to":"","reg":false,"x":1440,"y":420,"wires":[["69cc4908.994ac8"]]},{"id":"51fe151f.c747ac","type":"change","z":"93518d52.ef69f","name":"master = true","rules":[{"t":"set","p":"noderedha.master.b","pt":"global","to":"true","tot":"bool"},{"t":"set","p":"noderedha.master.host","pt":"global","to":"master","tot":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":1830,"y":360,"wires":[["4e305936.d22e58"]]},{"id":"661accd6.9b06a4","type":"change","z":"93518d52.ef69f","name":"master = 
false","rules":[{"t":"set","p":"noderedha.master.b","pt":"global","to":"false","tot":"bool"},{"t":"set","p":"noderedha.master.host","pt":"global","to":"master","tot":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":1840,"y":400,"wires":[["4e305936.d22e58"]]},{"id":"69cc4908.994ac8","type":"switch","z":"93518d52.ef69f","name":"is this host master?","property":"master","propertyType":"msg","rules":[{"t":"eq","v":"host","vt":"global"},{"t":"else"}],"checkall":"true","repair":false,"outputs":2,"x":1650,"y":380,"wires":[["51fe151f.c747ac"],["661accd6.9b06a4"]]},{"id":"e0953b55.868398","type":"switch","z":"93518d52.ef69f","name":"Master?","property":"noderedha.master.b","propertyType":"global","rules":[{"t":"true"}],"checkall":"true","repair":false,"outputs":1,"x":820,"y":520,"wires":[["bfe0f80f.4de768"]]},{"id":"bfe0f80f.4de768","type":"debug","z":"93518d52.ef69f","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":970,"y":520,"wires":[]},{"id":"4e305936.d22e58","type":"change","z":"93518d52.ef69f","name":"who is master","rules":[{"t":"set","p":"payload","pt":"msg","to":"noderedha.master","tot":"global"}],"action":"","property":"","from":"","to":"","reg":false,"x":2020,"y":380,"wires":[["288a01bd.65ba3e"]]},{"id":"288a01bd.65ba3e","type":"link out","z":"93518d52.ef69f","name":"master?","links":["e834912.10efd7"],"x":2135,"y":380,"wires":[]},{"id":"e834912.10efd7","type":"link 
in","z":"93518d52.ef69f","name":"","links":["288a01bd.65ba3e"],"x":1555,"y":140,"wires":[["e6c94a10.78cd18","1e798fde.cb786"]]},{"id":"e6c94a10.78cd18","type":"debug","z":"93518d52.ef69f","name":"Master?","active":true,"tosidebar":false,"console":false,"tostatus":true,"complete":"payload.b","targetType":"msg","statusVal":"payload.b","statusType":"auto","x":1690,"y":100,"wires":[]},{"id":"1e798fde.cb786","type":"debug","z":"93518d52.ef69f","name":"Master","active":true,"tosidebar":false,"console":false,"tostatus":true,"complete":"payload.host","targetType":"msg","statusVal":"payload.host","statusType":"auto","x":1680,"y":160,"wires":[]},{"id":"9a3077b4.2867e8","type":"debug","z":"93518d52.ef69f","name":"","active":true,"tosidebar":false,"console":false,"tostatus":true,"complete":"payload","targetType":"msg","statusVal":"payload","statusType":"auto","x":1030,"y":60,"wires":[]},{"id":"e8e7bb73.c16ed8","type":"http in","z":"93518d52.ef69f","name":"","url":"/test","method":"get","upload":false,"swaggerDoc":"","x":350,"y":880,"wires":[["c08bdb57.496338"]]},{"id":"83e8a20a.1c17","type":"http response","z":"93518d52.ef69f","name":"","statusCode":"","headers":{},"x":670,"y":880,"wires":[]},{"id":"c08bdb57.496338","type":"template","z":"93518d52.ef69f","name":"","field":"payload","fieldType":"msg","format":"handlebars","syntax":"mustache","template":"Hello from 
{{global.host}}","output":"str","x":520,"y":880,"wires":[["83e8a20a.1c17"]]},{"id":"3cf33ef.24a69c2","type":"inject","z":"93518d52.ef69f","name":"20s","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"20","crontab":"","once":true,"onceDelay":"5","topic":"","payload":"","payloadType":"date","x":750,"y":320,"wires":[["310df807.b704f8"]]},{"id":"d6df7f08.3a7b3","type":"comment","z":"93518d52.ef69f","name":"heartbeat","info":"","x":270,"y":60,"wires":[]},{"id":"bcf87a52.5e1da8","type":"comment","z":"93518d52.ef69f","name":"master","info":"","x":230,"y":300,"wires":[]},{"id":"bb2c986a.ff12d8","type":"inject","z":"93518d52.ef69f","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":640,"y":520,"wires":[["e0953b55.868398"]]},{"id":"8dc8d818.347f98","type":"mqtt-broker","name":"","broker":"mqtt.wesselink","port":"1883","clientid":"","usetls":false,"compatmode":true,"keepalive":"60","cleansession":true,"birthTopic":"","birthQos":"0","birthPayload":"","closeTopic":"","closePayload":"","willTopic":"","willQos":"0","willPayload":""}]

All MQTT flows now have a switch node to check whether this instance is master or not.
Failover is done within 1 second.
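For readers who prefer code to a flow export, here is a rough sketch of the heartbeat and election idea in plain JavaScript. The hostnames, the priority order and the 1000 ms timeout are read from the exported flow above; the function names and the in-memory map are hypothetical and only stand in for the trigger, switch and change nodes:

```javascript
// Hostnames and timeout taken from the exported flow; everything else is illustrative.
const PRIORITY = ["2raspdock", "1raspdock", "0raspdockman"]; // highest priority first
const TIMEOUT_MS = 1000; // a host counts as offline after 1 s without a pulse

const lastPulse = new Map(); // host -> timestamp (ms) of the last heartbeat seen

// Call this for every message received on noderedha/<host>/pulse.
function recordPulse(host, now) {
  lastPulse.set(host, now);
}

function isOnline(host, now) {
  const seen = lastPulse.get(host);
  return seen !== undefined && now - seen <= TIMEOUT_MS;
}

// The first online host in priority order wins. As in the flow, the last
// host in the list is the fallback when none of the others is online.
function electMaster(now) {
  for (const host of PRIORITY.slice(0, -1)) {
    if (isOnline(host, now)) return host;
  }
  return PRIORITY[PRIORITY.length - 1];
}

function isMaster(ownHost, now) {
  return electMaster(now) === ownHost;
}

// Example timeline (timestamps in ms).
recordPulse("2raspdock", 0);
recordPulse("1raspdock", 0);
console.log(electMaster(500));  // 2raspdock
recordPulse("1raspdock", 1000); // 2raspdock has stopped sending
console.log(electMaster(1500)); // 1raspdock
console.log(electMaster(5000)); // 0raspdockman
```

With a pulse every 500 ms and a 1000 ms timeout, a dead master is noticed after roughly one second, which matches the failover time mentioned above.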

-To make sure all my sensors and other systems can still reach the cluster when a Pi is down, I installed keepalived on the Raspberry Pis. But I did something wrong, and before I fixed it I started thinking about how I would build that in Node-RED. The most important thing is that the master host listens on a virtual IP address. On Linux, setting this is very simple with one command: "sudo ifconfig eth0:0 192.168.x.xxx". So, building on the flow above, I opted to install an extra Node-RED instance on every node to handle the virtual IP address.

[{"id":"5bfd9057.fd1ba","type":"tab","label":"Heartbeat/VIP","disabled":false,"info":""},{"id":"f1fa8f7.92a347","type":"switch","z":"5bfd9057.fd1ba","name":"","property":"host","propertyType":"global","rules":[{"t":"empty"},{"t":"null"},{"t":"else"}],"checkall":"true","repair":false,"outputs":3,"x":710,"y":520,"wires":[["b480b625.b12d68"],["b480b625.b12d68"],["b8a02164.a8e01"]]},{"id":"299156be.857e5a","type":"change","z":"5bfd9057.fd1ba","name":"","rules":[{"t":"set","p":"host","pt":"global","to":"payload","tot":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":1260,"y":500,"wires":[["672bd18e.ce6b9"]]},{"id":"b8a02164.a8e01","type":"change","z":"5bfd9057.fd1ba","name":"","rules":[{"t":"set","p":"payload","pt":"msg","to":"host","tot":"global"}],"action":"","property":"","from":"","to":"","reg":false,"x":900,"y":540,"wires":[["9a3077b4.2867e8","672bd18e.ce6b9"]]},{"id":"b480b625.b12d68","type":"file in","z":"5bfd9057.fd1ba","name":"","filename":"/etc/hostname","format":"utf8","chunk":false,"sendError":false,"encoding":"none","x":900,"y":500,"wires":[["e3bf9515.6a9b78"]]},{"id":"e3bf9515.6a9b78","type":"string","z":"5bfd9057.fd1ba","name":"","methods":[{"name":"strip","params":[{"type":"str","value":"\\n"}]},{"name":"decodeHTMLEntities","params":[]}],"prop":"payload","propout":"payload","object":"msg","objectout":"msg","x":1090,"y":500,"wires":[["299156be.857e5a","9a3077b4.2867e8"]]},{"id":"eafd9045.c9b8d","type":"function","z":"5bfd9057.fd1ba","name":"set state node","func":"var context = \"noderedha.\"+msg.host+\".state\";\nglobal.set(context, msg.payload, \"memory\");\nreturn 
msg;","outputs":1,"noerr":0,"initialize":"","finalize":"","x":540,"y":700,"wires":[["310df807.b704f8"]]},{"id":"a068d750.751d58","type":"inject","z":"5bfd9057.fd1ba","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"0.5","crontab":"","once":true,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":150,"y":520,"wires":[["234acba9.4503f4"]]},{"id":"2ed01163.5c54ee","type":"change","z":"5bfd9057.fd1ba","name":"","rules":[{"t":"set","p":"master","pt":"msg","to":"2raspdock","tot":"str"},{"t":"set","p":"noderedha.0raspdockman.master","pt":"global","to":"false","tot":"bool"},{"t":"set","p":"noderedha.1raspdock.master","pt":"global","to":"false","tot":"bool"},{"t":"set","p":"noderedha.2raspdock.master","pt":"global","to":"true","tot":"bool"}],"action":"","property":"","from":"","to":"","reg":false,"x":1060,"y":660,"wires":[["69cc4908.994ac8"]]},{"id":"310df807.b704f8","type":"switch","z":"5bfd9057.fd1ba","name":"","property":"noderedha.2raspdock.state","propertyType":"global","rules":[{"t":"true"},{"t":"else"}],"checkall":"true","repair":false,"outputs":2,"x":710,"y":700,"wires":[["2ed01163.5c54ee"],["bc33ad1d.bf35f"]]},{"id":"f9e691a9.86d01","type":"change","z":"5bfd9057.fd1ba","name":"","rules":[{"t":"set","p":"master","pt":"msg","to":"1raspdock","tot":"str"},{"t":"set","p":"noderedha0raspdockman.master","pt":"global","to":"false","tot":"bool"},{"t":"set","p":"noderedha1raspdock.master","pt":"global","to":"true","tot":"bool"},{"t":"set","p":"noderedha2raspdock.master","pt":"global","to":"false","tot":"bool"}],"action":"","property":"","from":"","to":"","reg":false,"x":1060,"y":700,"wires":[["69cc4908.994ac8"]]},{"id":"bc33ad1d.bf35f","type":"switch","z":"5bfd9057.fd1ba","name":"","property":"noderedha.1raspdock.state","propertyType":"global","rules":[{"t":"true"},{"t":"else"}],"checkall":"true","repair":false,"outputs":2,"x":890,"y":720,"wires":[["f9e691a9.86d01"],["ac6076fa.dfbfd8"]]},{"id":"ac6076fa.dfbfd8","type":"change","z":"5bfd9057.
fd1ba","name":"","rules":[{"t":"set","p":"master","pt":"msg","to":"0raspdockman","tot":"str"},{"t":"set","p":"noderedha0raspdockman.master","pt":"global","to":"true","tot":"bool"},{"t":"set","p":"noderedha1raspdock.master","pt":"global","to":"false","tot":"bool"},{"t":"set","p":"noderedha2raspdock.master","pt":"global","to":"false","tot":"bool"}],"action":"","property":"","from":"","to":"","reg":false,"x":1060,"y":740,"wires":[["69cc4908.994ac8"]]},{"id":"51fe151f.c747ac","type":"change","z":"5bfd9057.fd1ba","name":"","rules":[{"t":"set","p":"noderedha.master.b","pt":"global","to":"true","tot":"bool"},{"t":"set","p":"noderedha.master.host","pt":"global","to":"master","tot":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":1480,"y":680,"wires":[["4e305936.d22e58"]]},{"id":"661accd6.9b06a4","type":"change","z":"5bfd9057.fd1ba","name":"","rules":[{"t":"set","p":"noderedha.master.b","pt":"global","to":"false","tot":"bool"},{"t":"set","p":"noderedha.master.host","pt":"global","to":"master","tot":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":1480,"y":720,"wires":[["4e305936.d22e58"]]},{"id":"69cc4908.994ac8","type":"switch","z":"5bfd9057.fd1ba","name":"","property":"master","propertyType":"msg","rules":[{"t":"eq","v":"host","vt":"global"},{"t":"else"}],"checkall":"true","repair":false,"outputs":2,"x":1290,"y":700,"wires":[["51fe151f.c747ac"],["661accd6.9b06a4"]]},{"id":"4e305936.d22e58","type":"change","z":"5bfd9057.fd1ba","name":"","rules":[{"t":"set","p":"payload","pt":"msg","to":"noderedha.master","tot":"global"}],"action":"","property":"","from":"","to":"","reg":false,"x":1680,"y":700,"wires":[["288a01bd.65ba3e"]]},{"id":"288a01bd.65ba3e","type":"link out","z":"5bfd9057.fd1ba","name":"master?","links":["e834912.10efd7","68c9f101.81598"],"x":1815,"y":700,"wires":[]},{"id":"e834912.10efd7","type":"link 
in","z":"5bfd9057.fd1ba","name":"","links":["288a01bd.65ba3e"],"x":695,"y":100,"wires":[["e6c94a10.78cd18","1e798fde.cb786"]]},{"id":"e6c94a10.78cd18","type":"debug","z":"5bfd9057.fd1ba","name":"Master?","active":true,"tosidebar":false,"console":false,"tostatus":true,"complete":"payload.b","targetType":"msg","statusVal":"payload.b","statusType":"auto","x":830,"y":60,"wires":[]},{"id":"1e798fde.cb786","type":"debug","z":"5bfd9057.fd1ba","name":"Master","active":true,"tosidebar":false,"console":false,"tostatus":true,"complete":"payload.host","targetType":"msg","statusVal":"payload.host","statusType":"auto","x":820,"y":120,"wires":[]},{"id":"9a3077b4.2867e8","type":"debug","z":"5bfd9057.fd1ba","name":"","active":true,"tosidebar":false,"console":false,"tostatus":true,"complete":"payload","targetType":"msg","statusVal":"payload","statusType":"auto","x":1450,"y":480,"wires":[]},{"id":"4cca1ddc.762934","type":"http in","z":"5bfd9057.fd1ba","name":"","url":"/testinfra","method":"get","upload":false,"swaggerDoc":"","x":150,"y":80,"wires":[["d8f1398e.61e678"]]},{"id":"db8db01b.d57d1","type":"http response","z":"5bfd9057.fd1ba","name":"","statusCode":"","headers":{},"x":500,"y":80,"wires":[]},{"id":"d8f1398e.61e678","type":"template","z":"5bfd9057.fd1ba","name":"","field":"payload","fieldType":"msg","format":"handlebars","syntax":"mustache","template":"hello from {{global.host}}","output":"str","x":330,"y":80,"wires":[["db8db01b.d57d1"]]},{"id":"958b6c65.d99ca","type":"n2n out","z":"5bfd9057.fd1ba","iface":"","name":"","bcast":true,"x":1750,"y":540,"wires":[]},{"id":"71bffe7.610af","type":"function","z":"5bfd9057.fd1ba","name":"","func":"var msg_o = {};\nvar process_data = 1212;\n\nmsg_o = {payload : process_data};\n\nreturn 
msg_o;\n","outputs":1,"noerr":0,"initialize":"","finalize":"","libs":[],"x":1620,"y":540,"wires":[["958b6c65.d99ca"]]},{"id":"672bd18e.ce6b9","type":"change","z":"5bfd9057.fd1ba","name":"","rules":[{"t":"set","p":"topic","pt":"msg","to":"clusterha","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":1470,"y":540,"wires":[["71bffe7.610af"]]},{"id":"4bac890f.a55b68","type":"trigger","z":"5bfd9057.fd1ba","name":"","op1":"true","op2":"false","op1type":"bool","op2type":"bool","duration":"1000","extend":true,"overrideDelay":false,"units":"ms","reset":"","bytopic":"topic","topic":"host","outputs":1,"x":340,"y":700,"wires":[["eafd9045.c9b8d"]]},{"id":"6493a572.aba46c","type":"n2n in","z":"5bfd9057.fd1ba","name":"","topic":"","iface":"","rate":"0","ignore":true,"bcast":true,"x":170,"y":700,"wires":[["4bac890f.a55b68"]]},{"id":"386a4aa7.d65ed6","type":"exec","z":"5bfd9057.fd1ba","command":"ifconfig","addpay":"","append":"","useSpawn":"false","timer":"","oldrc":false,"name":"","x":480,"y":960,"wires":[["4ab2391f.4d9578"],[],[]]},{"id":"4ab2391f.4d9578","type":"switch","z":"5bfd9057.fd1ba","name":"","property":"payload","propertyType":"msg","rules":[{"t":"cont","v":"eth0:0","vt":"str"},{"t":"else"}],"checkall":"true","repair":false,"outputs":2,"x":630,"y":960,"wires":[["fc72aae3.4469e8"],["be7c490b.1e0e18"]]},{"id":"c760bb6b.aa8298","type":"delay","z":"5bfd9057.fd1ba","name":"","pauseType":"rate","timeout":"5","timeoutUnits":"seconds","rate":"1","nbRateUnits":"5","rateUnits":"second","randomFirst":"1","randomLast":"5","randomUnits":"seconds","drop":true,"x":300,"y":960,"wires":[["386a4aa7.d65ed6"]]},{"id":"fc72aae3.4469e8","type":"switch","z":"5bfd9057.fd1ba","name":"","property":"noderedha.master.b","propertyType":"global","rules":[{"t":"true"},{"t":"false"}],"checkall":"true","repair":false,"outputs":2,"x":770,"y":920,"wires":[[],["3dcfc941.f0ec86"]]},{"id":"be7c490b.1e0e18","type":"switch","z":"5bfd9057.fd1ba","name":"","property":"noderedha.master.b","
propertyType":"global","rules":[{"t":"true"},{"t":"false"}],"checkall":"true","repair":false,"outputs":2,"x":770,"y":980,"wires":[["232490c3.317f7"],[]]},{"id":"232490c3.317f7","type":"exec","z":"5bfd9057.fd1ba","command":"sudo ifconfig eth0:0 192.168.2.206","addpay":"","append":"","useSpawn":"false","timer":"","oldrc":false,"name":"","x":1020,"y":980,"wires":[[],[],[]]},{"id":"3dcfc941.f0ec86","type":"exec","z":"5bfd9057.fd1ba","command":"sudo ifconfig eth0:0 down","addpay":"","append":"","useSpawn":"false","timer":"","oldrc":false,"name":"","x":990,"y":920,"wires":[[],[],[]]},{"id":"d5735d97.5aa37","type":"inject","z":"5bfd9057.fd1ba","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"5","crontab":"","once":true,"onceDelay":"5","topic":"","payload":"","payloadType":"date","x":310,"y":920,"wires":[["386a4aa7.d65ed6"]]},{"id":"68c9f101.81598","type":"link in","z":"5bfd9057.fd1ba","name":"","links":["288a01bd.65ba3e"],"x":155,"y":960,"wires":[["c760bb6b.aa8298"]]},{"id":"45168dcb.123ac4","type":"exec","z":"5bfd9057.fd1ba","command":"sudo systemctl status docker","addpay":"","append":"","useSpawn":"false","timer":"","oldrc":false,"name":"","x":400,"y":280,"wires":[["1991860f.0d2d9a"],[],[]]},{"id":"22c9dc18.968ea4","type":"inject","z":"5bfd9057.fd1ba","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"5","crontab":"","once":true,"onceDelay":"5","topic":"","payload":"","payloadType":"date","x":150,"y":280,"wires":[["45168dcb.123ac4"]]},{"id":"1991860f.0d2d9a","type":"switch","z":"5bfd9057.fd1ba","name":"","property":"payload","propertyType":"msg","rules":[{"t":"cont","v":"active 
(running)","vt":"str"},{"t":"else"}],"checkall":"true","repair":false,"outputs":2,"x":670,"y":300,"wires":[["93fdfc48.4017c"],["35129e7d.a0bde2"]]},{"id":"93fdfc48.4017c","type":"change","z":"5bfd9057.fd1ba","name":"","rules":[{"t":"set","p":"dockeractive","pt":"global","to":"true","tot":"bool"}],"action":"","property":"","from":"","to":"","reg":false,"x":860,"y":280,"wires":[[]]},{"id":"35129e7d.a0bde2","type":"change","z":"5bfd9057.fd1ba","name":"","rules":[{"t":"set","p":"dockeractive","pt":"global","to":"false","tot":"bool"}],"action":"","property":"","from":"","to":"","reg":false,"x":860,"y":320,"wires":[[]]},{"id":"234acba9.4503f4","type":"switch","z":"5bfd9057.fd1ba","name":"Docker active?","property":"dockeractive","propertyType":"global","rules":[{"t":"true"}],"checkall":"true","repair":false,"outputs":1,"x":430,"y":520,"wires":[["f1fa8f7.92a347"]]},{"id":"14637fe4.98aef","type":"comment","z":"5bfd9057.fd1ba","name":"docker active","info":"","x":130,"y":220,"wires":[]},{"id":"842ae888.4c92f8","type":"comment","z":"5bfd9057.fd1ba","name":"heartbeat","info":"","x":120,"y":460,"wires":[]},{"id":"b7e5e083.d36f2","type":"comment","z":"5bfd9057.fd1ba","name":"who is master?","info":"","x":140,"y":620,"wires":[]},{"id":"3315548a.c5a29c","type":"comment","z":"5bfd9057.fd1ba","name":"if master set VIP active","info":"","x":160,"y":840,"wires":[]}]
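The virtual IP part of this flow boils down to a small decision table. The sketch below restates it in plain JavaScript; the interface alias, the address and the two commands are the ones in the exec nodes of the export, while the function name is hypothetical:

```javascript
// Values taken from the exec nodes in the flow export.
const VIP_ALIAS = "eth0:0";
const VIP_ADDRESS = "192.168.2.206";

// Given the output of `ifconfig` and the master flag, return the command
// that brings the virtual IP in line with the role, or null if nothing is needed.
function vipCommand(ifconfigOutput, isMaster) {
  const hasVip = ifconfigOutput.includes(VIP_ALIAS);
  if (isMaster && !hasVip) return `sudo ifconfig ${VIP_ALIAS} ${VIP_ADDRESS}`; // claim the VIP
  if (!isMaster && hasVip) return `sudo ifconfig ${VIP_ALIAS} down`; // release the VIP
  return null; // already in the desired state
}

console.log(vipCommand("eth0: flags=4163<UP,...>", true));
console.log(vipCommand("eth0: ...\neth0:0: ...", false));
console.log(vipCommand("eth0: ...\neth0:0: ...", true));
```

Checking the current state first keeps the flow from re-running the same command every 5 seconds.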

Here I changed the method of sending the heartbeat to a UDP broadcast instead of MQTT, and I built in an extra check to see if Docker is running before sending out the heartbeat.
At that moment I was once again reminded of how powerful Node-RED is.
If you can think it you can build it.
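As a rough idea of what a UDP heartbeat with a Docker check could look like outside Node-RED, here is a sketch using Node's built-in dgram module. It is not the n2n node's implementation: the JSON payload layout, the port and the broadcast address are assumptions, and only the "active (running)" test and the "clusterha" topic are taken from the flow above.

```javascript
const dgram = require("dgram");

// True when `systemctl status docker` reports a running service
// (same substring test as the switch node in the flow).
function dockerIsActive(statusOutput) {
  return statusOutput.includes("active (running)");
}

// Heartbeat payload. The JSON layout is an assumption, not the n2n wire format.
function buildHeartbeat(host, now) {
  return Buffer.from(JSON.stringify({ topic: "clusterha", host, ts: now }));
}

function parseHeartbeat(buffer) {
  return JSON.parse(buffer.toString("utf8"));
}

// Broadcast one heartbeat, but only when Docker is running on this host.
// Port and broadcast address are placeholders.
function sendHeartbeat(host, statusOutput, port = 41234, address = "255.255.255.255") {
  if (!dockerIsActive(statusOutput)) return false; // stay silent so another node takes over
  const socket = dgram.createSocket("udp4");
  socket.bind(() => {
    socket.setBroadcast(true); // must be set after the socket is bound
    socket.send(buildHeartbeat(host, Date.now()), port, address, () => socket.close());
  });
  return true;
}

console.log(parseHeartbeat(buildHeartbeat("1raspdock", 0)).host);
```

A host whose Docker daemon has stopped simply goes quiet, so the other nodes see it as offline and elect a new master.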

Besides the "happy wife, happy life" motivation, it was also a really nice learning experience. This post is mostly to show my gratitude towards this project. Great work!!

Future projects:
-improve the MQTT cluster (upgrade from a bridged Mosquitto to a RabbitMQ MQTT cluster)
-power backup for network and cluster
-highly available network at home (but I think those kinds of switches etc. will not get through the budget committee)

Replying to myself: the design is already obsolete now that I have dived into MQTT 5.0 and discovered shared subscriptions. :joy: It is still valid for the virtual IP, though.
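For context, an MQTT 5 shared subscription is requested by subscribing to a topic filter of the form $share/<group>/<filter>. A small helper sketch follows; the group name and the topic are made-up examples, not taken from the setup in this thread:

```javascript
// Build an MQTT 5 shared-subscription topic filter: $share/<group>/<filter>.
function sharedTopic(group, filter) {
  if (group === "" || /[\/+#]/.test(group)) {
    throw new Error("share name must be non-empty and must not contain '/', '+' or '#'");
  }
  return `$share/${group}/${filter}`;
}

console.log(sharedTopic("nodered", "home/lights/+/toggle"));
```

All Node-RED instances that subscribe with the same group name then share the messages between them instead of each receiving every message.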

Don't forget a UPS

Hi Emil,

thanks for your short tutorial. Could you perhaps describe a little how to use it? I have 2 local Raspberries (without Docker) on which I am trying your HA feature. First of all, thanks for this. It brought me a lot further. But I am stuck here. Do you use Heartbeat on your master and Heartbeat/VIP on your slave?

Greetz,
Mike

Hello Mike,

The idea is that you run these flows on both Raspberries, so that each one knows which of them are alive.

You can then use this information in your flows to run (or not run) an automation.
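As a sketch of such a check: the flows in this thread use a switch node on the global variable noderedha.master.b, and the same gate could be written as function-node logic. Below, `globalContext` stands in for the `global` object of a Node-RED function node, and `fakeGlobal` is a hypothetical helper that exists only so the example runs outside Node-RED:

```javascript
// Pass the message on only when this instance is master.
// The key noderedha.master.b is the one set by the heartbeat flow.
function masterGate(msg, globalContext) {
  // Returning null from a function node means "send nothing".
  return globalContext.get("noderedha.master.b") === true ? msg : null;
}

// Tiny stand-in for the Node-RED global context.
function fakeGlobal(isMaster) {
  return { get: (key) => (key === "noderedha.master.b" ? isMaster : undefined) };
}

console.log(masterGate({ payload: "toggle" }, fakeGlobal(true)));  // the message passes
console.log(masterGate({ payload: "toggle" }, fakeGlobal(false))); // null, the message is dropped
```

Placed directly after an MQTT-in node, a gate like this lets only the master act on the event.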

You just need to adjust the election of the master to the number of nodes you have. And of course the hostnames: every node needs a unique hostname.

I didn't use keepalived in the end. I rebuilt what keepalived does in Node-RED, in the second flow shared above.

If you don't use Docker you can combine those 2 into one flow.

Very interesting, thanks for sharing!
I switched from a Raspberry Pi to Docker and would love to have an automatic backup system like this. I am stuck on the fact that all my IoT devices connect to a specific, static IP address. I have no idea how to switch that IP address over to the backup device automatically in my router.

You should learn what a virtual IP is. If you Google it you can find some nice YouTube videos about that. Then, with keepalived or the Node-RED flow mentioned above, you can manage a single virtual IP for that.

Interesting technical exercise, really nice! In my case I sort of gave up on making such an effort, since I have a bunch of distributed RPis for dedicated, very specific tasks anyway. Making them all highly available felt like overshooting, at least for our home automation.

Instead I use a retired old laptop as my main controller running my main Node-RED flow. It used to struggle under Windows and was upgraded with a fast SSD and Debian. Works excellently, with a built-in UPS and no SD card worries. All the other distributed RPis, 7 in total plus a Jetson Nano for AI tasks, are running Node-RED and communicate via MQTT. The RPis all have an extra SD card reader attached with a full and fresh SD card backup in it, just in case...

This helps me to minimize unwanted downtime, which I guess is what you would target with any kind of measure you take.

For our home usage I felt this was enough, and the WAF is luckily under control. For production and commercial solutions, well, the actual need has to be defined.

I solved ALL the issues and worries by running Node-RED and Mosquitto MQTT under Docker on a cloud VPS. I know, it seems expensive ($5 a month), but I have never had a single issue in two years: no dead SDs or RPis, no downtime, no backup issues.
Plus I have two different places (cities) where I have my Tasmota IoT devices, and just one public IP address to serve them all.
WAF = 100 % accepted since I have no RPis lying around :laughing:

Funny, the WAF of "sorry, you can't turn the lights on 'cause our internet connection is currently down" would be interesting to watch here... :joy: I guess we all have our own use-cases...

And of course you don't learn in that way :wink:

I think the reality for most people running local gear is pretty much the same too? I never had one failure of an RPi or SSD running NR in over 3 years of "Production" systems. I had 2 SD cards fail in "Development" systems, probably more to do with misuse.

Not to mention the "crazy" RPi Zeros I have running RPi cams for surveillance outside. From memory, the SD card in one has now been running for more than 4 years, in all weather, in a 3D-printed (non-weatherproof) enclosure.

This stuff never ceases to amaze. Having said all this, I need to go find some wood. :rofl:

Update Spring 2023:

From the list:
Future projects:
-improve the MQTT cluster (upgrade from a bridged Mosquitto to a RabbitMQ MQTT cluster)

I finally got to this part. In the end I went with an EMQX cluster. I decided to go with EMQX because they did a solid implementation of MQTT 5, while RabbitMQ did not implement everything.

With this I now have shared subscriptions, which I can use on my cluster, so I no longer need the master-node switch in every MQTT flow. In general it looks stable and works very well, but I am still finding some stubborn behaviour and need to dive into the QoS and connect settings.
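To illustrate why the master switch becomes unnecessary, here is a toy model of a broker handing each message on a shared subscription to exactly one member of the group. It is only a sketch: the round-robin order is an assumption (brokers offer several dispatch strategies), and the member names are simply the hostnames used earlier in this thread.

```javascript
// Toy model: each message on a shared subscription goes to exactly one
// group member. Round-robin is assumed here for simplicity.
function makeSharedGroup(members) {
  let next = 0;
  return function dispatch(message) {
    const target = members[next];
    next = (next + 1) % members.length;
    return { target, message };
  };
}

const dispatch = makeSharedGroup(["2raspdock", "1raspdock", "0raspdockman"]);
const deliveries = ["toggle", "toggle", "toggle", "toggle"].map((m) => dispatch(m));
console.log(deliveries.map((d) => d.target));
```

Four toggle events result in four deliveries in total, one instance per event, so no event is executed twice.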

-power backup for network and cluster
I bought 2 UPSs and they improved stability a lot. The fact that the network and hardware can stay online for 30 minutes without power keeps everything more stable.

-high available network at home (but I think those kind of switches etc will not go through the budget committee)

My knowledge on this part grew. I just bought a Protectli Vault to install OPNsense on. Within OPNsense a backup router can be configured, and this will also improve the HA of the network. Together with a deep dive into STP, the LAN has also been upgraded so that the loss of 1 device is no problem for uptime.
