Watchdog to restart NR

#1

I have a simplified flow below that monitors whether certain messages, selected by the Switch node, are continuously received from a specific piece of equipment (hardware connected via a serial port in another flow, but in the same NR instance).

My question: if NR "freezes" for some unknown reason, will the Trigger node, once triggered, still fire and run the command in the Exec node? Or is everything dead from the point of the freeze?

#2

Essentially, Node-RED, like the underlying Node.js (and JavaScript in general), is largely single-threaded. So if anything stops or kills the process, the flows cannot continue.
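
To illustrate the single-threaded point, here is a minimal sketch in plain Node.js (not Node-RED code; the names `blockFor` and `measureTimerDelay` are made up for this example). It schedules a timer and then keeps the thread busy, which is roughly what a "frozen" flow would look like from a timer's point of view: the callback cannot run until the blocking code returns.

```javascript
// Busy-wait synchronously, simulating code that hogs the single JS thread.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin: nothing else in this process can run meanwhile
  }
}

// Schedule a timer, then block the thread.
// Resolves with the time (in ms) that actually passed before the timer fired.
function measureTimerDelay(timerMs, blockMs) {
  return new Promise((resolve) => {
    const scheduled = Date.now();
    setTimeout(() => resolve(Date.now() - scheduled), timerMs);
    blockFor(blockMs); // the timer callback has to wait for this to finish
  });
}

measureTimerDelay(10, 200).then((elapsed) => {
  console.log(`timer set for 10 ms fired after ${elapsed} ms`);
});
```

The timer is asked for 10 ms but only fires once the 200 ms of blocking is over. If the blocking never ended, the timer would never fire.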

However, it depends what you mean by "freezes" - there are lots of things that might go wrong with a flow or a node that wouldn't lock the process, since not blocking is a fairly core design principle for JavaScript/Node.js applications.

In my, albeit limited, experience, very little causes NR to stop. As far as I can remember, as long as the startup process works, NR keeps going unless something causes an almighty crash, and you can handle alerting and reporting for that as you would for any other application or service.

Bottom line: it is impossible to know for sure whether the trigger will or won't fire. Best to use defensive programming to make sure your flow handles exceptions.

#3

Thanks Julian, yes, you are right. I was just wondering, since a timer would normally run in a separate thread (to my understanding), and I hoped that it would already have loaded the function code to be executed when it was triggered. As you said, defensive programming is the goal.

Anyway, if I can't control or find what is causing the sudden death, I can always build a watchdog script "outside" of NR, basically monitoring in the same way and, if needed, forcing a restart or reboot.
(I hate it, of course: lipstick on a corpse.)
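
A minimal sketch of such an "outside" watchdog, assuming the flow touches a heartbeat file every time an expected message arrives. The file path, the timeout, the check interval and the `systemctl restart nodered` command are all assumptions for illustration, as are the function names; adjust them to however NR is actually installed and run.

```javascript
const fs = require("fs");
const { execFile } = require("child_process");

// All three values are assumptions, not taken from any real setup.
const HEARTBEAT_FILE = "/tmp/nodered-heartbeat"; // hypothetical file the flow touches
const TIMEOUT_MS = 60_000;                        // how long silence is tolerated
const RESTART = { cmd: "systemctl", args: ["restart", "nodered"] };

// True when the last heartbeat is older than the allowed timeout.
function isStale(lastBeatMs, nowMs, timeoutMs) {
  return nowMs - lastBeatMs > timeoutMs;
}

// Modification time of the heartbeat file in ms, or 0 if it cannot be read.
function lastBeat(file) {
  try {
    return fs.statSync(file).mtimeMs;
  } catch {
    return 0; // a missing file is treated as "no heartbeat ever seen"
  }
}

// One watchdog pass: restart the service if the heartbeat is stale.
function checkOnce() {
  if (!isStale(lastBeat(HEARTBEAT_FILE), Date.now(), TIMEOUT_MS)) {
    return false;
  }
  execFile(RESTART.cmd, RESTART.args, (err) => {
    if (err) console.error("restart failed:", err.message);
  });
  return true;
}

// To run it as a standalone script, poll periodically, for example:
// setInterval(checkOnce, 10_000);
```

Because this runs as its own process, it keeps working even when the NR process is blocked or dead, which is the whole point of putting it outside.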

#4

Sorry, wasn't sure if you knew that. I don't know the internals of Node.js and V8 well enough to predict how a setTimeout might behave, though if the parent thread crashes, I'd expect it would simply disappear. But because Node-RED is a service and does its best to keep on truckin', it can be hard to predict how it would behave.

Well, an external watchdog is often a very sensible approach to reliable computing. Nobody can ever cover all eventualities. Putting in a Catch node may help in some circumstances, so that may also be worth considering.

Taken to an extreme, you might even set up a second system with bridged MQTT brokers, making use of a heartbeat MQTT output from each system along with an LWT so that the brokers know when each system is on/off-line. Then you could have the same flow on both systems, with a gate at the start of the flow to block progress on one of the systems until it detects that the other system is offline :smile:

Instant poor-man's high-availability.
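
A rough sketch of the gate logic on the standby system, with the MQTT wiring left out. It assumes the peer's heartbeat and LWT messages are fed into it from elsewhere; `createStandbyGate` and its method names are hypothetical, and the 5-second timeout is just an example value.

```javascript
// Gate for the standby system: lets messages through only while the
// peer (primary) system appears to be offline.
function createStandbyGate(peerTimeoutMs, startMs) {
  let lastPeerBeat = startMs;      // assume the peer is alive at startup
  let peerDeclaredOffline = false; // set when the peer's LWT arrives

  return {
    // Call whenever a heartbeat message from the peer arrives.
    peerHeartbeat(nowMs) {
      lastPeerBeat = nowMs;
      peerDeclaredOffline = false;
    },
    // Call when the broker publishes the peer's LWT message.
    peerOffline() {
      peerDeclaredOffline = true;
    },
    // True if this system should process messages itself.
    shouldPass(nowMs) {
      return peerDeclaredOffline || nowMs - lastPeerBeat > peerTimeoutMs;
    },
  };
}

// Example: the peer goes quiet for longer than the timeout.
const gate = createStandbyGate(5000, Date.now());
gate.peerHeartbeat(Date.now());
console.log(gate.shouldPass(Date.now()));        // peer just seen: blocked
console.log(gate.shouldPass(Date.now() + 6000)); // silence beyond timeout: open
```

The heartbeat timeout acts as a fallback for the case where the LWT never arrives, for example when the bridge between the brokers is itself down.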

#5

That's a cool one!!! I'll have to consider whether this is overkill for my home automation, but it's tempting, I must say, just for the thrill!!!

#6

I sometimes use this sort of arrangement to patch developmental bits of code into and out of my deployed system for testing. No risk, no thrill.