This is likely just user error, but I cannot see where. I am fairly confident in my underlying code because it runs fine for several hours before the problem appears.
Problem:
As part of a much larger flow I have a section built around a delay node, and I have watched the delay node just "quit" before reaching the end of its timer. It does not appear to have been reset, because the "reset" indicator never appears. The little blue "active" square simply disappears and no message is sent. Has anyone experienced this? My timer is typically 120s or longer (but still only a few minutes, not hours). When I set it to 30s, it is more reliable.
Below is an example flow that simulates that section of my larger flow. Do you see any red flags? So far, I have not seen this example flow have any problems.
[{"id":"b4c72232.fc463","type":"function","z":"c02f1515.bbbc08","name":"conditionChecker","func":"var msg1 = {}\nvar msg2 = {}\n\nmaxLoops = 5\ntestCondition = flow.get(\"testCondition\")\nvar counter = context.get('counter') || 0\n\nif (msg.reset) \n{\n context.set('counter', 0)\n \n msg1.payload = \"timer reset. condition value: \\\"\" + testCondition + \"\\\"\"\n msg1.reset = 1\n return [msg1, null]\n}\n\nif (counter > maxLoops)\n{\n context.set('counter', 0)\n \n msg1.payload = \"max loops reached, timer reset. condition value: \\\"\" + testCondition + \"\\\"\"\n msg1.reset = 1\n msg2.payload = \"doing something else now\"\n return [msg1, msg2]\n}\n\nif (testCondition)\n{\n msg1.payload = \"condition is \\\"true\\\", timer stopped\"\n msg1.reset = 1\n context.set('counter', 0)\n return [msg1, null]\n}\nif (testCondition === false)\n{\n counter ++\n context.set(\"counter\", counter)\n msg1.payload = \"condition is still \\\"false\\\", timer restarting. Loop count: \" + counter\n return [msg1, null]\n}\n","outputs":2,"noerr":0,"x":771.0086669921875,"y":431.001748085022,"wires":[["e5b421e6.b2cb2","ea3d9cb4.136"],["4ca7bb49.c55674"]]},{"id":"a99a85bb.3b76f8","type":"inject","z":"c02f1515.bbbc08","name":"","topic":"","payload":"true","payloadType":"bool","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":590.0070724487305,"y":573.7986698150635,"wires":[["31330cfe.a25964"]]},{"id":"6811f4eb.30a9cc","type":"inject","z":"c02f1515.bbbc08","name":"","topic":"","payload":"false","payloadType":"bool","repeat":"","crontab":"","once":true,"onceDelay":0.1,"x":581.0122051239014,"y":520.8039569854736,"wires":[["31330cfe.a25964","e5b421e6.b2cb2"]]},{"id":"27cb19cd.584566","type":"mqtt in","z":"c02f1515.bbbc08","name":"","topic":"checker/reset","qos":"2","datatype":"auto","broker":"a86bbd8c.6015c","x":371.01037979125977,"y":381.7710247039795,"wires":[["86ef62a5.495c2"]]},{"id":"31330cfe.a25964","type":"function","z":"c02f1515.bbbc08","name":"conditionSet","func":"condition = msg.payload\nflow.set('testCondition', condition)\n","outputs":1,"noerr":0,"x":779.0087738037109,"y":574.003475189209,"wires":[[]]},{"id":"e5b421e6.b2cb2","type":"delay","z":"c02f1515.bbbc08","name":"","pauseType":"delay","timeout":"120","timeoutUnits":"seconds","rate":"1","nbRateUnits":"1","rateUnits":"second","randomFirst":"1","randomLast":"5","randomUnits":"seconds","drop":false,"x":784.007007598877,"y":519.9844360351562,"wires":[["b4c72232.fc463"]]},{"id":"86ef62a5.495c2","type":"change","z":"c02f1515.bbbc08","name":"","rules":[{"t":"set","p":"reset","pt":"msg","to":"1","tot":"num"}],"action":"","property":"","from":"","to":"","reg":false,"x":563.0138931274414,"y":382.3820285797119,"wires":[["b4c72232.fc463"]]},{"id":"d60cf76a.d8fe48","type":"inject","z":"c02f1515.bbbc08","name":"reset","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":207.0069694519043,"y":334.90973472595215,"wires":[["97d7618e.d1e0a"]]},{"id":"97d7618e.d1e0a","type":"mqtt out","z":"c02f1515.bbbc08","name":"","topic":"checker/reset","qos":"2","retain":"false","broker":"a86bbd8c.6015c","x":368.0121555328369,"y":335.7656364440918,"wires":[]},{"id":"ea3d9cb4.136","type":"debug","z":"c02f1515.bbbc08","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":994.0590896606445,"y":399.87859058380127,"wires":[]},{"id":"4ca7bb49.c55674","type":"debug","z":"c02f1515.bbbc08","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":994.0086669921875,"y":454.0017395019531,"wires":[]},{"id":"a86bbd8c.6015c","type":"mqtt-broker","z":"","name":"","broker":"localhost","port":"1883","clientid":"","usetls":false,"compatmode":true,"keepalive":"60","cleansession":true,"birthTopic":"","birthQos":"0","birthPayload":"","closeTopic":"","closeQos":"0","closePayload":"","willTopic":"","willQos":"0","willPayload":""}]
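For anyone who would rather not import the flow, the decision logic inside the conditionChecker function node can be sketched as a standalone function. This is only a rough restatement, not the node itself: `decide`, `state`, and `MAX_LOOPS` are hypothetical names standing in for the function node body, the Node-RED `context`/`flow` stores, and the `maxLoops` variable. The last branch is written out explicitly because the function in the flow returns nothing when `testCondition` is neither truthy nor exactly `false` (for example, when it has never been set).

```javascript
// Standalone sketch of the conditionChecker logic from the flow above.
// `decide` and `state` are hypothetical names; in Node-RED the counter lives
// in node context and testCondition in flow context.
const MAX_LOOPS = 5;

// Returns [output1, output2], or null when nothing would be sent.
function decide(msg, state) {
    const cond = state.testCondition;
    const counter = state.counter || 0;

    if (msg.reset) {
        state.counter = 0;
        return [{ payload: `timer reset. condition value: "${cond}"`, reset: 1 }, null];
    }
    if (counter > MAX_LOOPS) {
        state.counter = 0;
        return [
            { payload: `max loops reached, timer reset. condition value: "${cond}"`, reset: 1 },
            { payload: "doing something else now" },
        ];
    }
    if (cond) {
        state.counter = 0;
        return [{ payload: 'condition is "true", timer stopped', reset: 1 }, null];
    }
    if (cond === false) {
        state.counter = counter + 1;
        return [
            { payload: `condition is still "false", timer restarting. Loop count: ${state.counter}` },
            null,
        ];
    }
    // cond is undefined, null, 0 or "": the original function falls through
    // here, so no message goes back to the delay node and the loop just stops.
    return null;
}
```

Only the "still false" branch sends a message without `reset`, so that is the only path that restarts the 120s delay.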
I am running the large flow on a Raspberry Pi. Could I be hitting some system limitation that makes it lose track of the delay node? Is there a log somewhere that might give me some insight?
In the interim I have implemented a trigger node watchdog, and that seems to be working, although I have not actually witnessed the delay node dying since adding it. I would also think the trigger node could just as easily be killed by whatever is killing the delay node.
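To make the watchdog idea concrete, here is a rough sketch in plain JavaScript of what I mean by it: a timer that is re-armed every time a message passes, and that fires only if no message arrives within the timeout. This is not the trigger node's actual implementation; `createWatchdog`, `kick`, `stop`, and `isArmed` are hypothetical names, and the timeout value would be an assumption (something a bit longer than the delay, so it only fires when the delay has gone quiet).

```javascript
// Hypothetical watchdog sketch: calls onTimeout if kick() is not called
// again within timeoutMs. Not the trigger node's real code.
function createWatchdog(timeoutMs, onTimeout) {
    let timer = null;
    return {
        // Call whenever a message passes; (re)starts the countdown.
        kick() {
            clearTimeout(timer);
            timer = setTimeout(() => {
                timer = null;
                onTimeout();
            }, timeoutMs);
        },
        // Cancel the countdown without firing.
        stop() {
            clearTimeout(timer);
            timer = null;
        },
        // True while a countdown is pending.
        isArmed() {
            return timer !== null;
        },
    };
}
```

Since a watchdog like this is itself just another timer in the same Node.js process, it would presumably be exposed to the same failure, which is why I am not sure how much protection it really gives.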
Thanks for any suggestions.