Delay node "quits/dies" prior to completing the timed delay

This is likely just user error, but I am not seeing why. I am pretty confident in my underlying code because it will work for several hours just fine before it has a problem.

Problem:
As part of a much larger flow I have a delay node section, and I have watched the delay node just "quit" before reaching the end of its timer. It does not seem to have been reset, because the "reset" indicator never appears. The little active blue square simply disappears and no message is sent. Has anyone experienced this? My timer is typically 120s or longer (but still only a few minutes, not hours). When I set it to 30s, it is more reliable.

Below is an example flow that simulates the section of my much larger flow. Do you see any red flags? So far, I have not seen it have any problems.

[{"id":"b4c72232.fc463","type":"function","z":"c02f1515.bbbc08","name":"conditionChecker","func":"var msg1 = {}\nvar msg2 = {}\n\nmaxLoops = 5\ntestCondition = flow.get(\"testCondition\")\nvar counter = context.get('counter') || 0\n\nif (msg.reset) \n{\n    context.set('counter', 0)\n    \n    msg1.payload = \"timer reset. condition value: \\\"\" + testCondition + \"\\\"\"\n    msg1.reset = 1\n    return [msg1, null]\n}\n\nif (counter > maxLoops)\n{\n    context.set('counter', 0)\n    \n    msg1.payload = \"max loops reached, timer reset. condition value: \\\"\" + testCondition + \"\\\"\"\n    msg1.reset = 1\n    msg2.payload = \"doing something else now\"\n    return [msg1, msg2]\n}\n\nif (testCondition)\n{\n    msg1.payload = \"condition is \\\"true\\\", timer stopped\"\n    msg1.reset = 1\n    context.set('counter', 0)\n    return [msg1, null]\n}\nif (testCondition === false)\n{\n    counter ++\n    context.set(\"counter\", counter)\n    msg1.payload = \"condition is still \\\"false\\\", timer restarting.  Loop count: \" + counter\n    return [msg1, null]\n}\n","outputs":2,"noerr":0,"x":771.0086669921875,"y":431.001748085022,"wires":[["e5b421e6.b2cb2","ea3d9cb4.136"],["4ca7bb49.c55674"]]},{"id":"a99a85bb.3b76f8","type":"inject","z":"c02f1515.bbbc08","name":"","topic":"","payload":"true","payloadType":"bool","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":590.0070724487305,"y":573.7986698150635,"wires":[["31330cfe.a25964"]]},{"id":"6811f4eb.30a9cc","type":"inject","z":"c02f1515.bbbc08","name":"","topic":"","payload":"false","payloadType":"bool","repeat":"","crontab":"","once":true,"onceDelay":0.1,"x":581.0122051239014,"y":520.8039569854736,"wires":[["31330cfe.a25964","e5b421e6.b2cb2"]]},{"id":"27cb19cd.584566","type":"mqtt in","z":"c02f1515.bbbc08","name":"","topic":"checker/reset","qos":"2","datatype":"auto","broker":"a86bbd8c.6015c","x":371.01037979125977,"y":381.7710247039795,"wires":[["86ef62a5.495c2"]]},{"id":"31330cfe.a25964","type":"function","z":"c02f1515.bbbc08","name":"conditionSet","func":"condition = msg.payload\nflow.set('testCondition', condition)\n","outputs":1,"noerr":0,"x":779.0087738037109,"y":574.003475189209,"wires":[[]]},{"id":"e5b421e6.b2cb2","type":"delay","z":"c02f1515.bbbc08","name":"","pauseType":"delay","timeout":"120","timeoutUnits":"seconds","rate":"1","nbRateUnits":"1","rateUnits":"second","randomFirst":"1","randomLast":"5","randomUnits":"seconds","drop":false,"x":784.007007598877,"y":519.9844360351562,"wires":[["b4c72232.fc463"]]},{"id":"86ef62a5.495c2","type":"change","z":"c02f1515.bbbc08","name":"","rules":[{"t":"set","p":"reset","pt":"msg","to":"1","tot":"num"}],"action":"","property":"","from":"","to":"","reg":false,"x":563.0138931274414,"y":382.3820285797119,"wires":[["b4c72232.fc463"]]},{"id":"d60cf76a.d8fe48","type":"inject","z":"c02f1515.bbbc08","name":"reset","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":207.0069694519043,"y":334.90973472595215,"wires":[["97d7618e.d1e0a"]]},{"id":"97d7618e.d1e0a","type":"mqtt out","z":"c02f1515.bbbc08","name":"","topic":"checker/reset","qos":"2","retain":"false","broker":"a86bbd8c.6015c","x":368.0121555328369,"y":335.7656364440918,"wires":[]},{"id":"ea3d9cb4.136","type":"debug","z":"c02f1515.bbbc08","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":994.0590896606445,"y":399.87859058380127,"wires":[]},{"id":"4ca7bb49.c55674","type":"debug","z":"c02f1515.bbbc08","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":994.0086669921875,"y":454.0017395019531,"wires":[]},{"id":"a86bbd8c.6015c","type":"mqtt-broker","z":"","name":"","broker":"localhost","port":"1883","clientid":"","usetls":false,"compatmode":true,"keepalive":"60","cleansession":true,"birthTopic":"","birthQos":"0","birthPayload":"","closeTopic":"","closeQos":"0","closePayload":"","willTopic":"","willQos":"0","willPayload":""}]

I am running the large flow on a Raspberry Pi. Could I be hitting some system limitation that loses control of the delay node? Is there a log somewhere that might give me insight?

In the interim I implemented a trigger node watchdog, and that seems to be working, but I have not actually witnessed the delay node dying yet. I would also think the trigger node could just as easily be killed by whatever is killing the delay node.

Thanks for any suggestions.

It may happen. Whether this is an issue depends on what you consider normal behavior.

What do you expect to happen if you click the false inject node once and then quickly click it 5 more times? If you are expecting it to take 5 x 120 seconds to complete, it won't happen. The delay node has an internal buffer that handles each message individually. The first message will be delayed 120 seconds. The last message, clicked a split second after the first one, will be sent after 120 seconds plus a fraction of a second. Unlike the trigger node, the delay node does not have an option to extend the delay when a new message arrives.
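The queueing behavior described above can be sketched in plain JavaScript (this is an illustration, not Node-RED's actual internals; the name `delayNode` is invented): every message is released a fixed interval after its own arrival, so later arrivals never extend earlier timers.

```javascript
// Plain-JS sketch of the delay node's per-message queueing: each message
// is released timeoutMs after ITS OWN arrival time.
function delayNode(timeoutMs) {
  const queue = [];
  return {
    receive(msg, arrivalMs) {
      queue.push({ msg, releaseAt: arrivalMs + timeoutMs });
    },
    // Which messages have been released by time nowMs?
    releasedBy(nowMs) {
      return queue.filter(e => e.releaseAt <= nowMs).map(e => e.msg);
    },
  };
}

const demo = delayNode(120000);
demo.receive("first click", 0);    // released at t = 120000 ms
demo.receive("second click", 500); // released at t = 120500 ms, not 240500
```

So two quick clicks produce two messages roughly half a second apart at the output, not one message delayed twice as long.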

@Andrei thanks for the info. In my case, however, no message is sent at all. For example, the timer is initiated, and before it counts all the way down it just dies and no message is sent. Do you know of any reason this can happen?

Put a debug node directly on the output of the delay node to verify this 100%.

I can imagine a scenario where something similar could happen. In such a hypothetical scenario, the flow would start with the context variable counter already holding a value higher than zero, so your perception would be that the timer finishes earlier than you expect. In the minimal flow you provided you are not explicitly setting this context variable at initialization time. You set the variable only: 1st, inside the function node "conditionChecker" when the variable is undefined; 2nd, at the times you also set msg.reset. To be on the safe side it is better to set it explicitly at the beginning of the flow (perhaps a mini flow with one inject node that triggers automatically, plus a change or function node).
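A minimal sketch of such a one-shot initialiser, assuming the counter is kept in flow context (a small stub stands in for Node-RED's `flow` API so the snippet runs standalone):

```javascript
// Sketch of an initialiser function node, wired to an inject node
// configured to "inject once after 0.1 seconds" on deploy.
// This stub mimics Node-RED's flow-context API for the demo only.
const flow = {
  _store: {},
  set(key, value) { this._store[key] = value; },
  get(key) { return this._store[key]; },
};

// --- body of the initialiser function node ---
flow.set("counter", 0);           // counter never starts undefined
flow.set("testCondition", false); // known starting condition
// return msg;  // in the real function node, pass the message on
```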

@Andrei good point, I should add that to my flow. However, I actually see the timer start, but then it dies before it fully counts down.

@Steve-Mcl will do. It's difficult to track down because it only happens sometimes. I might have to set up some system where events are written to a file.

Is there anything in Node-RED that might decide the timer loop has become an infinite loop and kill the timer to protect the system? Just a thought.

Pretty sure there's not (it's easy to prove: you can crash Node-RED with a stack overflow). Though to be honest I've not tried it myself, only seen it from others who have created a loop and crashed Node-RED.

You are using context.set / context.get in your function node. If you are going to try what I said, you should use flow context (flow.get / flow.set) instead of node context.
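The difference, sketched with minimal stubs (plain JavaScript, not Node-RED internals): node context is private to a single function node, while flow context is shared by every node on the tab.

```javascript
// makeContext stands in for both of Node-RED's context stores: each
// function node gets its own private node context, while flow context
// is one shared store per tab.
const makeContext = () => {
  const store = {};
  return { get: k => store[k], set: (k, v) => { store[k] = v; } };
};

const nodeA = makeContext();   // context.get/set inside function node A
const nodeB = makeContext();   // a different node: cannot see A's values
const flowCtx = makeContext(); // flow.get/set: visible to A and B alike

nodeA.set("counter", 3);       // private to node A
flowCtx.set("counter", 3);     // shared across the whole flow
```

That shared visibility is what lets a separate initialiser node set the counter that another node reads.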

I am trying to reproduce the behavior you reported but no luck so far. I have the impression, however, that I saw the count start at 3 (instead of zero) in one trial, but I am not sure I can trust what I saw (I was doing lots of things at that point in time).

I understand now why I saw the counter starting at value 3 and think this is not an issue. I just misunderstood the concept of the flow.

I clicked the inject node (the one named true) to stop the flow and right after clicked the inject node (named false) to start the flow again.

This will not cause the counter to reset (as per the existing flow logic).

The function node (conditionChecker) will read the latest status of the flow variable testCondition regardless of how many times I clicked the inject node to stop. If the latest click was to continue counting, the flow will carry on.

Check in /var/log/syslog (or the equivalent for your OS) that Node-RED is not crashing and restarting at the time this happens.
In addition, if you tell the debug nodes to write to the system log, then you can look back in the log to track what happened. Also, in a function node, node.warn() will write to the log.

@Colin thanks for the hints on the syslog and warn. I'll let the flows run for a while and see if they crash again.

Are you aware of any timer nodes that can survive a Node-RED redeploy? I tried quite a few of the various timers but they all reset with a Node-RED redeploy.

@Andrei, I cannot seem to duplicate the timer "quitting" issue in my minimal example, which makes me think I have a larger issue with a node dying or some bad code somewhere. I'm going to keep an eye on the logs as @Colin suggests. I'll report back if I can learn something from the logs.

Can you explain what the overall requirement of this bit of the flow is?


I ran the minimal flow quite a number of times and could not reproduce the issue. I added function nodes (with node.warn statements) and debug nodes everywhere. I measured the start and end time of each iteration of the loop, and the delay node always handled the messages perfectly on time.

What bit of information in the debug node (or any other information) shows you that the timer dies?

When I run the flow it will display this message (counting up):

condition is still "false", timer restarting. Loop count: 1

until it reaches the max count and finally displays:

max loops reached, timer reset. condition value: "false"

doing something else now

What is different for you ?

I will see if I can post somewhere (not in the forum, as it limits the file size) a short video that shows this behavior. Anyway, what is the behavior you expect when the inject node true is clicked?

1. Should it stop the loop immediately?
2. Should it finish the current delay execution and then stop the flow?
3. Or should this click not stop the flow at all, because it serves another purpose?

Also, if this node is clicked twice or three times in a row, what do you expect to happen?

It boils down to a set of conditions that ultimately trigger the delay node. Once that delay node is triggered I really need it to send off a payload, otherwise my flow will stop there and has to be resumed manually. Ideally I would like the flexibility to modify another section of the code and know that even if I redeploy while the delay node timer is "on", my flow will continue.

Thank you very much for testing the minimal flow. The minimal flow is working perfectly for me also. I cannot get it to fail. Its purpose is only to put a timer into a loop and see if that timer ever fails. The true and false are meant to closely resemble inputs from my production flow but it's not a perfect example.

It seems to be an issue with my "production" flow. I'm sorry, but my only feedback was visually watching the timer start (i.e. the blue square turns on) and then seeing the blue square disappear well before 120s had elapsed. I did not have a debug node on the timer, but the exec node that follows it has a debug node and nothing was displayed.

I am trying to get some better data on this but so far no luck. You all seem to confirm, however, that there is nothing Node-RED will do to kill a timer unless Node-RED actually redeploys or restarts.

... or it gets a msg.reset, which is what anything arriving over MQTT could cause, since the change node sets msg.reset.
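One way to check for that possibility (a hypothetical sketch, with a stub for the `node` API): a guard function node placed just before the delay node that logs any message carrying msg.reset but passes everything through unchanged.

```javascript
// Guard sketch: record every msg.reset heading into the delay node so
// stray resets show up in the log. node is stubbed for this demo.
const warnings = [];
const node = { warn: text => warnings.push(text) };

function guard(msg) {
  // --- body of the guard function node ---
  if (msg.reset) {
    node.warn("delay node about to be reset at " + new Date().toISOString());
  }
  return msg; // always pass the message on unchanged
}

guard({ payload: "tick" });           // normal message: nothing logged
guard({ payload: "stop", reset: 1 }); // reset: recorded in warnings
```

If the timer "dies" at the same moment a warning appears, a stray reset is the culprit rather than the delay node itself.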

It would be great if you could update this post when you find out something new.

May I ask what version of Node-RED you are running? The issue of the erratic blue indicator seems to be specific to a beta release under testing and should not cause any side effects in your flow, since it does not change the node's timeout.

Hi Andrei,

Sorry to say I cannot get this behavior again. I have changed my code around a bit and realized I could avoid the timer loop altogether in one place. I still have one timer loop going and have yet to see the problem. It is possible, as @dceejay suggests, that I have a msg.reset arriving that I wasn't seeing.

I am running Node-RED 0.20.3


Possibly a reasonable approach would be to have the timer in a separate flow, on a different tab perhaps. Then at least you can do a partial deploy and the timer should survive.

Obviously, if you are doing a full deploy, you cannot guarantee that a timer will survive. Though if I understand things correctly, you could probably create a node that would allow a timer to survive everything except a restart of Node-RED: the timer logic would have to live outside the module.exports part of the node. I have not tried it though, and I don't know if any of the current nodes do that.