I am trying to build a scalable RPA platform with Node-RED, which is great for the orchestration. But I have run into a small bump that I haven't found a solution for yet, so I'd like to ask you for inspiration/feedback.
I am using the Node-RED Docker image in Docker Swarm so I can easily scale up and down.
Scaling down is where I run into a problem. When a container is shut down, the flows that are still busy are lost, and so is the data in transit. In RPA we interact with systems where a two-phase commit will not help us, so the best option would be a graceful shutdown of the Node-RED instance.
The consuming/start nodes I have in scope are AMQP and HTTP.
Does anyone have an idea how to shut down a Node-RED container gracefully, so that all flows finish before it stops?
My fallback scenario is to stop consumption at the MQ layer, for example, wait for the maximum processing time, then scale and start consuming again. But this is far from ideal.
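The fallback can be tightened a little. Instead of always waiting the full maximum processing time, count the messages in flight and stop waiting as soon as that count reaches zero.

This is a minimal sketch. It assumes you can hook the start of each message (e.g. when the AMQP or HTTP input fires) and its completion. InFlightTracker, begin(), end() and drain() are hypothetical names, not part of Node-RED.

```typescript
// Hypothetical helper (not a Node-RED API): counts in-flight messages so a
// shutdown routine can wait until they finish or a deadline passes.
class InFlightTracker {
  private active = 0;
  private accepting = true;
  private waiters: Array<() => void> = [];

  // Call when a message enters the flow. Returns false once draining has
  // started, so the caller can reject or requeue the message instead.
  begin(): boolean {
    if (!this.accepting) return false;
    this.active++;
    return true;
  }

  // Call when the message has been fully processed (successfully or not).
  end(): void {
    if (this.active > 0) this.active--;
    if (this.active === 0) {
      const waiters = this.waiters;
      this.waiters = [];
      waiters.forEach((wake) => wake());
    }
  }

  get count(): number {
    return this.active;
  }

  // Stop accepting new work; resolve true as soon as nothing is in flight,
  // or false if maxWaitMs elapses first.
  drain(maxWaitMs: number): Promise<boolean> {
    this.accepting = false;
    if (this.active === 0) return Promise.resolve(true);
    return new Promise<boolean>((resolve) => {
      const timer = setTimeout(() => resolve(false), maxWaitMs);
      this.waiters.push(() => {
        clearTimeout(timer);
        resolve(true);
      });
    });
  }
}
```

On scale-down you would first pause consumption, then call drain(maxProcessingMs). A false result means the deadline passed with work still in flight, which is exactly the case the runtime cannot help with.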
Without proper runtime support, a workaround is nearly impossible, or at least error-prone and complex, because you don't know the internal state of each node.
Some of these issues can be solved with the new messaging API and the Complete node, but the nodes you are using have to implement the new API. Even then it gets you nowhere, because you can't delay the shutdown of the runtime from your flow when it's triggered by a SIGINT.
All in all, the topic is not as trivial as it seems, because you have to take many runtime factors into account, such as in-flight messages and internal node state (queues, delays, etc.).
I came here to ask the same question: is there a way to stop Node-RED with an internal API call, something like RED.stop()?
Reading that, I realised that it's not that simple.
Having said that, I regularly use service nodered stop, which is based on the systemd integration that comes with Node-RED. (I created a flow that uses an exec node to issue that command.)
This would imply that there is a clear way to stop Node-RED without causing too much damage. The question is: how does service nodered stop know what to do?
So it seems to me that an internal API for a graceful shutdown should be possible. What is missing?
It sends the appropriate signal to Node-RED to tell it to shut down. If you are running Node-RED via systemd and you kill it by other means, systemd will see that it has stopped and will start it again.
That only works if you use systemd. If Node-RED is started on the command line without any start scripts, then kill -INT ${PPID} (issued from an exec node, where ${PPID} resolves to the Node-RED process) does the same job. I have three use cases: Node-RED started in a Docker image, Node-RED started via a systemd/service/init.d script, or Node-RED simply started on the command line. Each use case needs its own handling.
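The three cases can be summarised as a small lookup. This is a sketch with the following assumptions:
- The systemd unit is named nodered (as in service nodered stop).
- target is the container name or the Node-RED PID.
- stopCommand is a hypothetical helper.

In Docker Swarm a stopped task container is replaced to keep the replica count, so there you would scale the service rather than stop one container.

```typescript
// The three start modes mentioned above.
type StartMode = "docker" | "systemd" | "cli";

// Hypothetical helper: returns the argv that stops Node-RED for each start
// mode. `target` is the container name (docker) or the Node-RED PID (cli);
// it is unused for systemd, where the unit is assumed to be called "nodered".
function stopCommand(mode: StartMode, target: string): string[] {
  switch (mode) {
    case "docker":
      // Sends the stop signal (SIGTERM by default), then SIGKILL after the grace period.
      return ["docker", "stop", target];
    case "systemd":
      // Stop through systemd so auto-restart does not bring it straight back.
      return ["systemctl", "stop", "nodered"];
    case "cli":
      // Plain process: send SIGINT, as with kill -INT from an exec node.
      return ["kill", "-INT", target];
  }
}
```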
Correct, I thought we were specifically talking about systemd, but looking back I see I was wrong.
If you use systemd with auto-restart configured, then I think you will have to stop it through systemd, otherwise it will restart. Unless you want a restart, of course.
The underlying question is whether kill -INT ... is a graceful shutdown. That was the original question four years ago, and I wonder whether Node-RED does in fact shut down gracefully when sent a SIGINT.
How graceful it happens to be, I don't know...
But if it is graceful, then there could be an internal API for shutting Node-RED down, e.g. RED.stop(), since a mechanism for a graceful shutdown must then exist.
But that is all speculation, and I have found the solution I need by using kill -INT.
From the doco: Set the runtime state of flows. Note that runtime state of flows is available only if runtimeState value is set to enabled: true in the settings.js file.
So that only affects the state of flows: using that API I could stop all flows, but Node-RED itself would keep running.
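For reference, here is a sketch of calling that endpoint. It assumes the following:
- A Node-RED version that has it, with runtimeState: { enabled: true } set in settings.js.
- The Admin API is reachable at the given base URL.
- POST /flows/state accepts a JSON body such as {"state":"stop"}. Check the Admin API docs for your version.

flowStateRequest is a hypothetical helper that only builds the request, so it can be checked without a running instance. The optional token is for setups with adminAuth enabled.

```typescript
type FlowState = "start" | "stop";

// Minimal request shape; compatible with the init argument of fetch().
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds (but does not send) the Admin API call that
// sets the runtime state of flows.
function flowStateRequest(baseUrl: string, state: FlowState, token?: string): HttpRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (token) headers["Authorization"] = `Bearer ${token}`;
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/flows/state`, // strip trailing slashes
    method: "POST",
    headers,
    body: JSON.stringify({ state }),
  };
}
```

Sending it with fetch(req.url, req) stops the flows while the process keeps running, which is the limitation described above.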
In fact, I think you should use SIGTERM to terminate a process cleanly. You can easily check by watching the Node-RED log: it will show whether it shuts down cleanly.
SIGINT came from the systemd script, so I assumed that was the intended signal. Both INT and TERM can be caught by the underlying process, so I assume SIGINT is fine.
In the code there is a stop(), but it's not exposed to nodes.
I searched for SIGINT in the codebase and only found this PR:
Proposed changes
On shutting down / restarting Node-RED via nodemon it was noticed that the context cache was not flushed to cache - as the RED.stop() was never called. This PR adds listeners to various other SIGNALs to try to catch the signals that are used for a safe shutdown to call the cache flush operation before exit.
They are:
SIGINT (already handled)
SIGTERM - similar to SIGINT
SIGHUP - often used to restart linux processes
SIGUSR2 - used by nodemon to restart
SIGBREAK - used by windows ctrl-break key
and on("message") with a value of "shutdown" - from PM2 under Windows with the --shutdown-with-message flag set.
So it would seem that SIGINT is the only signal that is handled properly.
EDIT: that PR was in fact merged, and spoiler: in the code TERM, INT, HUP, BREAK and USR2 all do the same thing.
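The behaviour described in the PR can be sketched like this: every listed signal, plus a "shutdown" message, runs the same stop routine once and then exits. This is not the Node-RED source.

installShutdownHandlers, stop and exit are hypothetical names. The event source is injected so the logic can be exercised without sending real signals; Node's process object fits the SignalSource shape.

```typescript
// Minimal event-source shape; Node's `process` object satisfies it.
interface SignalSource {
  on(event: string, listener: (...args: any[]) => void): unknown;
}

// Signals listed in the PR description.
const SHUTDOWN_SIGNALS = ["SIGINT", "SIGTERM", "SIGHUP", "SIGUSR2", "SIGBREAK"];

// Hypothetical wiring: each signal (and a PM2-style "shutdown" message)
// triggers the same stop routine exactly once, then exits with 0 or 1.
function installShutdownHandlers(
  source: SignalSource,
  stop: () => Promise<void>,
  exit: (code: number) => void,
): void {
  let stopping = false;
  const shutdown = (): void => {
    if (stopping) return; // ignore repeated signals while already stopping
    stopping = true;
    stop().then(
      () => exit(0),
      () => exit(1),
    );
  };
  for (const sig of SHUTDOWN_SIGNALS) source.on(sig, shutdown);
  source.on("message", (msg: unknown) => {
    if (msg === "shutdown") shutdown();
  });
}
```

In a real process this would be wired up as installShutdownHandlers(process, stopRuntime, (code) => process.exit(code)), where stopRuntime stands for whatever flushes context and stops flows in your setup. Ignoring a second signal is a design choice; some tools treat it as a force-quit instead.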
In a brief test on a Raspberry Pi:
SIGHUP (kill -1), SIGINT (kill -2), SIGUSR2 (kill -12) and SIGTERM (kill -15) kill the process gracefully enough that systemctl status nodered shows "stopping flows", "mqtt broker disconnected" and "stopped flows".
SIGUSR1 (kill -10) is ignored.
SIGQUIT (kill -3) and, of course, SIGKILL (kill -9) kill the process abruptly.
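Abrupt termination is also what Docker falls back to. docker stop, and Swarm when it removes a task, sends SIGTERM (or the image's STOPSIGNAL). If the container has not exited after a grace period, 10 seconds by default, it sends SIGKILL.

Below is a sketch of that escalate-after-a-deadline pattern. ProcessControl and stopWithGrace are hypothetical. With Node the hooks could wrap process.kill(pid, signal), and process.kill(pid, 0) as a liveness probe.

```typescript
// Hypothetical hooks for controlling one process.
interface ProcessControl {
  send(signal: string): void;
  isAlive(): boolean;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(() => resolve(), ms));

// Ask politely first, poll until the process has exited, and only send
// SIGKILL once the grace period is used up (the docker stop pattern).
async function stopWithGrace(
  proc: ProcessControl,
  graceMs: number,
  pollMs = 100,
  signal = "SIGTERM",
): Promise<"graceful" | "killed"> {
  proc.send(signal);
  const deadline = Date.now() + graceMs;
  while (Date.now() < deadline) {
    if (!proc.isAlive()) return "graceful";
    await sleep(pollMs);
  }
  if (!proc.isAlive()) return "graceful";
  proc.send("SIGKILL");
  return "killed";
}
```

For the original Swarm question, the relevant knob is the grace period. Node-RED handles SIGTERM like SIGINT, so raising the period gives in-flight flows more time before the SIGKILL. You can do this with docker service update --stop-grace-period 60s <service>, or with stop_grace_period in a stack file. It still does not make the runtime wait for those flows to finish.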