[Tester Wanted]: Graceful shutdown of flows

GogoVega · 10 November 2025 09:23

Hi all,

I'm reaching out to you today to gather feedback on a potential new feature in NR v5.

This new feature is graceful shutdown of flows – providing a way to halt incoming work while allowing in-progress work to complete before fully stopping.

Imagine your flow contains an http in => function => http response workflow, and you decide to deploy a change. If a request arrives while the flow is restarting, this request will be lost, and the client will receive a 404 error.

The graceful shutdown feature would allow NR to send the response to the client before closing nodes.

More specifically, graceful shutdown is a three-step shutdown sequence:

1. Close all message initiator nodes. These nodes are defined by the user in the UI.
2. Wait until the list of messages in progress is empty (or until the timeout is reached).
3. Close remaining nodes.
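
As a rough illustration of the three steps, here is a minimal JavaScript sketch. It is not the code from the PR: `gracefulStop`, the `initiator` flag, the `close()` method and the `pending` counter are hypothetical names made up for this example, and the polling loop is just the simplest way to show "wait until empty or timeout".

```javascript
// Hypothetical sketch of the three-step sequence; not Node-RED's internal API.
// A "node" is a plain object: { id, initiator, close() }.
// `flow.pending` stands in for the number of messages still in progress.

function sleep(ms) {
    return new Promise((resolve) => setTimeout(resolve, ms));
}

async function gracefulStop(flow, timeoutMs, pollMs = 10) {
    const log = [];

    // Step 1: close the message initiator nodes so that no new work starts.
    for (const node of flow.nodes.filter((n) => n.initiator)) {
        await node.close();
        log.push(`closed initiator ${node.id}`);
    }

    // Step 2: wait until nothing is in progress, or until the timeout is reached.
    const deadline = Date.now() + timeoutMs;
    while (flow.pending > 0 && Date.now() < deadline) {
        await sleep(pollMs);
    }
    log.push(flow.pending === 0 ? "drained" : "timeout");

    // Step 3: close the remaining nodes.
    for (const node of flow.nodes.filter((n) => !n.initiator)) {
        await node.close();
        log.push(`closed ${node.id}`);
    }
    return log;
}

// Usage: one request is still in progress and completes after 50 ms.
const demoFlow = {
    pending: 1,
    nodes: [
        { id: "http in", initiator: true, close: async () => {} },
        { id: "function", initiator: false, close: async () => {} },
        { id: "http response", initiator: false, close: async () => {} },
    ],
};
setTimeout(() => { demoFlow.pending = 0; }, 50);
gracefulStop(demoFlow, 1000).then((log) => console.log(log.join("\n")));
```

In this sketch the `http in` node is closed first, the pending request gets up to one second to finish, and only then are the `function` and `http response` nodes closed.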

Each flow/subflow has its own tab for individual configuration. This window contains:

timeout
a list of message initiator nodes
a checkbox for failFast

The failFast option allows you to cancel the graceful shutdown of other flows linked to its closure. See shutdown scope in the linked PR.
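
To make the effect of failFast more concrete, here is a small JavaScript model. It is only a sketch under assumptions: `planShutdown`, `drainMs`, `stopsAt` and `outcome` are hypothetical names, all flows passed in are assumed to belong to the same shutdown scope, and the real implementation in the PR may behave differently in the details.

```javascript
// Hypothetical model, not the PR's code. Each flow needs `drainMs` to finish
// its in-progress messages and is allowed at most `timeoutMs` to do so.
function planShutdown(flows) {
    // Earliest moment at which a failFast flow exceeds its timeout.
    // Math.min() with no arguments returns Infinity, i.e. "never".
    const failFastAt = Math.min(
        ...flows
            .filter((f) => f.failFast && f.drainMs > f.timeoutMs)
            .map((f) => f.timeoutMs)
    );

    return flows.map((f) => {
        const ownStop = Math.min(f.drainMs, f.timeoutMs);
        if (ownStop <= failFastAt) {
            // The flow finishes (or times out) on its own terms.
            const outcome = f.drainMs <= f.timeoutMs ? "drained" : "timeout";
            return { id: f.id, stopsAt: ownStop, outcome };
        }
        // Another flow failed fast first: this one is forced to stop.
        return { id: f.id, stopsAt: failFastAt, outcome: "forced" };
    });
}

const plan = planShutdown([
    { id: "Flow 1", timeoutMs: 6000, drainMs: 9000, failFast: true },
    { id: "Flow 3", timeoutMs: 10000, drainMs: 8000, failFast: false },
]);
console.log(plan);
```

Here Flow 1 exceeds its 6 s timeout, so Flow 3 is forced to stop at 6 s as well. With `failFast: false` on Flow 1, Flow 3 would be left to drain until 8 s.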

This sequence works the same way for a restart and all three types of deployment (full/flow/node).

You can test this new feature yourself by:

cloning the repository: git clone https://github.com/GogoVega/node-red.git
checking out the branch: git checkout 2296-graceful-shutdown
installing dependencies: npm i
building NR: npm run build
and finally running it: npm start

The linked PR contains a test flow to help you explore the feature.

NOTE: Nick has not yet approved details of its internal workings.

Help me introduce this new feature by testing it, providing feedback, and suggesting improvements.

Big thanks

And thanks to its creator Kunihiko Toumura

github.com/node-red/node-red

Introduce `Graceful Shutdown`

dev ← GogoVega:2296-graceful-shutdown

opened 05:00PM - 09 Nov 25 UTC

GogoVega

+794 -135

## Types of changes

- [ ] Bugfix (non-breaking change which fixes an issue)
…
- [x] New feature (non-breaking change which adds functionality)

## Proposed changes

Introduce a grace period in the shutdown process to avoid interrupting in-progress work.

Rework of the excellent work done in #2296 by @k-toumura. This PR contains an updated implementation and a more extensive concept. See key differences.

### Definitions

- `Message Initiator Nodes`: any node that triggers the sending of a message outside of the processing of an incoming message.
- `Shutdown Scope`: any flow capable of receiving a message through the link and subflow nodes for the same workflow. So, if flow A contains an `inject` node and a `link out` node, flow B contains a `link in` node and a `link out` node, and flow C contains a `link in` node and a `debug` node, the scope will be A, B, and C because the workflow will traverse all three flows.
- `failFast`: cancels graceful shutdown of other flows in the scope. In other words, it forces the shutdown of all nodes of other flows in the scope.

### Concept

Currently, when a flow closes, all nodes are closed in a single-step sequence. With *Graceful Shutdown*, there are **three** steps in the shutdown sequence:

1. Close all message initiator nodes
2. Wait for in-progress messages
3. Close remaining nodes

#### Message initiator nodes

These nodes are defined by the user in the UI. Each flow and subflow contains a new pane that allows the user to choose these nodes.

<img width="506" height="671" src="https://github.com/user-attachments/assets/0b49df43-0892-40e7-ac12-82974bca2f5c" />

>[!WARNING]
> We assume that a message initiator node must have an output and cannot be a link node (it transmits a message, it does not create one).

#### In-progress messages

Each flow contains a message counter. This counter is incremented when a node sends a message and decremented when a node receives a message. Step 2 will end when the counter reaches 0 or when the timeout is exceeded.

The counter works for a single flow as follows:

![single-flow-shutdown](https://github.com/node-red-hitachi/designs/raw/graceful-shutdown/designs/graceful-shutdown/sequence.svg)

If the scope is the flow itself, the counter is that of the flow. Otherwise, the counter is the sum of the counters that make up the scope.

The counter works for multiple flows as follows:

![multiple-flows-shutdown](https://github.com/user-attachments/assets/61d6f2fd-5e47-435c-8f5a-ceaa378b389f)

Each flow can have its own timeout, but the flow's configuration will affect this. If a flow simply forwards the message to the next flow, it may be desirable to close it once the message has been sent.

#### `failFast`

If a flow fails to close in time (timeout exceeded), it can force the shutdown of other flows in the scope that are in shutdown phase.

#### flow/node change deployment

The sequence is identical, but the calculation in step 2 is based on "is one of the nodes I'm stopping waiting for a message?". We can't use the counter here because other nodes are active. Therefore, we must try to find a point at which the nodes we're stopping are no longer receiving messages.

### Key differences

- Support for messages crossing a flow (link or subflow node)
- Simulated Node Messaging API
- Timeout per flow
- Cancel other flows if this one fails
- Support for flow/node change deployment

### Limitations

- An asynchronous job (e.g. in a `function` node) must call `done()` itself; otherwise the counter may be inaccurate (offset).
- Any loop, whatever its nature, is unmanageable.
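
The scope definition and the "sum of the counters" rule above can be illustrated with a short JavaScript sketch. It is a sketch only: `shutdownScope`, `pendingInScope` and the `links` pair format are hypothetical, and treating links as undirected (so that stopping B also pulls in A and C) is an assumption made for this example, not something confirmed by the PR.

```javascript
// Hypothetical sketch, not the PR's code.
// `links` lists pairs of flow ids joined by link or subflow nodes.
function shutdownScope(startFlow, links) {
    // Assumption: links are treated as undirected, so the scope is the whole
    // group of flows that one workflow can traverse.
    const neighbours = new Map();
    for (const [from, to] of links) {
        if (!neighbours.has(from)) neighbours.set(from, new Set());
        if (!neighbours.has(to)) neighbours.set(to, new Set());
        neighbours.get(from).add(to);
        neighbours.get(to).add(from);
    }

    // Breadth-first traversal starting from the flow being stopped.
    const scope = new Set([startFlow]);
    const queue = [startFlow];
    while (queue.length > 0) {
        const current = queue.shift();
        for (const next of neighbours.get(current) ?? []) {
            if (!scope.has(next)) {
                scope.add(next);
                queue.push(next);
            }
        }
    }
    return [...scope].sort();
}

// The scope counter is the sum of the per-flow counters.
function pendingInScope(scope, counters) {
    return scope.reduce((sum, flowId) => sum + (counters[flowId] ?? 0), 0);
}

// Flows A -> B -> C from the definition above.
const links = [["A", "B"], ["B", "C"]];
const scope = shutdownScope("A", links);
console.log(scope);                                        // [ 'A', 'B', 'C' ]
console.log(pendingInScope(scope, { A: 1, B: 0, C: 2 }));  // 3
```

With these inputs, step 2 for flow A would keep waiting until the combined counter of A, B and C reaches 0 or the timeout is exceeded.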
### Related links

- Original design: https://github.com/node-red-hitachi/designs/blob/graceful-shutdown/designs/graceful-shutdown/README.md
- Original implementation: https://github.com/node-red/node-red/pull/2296

### Test flow

```json
[{"id":"b8f08e1162331d0a","type":"subflow","name":"Subflow 3","info":"","category":"","in":[],"out":[{"x":480,"y":100,"wires":[{"id":"57d37452c847dd05","port":0},{"id":"2d54d91583af6c6e","port":0}]}],"env":[],"failFast":false,"timeout":10000,"initiatorNodes":["5a16348a0fb755d2"],"meta":{},"color":"#DDAA99"},{"id":"9a3fae6bcb73e105","type":"debug","z":"b8f08e1162331d0a","name":"debug 10","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","statusVal":"","statusType":"auto","x":340,"y":160,"wires":[]},{"id":"5a16348a0fb755d2","type":"inject","z":"b8f08e1162331d0a","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":true,"onceDelay":"1","topic":"","payload":"","payloadType":"date","x":150,"y":100,"wires":[["9a3fae6bcb73e105","57d37452c847dd05"]]},{"id":"57d37452c847dd05","type":"delay","z":"b8f08e1162331d0a","name":"","pauseType":"delay","timeout":"1","timeoutUnits":"seconds","rate":"1","nbRateUnits":"1","rateUnits":"second","randomFirst":"1","randomLast":"5","randomUnits":"seconds","drop":false,"allowrate":false,"outputs":1,"x":340,"y":100,"wires":[[]]},{"id":"2d54d91583af6c6e","type":"http in","z":"b8f08e1162331d0a","name":"","url":"/test","method":"get","upload":false,"skipBodyParsing":false,"swaggerDoc":"","x":340,"y":60,"wires":[[]]},{"id":"52fbdebd011809bf","type":"subflow","name":"Subflow 2","info":"","in":[{"x":60,"y":40,"wires":[{"id":"914b123ba249824b"}]}],"out":[]},{"id":"7e6b5f724f3828d3","type":"debug","z":"52fbdebd011809bf","name":"debug 5","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","statusVal":"","statusType":"auto","x":340,"y":40,"wires":[]},{"id":"4caf849443ebc04e","type":"debug","z":"52fbdebd011809bf","name":"debug 6","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","statusVal":"","statusType":"auto","x":340,"y":100,"wires":[]},{"id":"914b123ba249824b","type":"delay","z":"52fbdebd011809bf","name":"","pauseType":"delay","timeout":"5","timeoutUnits":"seconds","rate":"1","nbRateUnits":"1","rateUnits":"second","randomFirst":"1","randomLast":"5","randomUnits":"seconds","drop":false,"allowrate":false,"outputs":1,"x":180,"y":40,"wires":[["7e6b5f724f3828d3","4caf849443ebc04e"]]},{"id":"5025fc31b2f76672","type":"subflow","name":"Subflow 1","info":"","category":"","in":[{"x":80,"y":60,"wires":[{"id":"a24214ee0347c2a3"}]}],"out":[{"x":320,"y":60,"wires":[{"id":"a24214ee0347c2a3","port":0}]}],"env":[],"failFast":false,"timeout":10000,"initiatorNodes":["1930e055ab59b657"],"meta":{},"color":"#DDAA99"},{"id":"a24214ee0347c2a3","type":"delay","z":"5025fc31b2f76672","name":"","pauseType":"delay","timeout":"5","timeoutUnits":"seconds","rate":"1","nbRateUnits":"1","rateUnits":"second","randomFirst":"1","randomLast":"5","randomUnits":"seconds","drop":false,"allowrate":false,"outputs":1,"x":200,"y":60,"wires":[[]]},{"id":"48f3cacf88806e9b","type":"tab","label":"Flow 1","disabled":false,"info":"","env":[],"failFast":false,"timeout":6000},{"id":"fcfe154798ae59b2","type":"debug","z":"48f3cacf88806e9b","name":"debug 2","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","statusVal":"","statusType":"auto","x":680,"y":140,"wires":[]},{"id":"027c18126a348a83","type":"link out","z":"48f3cacf88806e9b","name":"link out flow 1","mode":"link","links":["9ddc5b8b594ea056"],"x":435,"y":220,"wires":[]},{"id":"9539e95a4aa3f3a2","type":"function","z":"48f3cacf88806e9b","name":"Wait 5s","func":"setTimeout(() => {\n node.send(msg);\n // MUST be defined!!!\n node.done();\n}, 5000);\n","outputs":1,"timeout":0,"noerr":0,"initialize":"","finalize":"","libs":[],"x":320,"y":60,"wires":[["8649a17f0f1ac737"]]},{"id":"9cae538fddad00ef","type":"inject","z":"48f3cacf88806e9b","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":160,"y":60,"wires":[["9539e95a4aa3f3a2"]]},{"id":"8649a17f0f1ac737","type":"debug","z":"48f3cacf88806e9b","name":"debug 1","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","targetType":"full","statusVal":"","statusType":"auto","x":480,"y":60,"wires":[]},{"id":"3dc426ba800918cf","type":"function","z":"48f3cacf88806e9b","name":"Send 3 msgs","func":"\nreturn [msg, msg, msg];","outputs":3,"timeout":0,"noerr":0,"initialize":"","finalize":"","libs":[],"x":490,"y":140,"wires":[["fcfe154798ae59b2"],["fcfe154798ae59b2"],["fcfe154798ae59b2"]]},{"id":"1e884590101d0f1d","type":"delay","z":"48f3cacf88806e9b","name":"","pauseType":"delay","timeout":"5","timeoutUnits":"seconds","rate":"1","nbRateUnits":"1","rateUnits":"second","randomFirst":"1","randomLast":"5","randomUnits":"seconds","drop":false,"allowrate":false,"outputs":1,"x":320,"y":140,"wires":[["3dc426ba800918cf"]]},{"id":"3c33837e00ce38f1","type":"inject","z":"48f3cacf88806e9b","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":160,"y":140,"wires":[["1e884590101d0f1d"]]},{"id":"767740c89774fc4b","type":"delay","z":"48f3cacf88806e9b","name":"","pauseType":"delay","timeout":"5","timeoutUnits":"seconds","rate":"1","nbRateUnits":"1","rateUnits":"second","randomFirst":"1","randomLast":"5","randomUnits":"seconds","drop":false,"allowrate":false,"outputs":1,"x":320,"y":220,"wires":[["027c18126a348a83"]]},{"id":"d85477921ee620dd","type":"inject","z":"48f3cacf88806e9b","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":160,"y":220,"wires":[["767740c89774fc4b"]]},{"id":"309e1785f9eee23e","type":"link out","z":"48f3cacf88806e9b","name":"link out flow 1 (1)","mode":"link","links":["8391252e5b710669"],"x":435,"y":280,"wires":[]},{"id":"ecdde9e76693e516","type":"delay","z":"48f3cacf88806e9b","name":"","pauseType":"delay","timeout":"5","timeoutUnits":"seconds","rate":"1","nbRateUnits":"1","rateUnits":"second","randomFirst":"1","randomLast":"5","randomUnits":"seconds","drop":false,"allowrate":false,"outputs":1,"x":320,"y":280,"wires":[["309e1785f9eee23e"]]},{"id":"8703afd442944623","type":"inject","z":"48f3cacf88806e9b","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":160,"y":280,"wires":[["ecdde9e76693e516"]]},{"id":"fde8bb0eb5d93e19","type":"inject","z":"48f3cacf88806e9b","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":160,"y":380,"wires":[["19bdf1fe6fa090bf"]]},{"id":"19bdf1fe6fa090bf","type":"subflow:5025fc31b2f76672","z":"48f3cacf88806e9b","name":"","x":320,"y":380,"wires":[["894a802e1adfd39b"]]},{"id":"894a802e1adfd39b","type":"debug","z":"48f3cacf88806e9b","name":"debug 5","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","statusVal":"","statusType":"auto","x":480,"y":380,"wires":[]},{"id":"ee1a92bf8822ff1b","type":"inject","z":"48f3cacf88806e9b","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":160,"y":440,"wires":[["27511002b62ff11f"]]},{"id":"27511002b62ff11f","type":"subflow:52fbdebd011809bf","z":"48f3cacf88806e9b","name":"","x":320,"y":440,"wires":[]},{"id":"b42e930f2ec522a9","type":"subflow:b8f08e1162331d0a","z":"48f3cacf88806e9b","name":"","x":160,"y":500,"wires":[["105f660a67b234f7"]]},{"id":"e1b0701f42f10808","type":"debug","z":"48f3cacf88806e9b","name":"debug 7","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":480,"y":500,"wires":[]},{"id":"105f660a67b234f7","type":"delay","z":"48f3cacf88806e9b","name":"","pauseType":"delay","timeout":"2","timeoutUnits":"seconds","rate":"1","nbRateUnits":"1","rateUnits":"second","randomFirst":"1","randomLast":"5","randomUnits":"seconds","drop":false,"allowrate":false,"outputs":1,"x":320,"y":500,"wires":[["e1b0701f42f10808"]]},{"id":"a951db0574f9b04f","type":"tab","label":"Flow 2","disabled":false,"info":"","env":[]},{"id":"9ddc5b8b594ea056","type":"link in","z":"a951db0574f9b04f","name":"link in flow 2","links":["027c18126a348a83"],"x":165,"y":60,"wires":[["8965b65b9e976c82"]]},{"id":"8965b65b9e976c82","type":"debug","z":"a951db0574f9b04f","name":"debug 3","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","statusVal":"","statusType":"auto","x":300,"y":60,"wires":[]},{"id":"8391252e5b710669","type":"link in","z":"a951db0574f9b04f","name":"link in flow 2 (1)","links":["309e1785f9eee23e"],"x":165,"y":120,"wires":[["7c6b9fee0e4338c0"]]},{"id":"7c6b9fee0e4338c0","type":"link out","z":"a951db0574f9b04f","name":"link out flow 2 (1)","mode":"link","links":["f6553a234c2cbd1c"],"x":255,"y":120,"wires":[]},{"id":"38d0de50ec78556c","type":"tab","label":"Flow 3","disabled":false,"info":"","env":[],"failFast":false,"timeout":10000,"initiatorNodes":[]},{"id":"f6553a234c2cbd1c","type":"link in","z":"38d0de50ec78556c","name":"link in 1","links":["7c6b9fee0e4338c0"],"x":165,"y":60,"wires":[["76627d74979e5144"]]},{"id":"76627d74979e5144","type":"debug","z":"38d0de50ec78556c","name":"debug 4","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","statusVal":"","statusType":"auto","x":300,"y":60,"wires":[]}]
```

### Todo list

- [ ] Add unit tests 😭
- [ ] TODO comments
- [ ] Improve the UI + translation
- [ ] Improve this comment, but for now my head hurts too much.

## Checklist

- [x] I have read the [contribution guidelines](https://github.com/node-red/node-red/blob/master/CONTRIBUTING.md)
- [x] For non-bugfix PRs, I have discussed this change on the forum/slack team.
- [x] I have run `npm run test` to verify the unit tests pass
- [ ] I have added suitable unit tests to cover the new/changed functionality

GogoVega · 10 November 2025 15:08

setup-node-red.sh.txt (10.8 KB)

Download the file
Open your terminal
Go to downloads directory: cd Downloads
Run the file: bash ./setup-node-red.sh.txt
Enjoy

GogoVega · 10 November 2025 16:11

@gregorius, can your Erlang-Red support this feature?

It would provide a quick way to test it, and would spare users from installing it themselves if they are not familiar with it.

Huge thanks

dceejay · 10 November 2025 16:17

Can you explain the http scenario again please? To my mind there are two conditions:

1. A message has arrived and is in the flow. Today we may close the reply node before the reply is sent. The client gets a broken connection timeout, not a 404. With your fix the reply should get sent.
2. A message arrives during closing. Again, today the client will get a broken connection. But even with your fix the input node would be shut, so the server would be unavailable and the client would get a not-found timeout error, not a 404. So I'm not sure they are any better off? What am I missing?

gregorius · 10 November 2025 16:33

You mean a graceful shutting down of flows - not really although I do have a clear "stop" that I send to nodes when the flow stops - to handle socket closures, dropping connections, file closures, etc gracefully.

I won't consider entering the world of message tracing in Erlang, it's just not meant that way. Erlang assumes that failure is normal and well defined recovery is more important. Therefore if something does go wrong, then let it - that's the basic Erlang mantra.

I personally think the overhead is too much for the gain. In a highly concurrent setup (such as Erlang-Red), it would be very difficult to define a correct order of shutdown. Each process has its own needs and therefore its own dependencies. Having said that, Erlang itself has a notion of "linking" so that processes can link themselves together. Then if one goes down, it takes its parent with it (or the parent is sent a message). In this way a process dependency can be defined by the nodes themselves.

A supervisor node can be used to visually coordinate that - so that the user can define their own process dependencies as part of the flow definition. The supervisor - as in a factory - is responsible for worker processes. If a worker process does fail, the supervisor will restart the child process - if so configured.

EDIT: So Erlang being very much different to NodeJS, you can achieve a "graceful" shut down by using enough supervisor nodes, i.e. a supervisor tree. Supervisors can in fact also supervise other supervisors. That's in fact how Erlang coordinates many 1000's of processes - with supervisor trees.

GogoVega · 10 November 2025 16:40

I made a slight mistake in my explanation. I wanted to give a more telling example... I recall a previous thread discussing an incorrect error code because the endpoint had been removed.

If the connection closes between the HTTP in and HTTP response nodes, my functionality will allow the response to be sent before the HTTP response node closes. This avoids a timeout. If the HTTP in node is already closed, then a request will indeed fail.

No worries Gerrit

dceejay · 10 November 2025 17:01

So a more extreme example - say it takes 5 secs to handle the http request (I did say it was extreme) - on a restart currently the flow will break any requests "in-flight" - but will be back up and running almost instantly. With clean shutdown it will handle those in-flight requests but will be offline for any new requests for 5 seconds. Again I'm not sure which is better or worse - I'm sure there are cases for each.

GogoVega · 10 November 2025 17:07

The user can also choose not to close the HTTP in node; it will close once there are no more pending messages.

GogoVega · 14 November 2025 10:18

I just thought of introducing a fourth step to the sequence.

This new step would be the second, and its role would be to trigger a shutdown workflow.

Let's take the dashboard as an example. The workflow is as follows: an MQTT node receives data from a sensor, transmits it to the context and to the dashboard too. The context is local, so it doesn't save data to a file.

When the shutdown occurs, a shutdown node will trigger a workflow to save the context to a file. The advantage here is avoiding the need to regularly update/flush the context file.
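
A minimal sketch of that idea in plain Node.js, outside of Node-RED: handlers registered for a hypothetical shutdown step write an in-memory context to a file just before everything stops. The names `onShutdown`, `runShutdownHandlers`, `context` and the snapshot file name are all made up for this example and do not correspond to an existing Node-RED API.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical in-memory context store (no file persistence while running).
const context = new Map();

// Hypothetical registry for the proposed "shutdown workflow" step.
const shutdownHandlers = [];
function onShutdown(handler) {
    shutdownHandlers.push(handler);
}
function runShutdownHandlers() {
    for (const handler of shutdownHandlers) {
        handler();
    }
}

// Save the context only once, when the shutdown step runs.
const snapshotFile = path.join(os.tmpdir(), "context-snapshot.json");
onShutdown(() => {
    fs.writeFileSync(snapshotFile, JSON.stringify(Object.fromEntries(context)));
});

// Normal operation: sensor data lands in the context.
context.set("temperature", 21.5);

// Shutdown: the handlers run before the remaining nodes are closed.
runShutdownHandlers();
console.log(fs.readFileSync(snapshotFile, "utf8")); // {"temperature":21.5}
```

The point of the sketch is only the ordering: the write happens once at shutdown rather than on every update.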

We can also imagine a role for this in industry: when the 'shutdown' node triggers, it cleanly shuts down the robot/process.

This new step is based on another concept I discussed in the past regarding the introduction of sequential flows: a startup flow, a work flow, and a shutdown flow.

What do you think of this fourth step?

dceejay · 14 November 2025 18:03

I thought context already got flushed on close

Paul-Reed · 16 November 2025 19:44

Yes, this was discussed a few years ago…

Edit - 5 years ago
