[Tester Wanted]: Graceful shutdown of flows

Hi all,

I'm reaching out to you today to gather feedback on a potential new feature in NR v5.

This new feature is graceful shutdown of flows – providing a way to halt incoming work while allowing in-progress work to complete before fully stopping.

Imagine your flow contains an http in => function => http response workflow, and you decide to deploy a change. If a request arrives while the flow is restarting, this request will be lost, and the client will receive a 404 error.

The graceful shutdown feature would allow NR to send the response to the client before closing nodes.

More specifically, graceful shutdown is a three-step shutdown sequence:

  1. Close all message initiator nodes. These nodes are defined by the user in the UI.

  2. Wait until the list of messages in progress is empty (or until the timeout is reached).

  3. Close remaining nodes.
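To make the three steps concrete, here is a minimal simulation of the sequence. It is only a sketch, written in Python for illustration, and it is not the code from the PR: `Node`, `graceful_shutdown`, `demo`, and the `in_flight` set are hypothetical names, and polling the set every 10 ms is an assumption about how "wait until empty" could be done.

```python
import asyncio


class Node:
    """Hypothetical stand-in for a flow node (not Node-RED's real node API)."""

    def __init__(self, name, initiator=False):
        self.name = name
        self.initiator = initiator  # marked by the user as a message initiator
        self.closed = False

    def close(self):
        self.closed = True


async def graceful_shutdown(nodes, in_flight, timeout):
    """Run the three steps; return True if in-flight work drained before the timeout."""
    # Step 1: close the message initiator nodes so no new work enters the flow.
    for node in nodes:
        if node.initiator:
            node.close()

    # Step 2: wait until the set of in-progress messages is empty, or time out.
    async def drained():
        while in_flight:
            await asyncio.sleep(0.01)

    try:
        await asyncio.wait_for(drained(), timeout)
        completed = True
    except asyncio.TimeoutError:
        completed = False

    # Step 3: close the remaining nodes, whether or not the wait succeeded.
    for node in nodes:
        if not node.closed:
            node.close()
    return completed


async def demo(work_seconds, timeout):
    """One request is in progress when shutdown starts."""
    nodes = [Node("http in", initiator=True), Node("function"), Node("http response")]
    in_flight = {"msg-1"}

    async def finish_request():
        await asyncio.sleep(work_seconds)
        in_flight.discard("msg-1")  # the response has been sent

    worker = asyncio.create_task(finish_request())
    completed = await graceful_shutdown(nodes, in_flight, timeout)
    worker.cancel()  # no effect if the request already finished
    return completed, all(node.closed for node in nodes)


if __name__ == "__main__":
    print(asyncio.run(demo(work_seconds=0.05, timeout=1.0)))  # request finishes in time
    print(asyncio.run(demo(work_seconds=1.0, timeout=0.05)))  # timeout is reached first
```

In both runs every node ends up closed; the first value of the returned pair only says whether the pending request was allowed to finish before the timeout.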

Each flow/subflow has its own tab for individual configuration. This tab contains:

  • timeout
  • a list of message initiator nodes
  • a checkbox for failFast

The failFast option allows you to cancel the graceful shutdown of other flows linked to its closure. See shutdown scope in the linked PR.

This sequence works the same way for a restart and all three types of deployment (full/flow/node).

You can test this new feature yourself by:

  • cloning the repository: git clone https://github.com/GogoVega/node-red.git
  • entering the cloned directory: cd node-red
  • checking out the branch: git checkout 2296-graceful-shutdown
  • installing dependencies: npm i
  • building NR: npm run build
  • and finally running it: npm start

The linked PR contains a test flow to help you explore the feature :slightly_smiling_face:

NOTE: Nick has not yet approved details of its internal workings.

Help me introduce this new feature by testing it, providing feedback, and suggesting improvements.

Big thanks :heart_eyes:

And thanks to its creator Kunihiko Toumura :face_blowing_a_kiss:

setup-node-red.sh.txt (10.8 KB)

  1. Download the file
  2. Open your terminal
  3. Go to your downloads directory: cd Downloads
  4. Run the file: bash ./setup-node-red.sh.txt
  5. Enjoy :winking_face_with_tongue:

@gregorius, can your Erlang-Red support this feature?

It would provide a quick way to test it, so that users who are not comfortable installing it would not have to.

Huge thanks :heart:

Can you explain the HTTP scenario again, please? To my mind there are two conditions:

  1. A message has arrived and is in the flow. Today we may close the reply node before the reply is sent. The client gets a broken-connection timeout, not a 404. With your fix the reply should get sent.
  2. A message arrives during closing. Again, today the client will get a broken connection. But even with your fix the input node would be shut, so the server would be unavailable and the client would get a not-found timeout error, not a 404. So I'm not sure they are any better off? What am I missing?

You mean a graceful shutdown of flows? Not really, although I do have a clear "stop" that I send to nodes when the flow stops, to handle socket closures, dropping connections, file closures, etc. gracefully.

I won't consider entering the world of message tracing in Erlang; it's just not meant to work that way. Erlang assumes that failure is normal and that well-defined recovery is more important. Therefore, if something does go wrong, let it: that's the basic Erlang mantra.

I personally think the overhead is too much for the gain. In a highly concurrent setup (such as Erlang-Red), it would be very difficult to define a correct order of shutdown. Each process has its own needs and therefore its own dependencies. Having said that, Erlang itself has a notion of "linking" so that processes can link themselves together. Then if one goes down, it takes its parent with it (or the parent is sent a message). In this way a process dependency can be defined by the nodes themselves.

A supervisor node can be used to visually coordinate that - so that the user can define their own process dependencies as part of the flow definition. The supervisor - as in a factory - is responsible for worker processes. If a worker process does fail, the supervisor will restart the child process - if so configured.

EDIT: So, Erlang being very different from NodeJS, you can achieve a "graceful" shutdown by using enough supervisor nodes, i.e. a supervisor tree. Supervisors can also supervise other supervisors. That is in fact how Erlang coordinates many thousands of processes: with supervisor trees.

I made a slight mistake in my explanation. I wanted to give a more telling example... I recall a previous thread discussing an incorrect error code because the endpoint had been removed.

If the connection closes between the HTTP in and HTTP response nodes, my feature will allow the response to be sent before the HTTP response node closes. This avoids a timeout. If the HTTP in node is already closed, then a request will indeed fail.

No worries Gerrit :slightly_smiling_face:

So, a more extreme example: say it takes 5 seconds to handle the HTTP request (I did say it was extreme :slight_smile: ). On a restart today, the flow will break any requests "in-flight", but will be back up and running almost instantly. With a clean shutdown it will handle those in-flight requests, but will be offline for any new requests for 5 seconds. Again, I'm not sure which is better or worse. I'm sure there are cases for each.

The user can also choose not to close the HTTP in node; it will close once there are no more pending messages.
