Custom node - waiting for instance to be ready

I've stumbled across a scenario that I think should be common but reading the docs on custom nodes I'm not seeing any obvious discussion.

Imagine I am creating a new node type ... let us call it XXX. To make this work, I set up my new node code with:

RED.nodes.registerType("XXX", XXXNode);

I must now provide a function called XXXNode that will be called when a new instance of the node is to be created:

function XXXNode(config) {
   RED.nodes.createNode(this, config);
   const node = this;
   node.on('input', (msg) => {
      // Do something with the incoming message
   });
}
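
For context: in a packaged node, both this constructor and the registerType call live inside a function that the module exports and that Node-RED invokes with the runtime's RED object, roughly like this:

module.exports = function(RED) {
   function XXXNode(config) {
      RED.nodes.createNode(this, config);
      // ... input handler as above ...
   }
   RED.nodes.registerType("XXX", XXXNode);
};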

So far ... all is well and going to plan. Now comes the puzzle.

Let us imagine that the creation of a new XXXNode instance requires some asynchronous initialization. An XXXNode instance must call a remote async service and receive an async response before it can be allowed to run. This means that my flow should not be considered ready until the XXXNode has received its asynchronous callback.

What I can't seem to find is a protocol, callback, Promise, or other mechanism that my XXXNode could use to indicate that it is ready for work. What actually happens is that when I start my flows, if the flow receives an incoming message before the XXXNode is ready, it breaks. What I think I want is for the deployment of the flow not to report that it is "ready" until it actually is.

My gut says that the node instance constructor should either return a Promise or be passed a done() function, but neither seems to be present.

Does anyone have any guidance for me on solving such a puzzle in the Node-RED world?

You need to not handle any msgs until you are ready... it's up to you whether you drop them, queue them, or whatever. What would Node-RED do if your node never came ready? Either we would wait forever, and thus be completely hung... or start anyway... in which case we may as well do that right away.
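
To make that concrete, here is a minimal sketch of the queue-until-ready approach. The initRemoteService function is a stand-in for whatever asynchronous initialization the node needs; it is not part of the Node-RED API:

function XXXNode(config) {
   RED.nodes.createNode(this, config);
   const node = this;
   let ready = false;
   const queue = [];

   function process(msg) {
      // Do something with the incoming message
      node.send(msg);
   }

   node.on('input', (msg) => {
      if (ready) {
         process(msg);
      } else {
         queue.push(msg);   // or simply drop the message instead
      }
   });

   // Hypothetical async initialization; substitute your own service call.
   initRemoteService(config).then(() => {
      ready = true;
      // Drain anything that arrived while we were initializing.
      while (queue.length > 0) {
         process(queue.shift());
      }
   }).catch((err) => node.error(err));
}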

Thanks @dceejay. This puzzle came front and center for me today when I was writing some tests using node-red-node-test-helper. I had coded:

helper.load(..., () => {
  const n1 = helper.getNode("n1");
  n1.receive({payload: "data"});
});

During human testing, my node was working great ... but when I built my unit tests, they were failing. It took me a long time to work out that when using the helper, the internal Node-RED runtime is started and control then passes immediately to my code, which performs its tests by sending a test message. At human time scale, my custom node has initialized (received its async callbacks) by then, but at machine speed we hit the race condition that is the subject of this thread.
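
One workaround in the tests is to defer the receive() call until the node signals that it is ready. The sketch below assumes the node emits a custom "ready" event at the end of its asynchronous initialization (node instances are EventEmitters, but the event name and the xxxNode/flow placeholders are my own, not a Node-RED convention):

helper.load(xxxNode, flow, () => {
  const n1 = helper.getNode("n1");
  // Wait for the node's own (hypothetical) readiness signal
  // before injecting the test message.
  n1.on("ready", () => {
    n1.receive({payload: "data"});
  });
});

The matching node.emit("ready") would go in the node's constructor, right after its asynchronous initialization completes.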

Hey Neil (@kolban),

I cannot describe it any better than @dceejay has done above.
Most of the time I indeed skip all input messages as long as my node is not yet up and running.
For example, here is what my Onvif camera nodes do, summarized (see the sketch after the list):

  • When the node is started: I set the node status to "connecting" and start an asynchronous connection to my camera. This way users can see in the flow editor that the node is currently "connecting" ...
  • As long as the status is not equal to "connected", I skip all input messages.
  • As soon as the asynchronous handler has set the node status to "connected", you will see that status in the flow editor.
  • From then on, all input messages are processed.
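
A condensed sketch of that pattern follows; connectToCamera is illustrative only, while node.status() is the real Node-RED API for showing state in the editor:

function OnvifStyleNode(config) {
   RED.nodes.createNode(this, config);
   const node = this;

   node.status({fill: "yellow", shape: "ring", text: "connecting"});

   // Hypothetical camera connection; replace with the real client.
   connectToCamera(config).then(() => {
      node.connected = true;
      node.status({fill: "green", shape: "dot", text: "connected"});
   }).catch((err) => {
      node.status({fill: "red", shape: "ring", text: "disconnected"});
      node.error(err);
   });

   node.on('input', (msg) => {
      if (!node.connected) {
         return;   // skip input messages until the connection is up
      }
      // ... process the message ...
   });
}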

I've been contemplating the architecture of a flow being "ready" for work. In my own custom node, I now queue incoming requests until my node's dependency has asynchronously acknowledged its availability. This is based on the great thinking of Mr @dceejay.

I am now thinking of Node-RED in a clustered environment. I am imagining running Node-RED as a "cloud native" service handling incoming requests. A loose example would be a Kubernetes Pod consisting of multiple containers of Node-RED. This pod would have a single "exposed" service interface and the pod would balance the work across its containers. This is not an uncommon pattern.

Now let us further consider the nature of a pod ... it treats its containers as volatile, and they can be started and stopped as needed. If a pod scales up and adds a new container, the services provided by that container become visible as another transparent worker of the pod itself. Again, this is how I'm interpreting the cloud native stories. It is in this context that I'm musing over Node-RED flow startup.

If a Node-RED server instance is spun up in a pod, then I sense that the instance is believed to be immediately available. The state of the Node-RED server is either "running" or "down"; there is no intermediate state such as "starting" or "initializing". Since the pod sees that the container is "running", it will include it in the mix when distributing work. If the new container is still "starting" (Node-RED is running, but the nodes that run the flows are not yet ready), then the pod may send work to that instance instead of an alternate instance that is genuinely ready. It is here that I am noodling on the story.
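
To illustrate where such a signal could be surfaced, here is a sketch of embedding Node-RED in an Express app that exposes a /ready endpoint for a Kubernetes readinessProbe. The endpoint and its wiring are my own assumption, not an existing Node-RED feature, and note the caveat in the comments:

const express = require("express");
const http = require("http");
const RED = require("node-red");

const app = express();
const server = http.createServer(app);
let flowsStarted = false;

// Readiness endpoint for a Kubernetes readinessProbe (illustrative).
app.get("/ready", (req, res) => {
   res.status(flowsStarted ? 200 : 503).end();
});

RED.init(server, {httpAdminRoot: "/admin", httpNodeRoot: "/", userDir: "./data"});
app.use("/admin", RED.httpAdmin);
app.use("/", RED.httpNode);

server.listen(1880, () => {
   RED.start().then(() => {
      // Caveat: this resolves when the runtime has started the flows,
      // not when every node has finished its own async initialization,
      // which is exactly the gap discussed in this thread.
      flowsStarted = true;
   });
});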

And this is about as far as my thinking has taken me. I'm not yet sure if I'm tilting at windmills and seeing a problem that isn't actually there; I'm also not yet sure if I'm overthinking the purpose of Node-RED (I'm still new here). No one can argue that Node-RED does what it does fantastically well, but it may also be that Node-RED isn't designed to be run in the production/time-sensitive manner I'm contemplating (I honestly don't know ... as a sandbox or hobbyist environment, no question at all ... but I have no info on production usage yet).

Backstory: I am reading the book Cloud Native Infrastructure, where I came across this passage:

A good example is when the platform needs to know when the application is available to receive traffic. While the application is starting, it cannot properly handle traffic, and it should present itself as not ready. This additional state will prevent the application from being terminated prematurely, because if health checks fail, the platform may assume the application is not healthy and stop or restart it repeatedly.

... and this caused me to think about this thread again.

Sorry to revive this thread. I was about to start my own, because I have run into the same challenges as the OP in the past. I used similar workarounds for this problem, but those solutions aren't clean. :sweat_smile:

However, I recently found this pull request, which would require a proper solution to this problem. That caught my attention again.

In my opinion, proper lifecycle management of the nodes is the responsibility of the runtime. All control over nodes and their status resides there, and that should include letting nodes signal their readiness during initialization.

Of course, what @dceejay mentioned needs consideration, but I think we can find solutions to this in discussion. E.g., the runtime could disable the node and all preceding nodes to prevent input ... just thinking...

There have been many lifecycle improvements, like the done callback in on('close') and the new on('input') signature (both shown below), all needed for a graceful shutdown.
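
For reference, those two hooks look like this in the post-1.0 node API (releaseResources is a placeholder for your own teardown logic):

node.on('input', (msg, send, done) => {
   // ... do something with the incoming message ...
   send(msg);   // forward the (possibly modified) message
   done();      // tell the runtime this message has been fully handled
});

node.on('close', (done) => {
   // Tidy up any asynchronous resources before the node is stopped.
   releaseResources(() => done());
});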

A graceful startup is the last missing piece of clean lifecycle management. :slightly_smiling_face: