Workload limitations

We have an application where we have no control over the number of messages arriving, so the message flow may become overloaded. This raises some questions:

  • How can we determine the current workload of Node-RED?
  • Is there any indication like: you have reached 50% of the maximum workload?
  • What happens if the message pipe is overloaded? Will we lose some messages?
  • Is there a way to handle an overload properly, and only when it happens (without a delay node)?

Thank you for your advice.

You could queue the messages.

This is done with the delay node set to queue messages.

But Node-RED is asynchronous.

Something else you could do is get the node-red-contrib-msg-speed node.
Put a switch node after it and only pass messages if the value is greater than a threshold.
Then send that to something that would indicate to you that you are getting a lot of messages.
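
The threshold idea above can be sketched in plain JavaScript. This is not how node-red-contrib-msg-speed is implemented; it is only a minimal illustration, and createRateMonitor, windowMs and threshold are hypothetical names chosen for this example. It counts messages in a sliding time window and flags when the count exceeds a limit:

```javascript
// Hypothetical helper (not part of Node-RED or any contrib node):
// counts messages seen within a sliding time window.
function createRateMonitor(windowMs, threshold) {
  const timestamps = [];
  return function record(now = Date.now()) {
    timestamps.push(now);
    // Drop timestamps that have fallen out of the window.
    while (timestamps.length > 0 && timestamps[0] <= now - windowMs) {
      timestamps.shift();
    }
    return {
      count: timestamps.length,
      overloaded: timestamps.length > threshold,
    };
  };
}

// Usage: call record() once per incoming message.
const record = createRateMonitor(1000, 2); // more than 2 msgs per second counts as "a lot"
```

In a flow, the same logic could live in a function node, with the switch node acting on the overloaded flag.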

It will probably depend on how these messages are arriving -- for instance, if they are http requests, there will be a limit to how many connections the express server will support. MQTT should queue up as many messages as the system resources will support. Serial connections just stream in realtime...

But it sounds like you may be more worried about how many msg objects are "in-flight" in your node-red flows. The limiting factor will be how many objects can be held in the runtime memory heap, which depends upon the size of each object. Then there is the question of what you are doing with each msg object -- for example, posting them to a slow REST API could be 1000x slower than appending to a local file or writing to a database.

We have been using some fairly simple flows to migrate millions of records and files out of an old application into local files and database records. The goal was to set a target number of msgs in-flight, to optimize throughput, while not overloading the system. We did not bother to use system resources to monitor the system resources -- instead, we used the node-red-contrib-semaphore node configured to allow the target number of "tickets" available for the flow to use. Each incoming message tries to "take" a ticket, and either continues immediately or waits until one is available. After each msg is processed, it "leaves" the ticket for the next waiting message to use. It's a very efficient way to manage throughput, but you have to be certain that your flow cannot fail to return the ticket, or else it could become deadlocked.
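
The take/leave ticket pattern described above can be sketched in plain JavaScript. This is not the code of node-red-contrib-semaphore; TicketPool and withTicket are hypothetical names for illustration. The try/finally in withTicket is one way to make sure a ticket is always returned, even when the work fails:

```javascript
// Hypothetical sketch of a ticket pool (counting semaphore).
class TicketPool {
  constructor(tickets) {
    this.available = tickets; // tickets not currently in use
    this.waiting = [];        // resolvers of messages waiting for a ticket
  }

  // Take a ticket: resolves immediately if one is free, otherwise waits.
  take() {
    if (this.available > 0) {
      this.available -= 1;
      return Promise.resolve();
    }
    return new Promise((resolve) => this.waiting.push(resolve));
  }

  // Leave a ticket: hand it straight to the next waiter, if any.
  leave() {
    const next = this.waiting.shift();
    if (next) {
      next();
    } else {
      this.available += 1;
    }
  }
}

// Run async work while holding a ticket; the ticket is returned even on errors.
async function withTicket(pool, work) {
  await pool.take();
  try {
    return await work();
  } finally {
    pool.leave();
  }
}
```

With a pool of, say, 10 tickets, at most 10 calls to work are in flight at any time; the rest wait in order of arrival.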

In my experience, you don't overload the node-red runtime -- just the downstream services where that data is stored. If you truly need to handle large bursts of activity, you may want to look for some way to "buffer" the data ahead of your node-red flows (temp files or db records, message queues, etc). The more info you can provide about your actual data sizes and communication protocols, the better this community can provide some useful suggestions.

Hi,

thank you for your brilliant answer.

As I do not know how the runtime system underneath Node-RED really works, it is hard to imagine what precisely happens under the hood. So I'm looking for a better understanding and a general approach to load management in Node-RED. I assume the limiting factors may be very different on a Raspberry Pi 1 with a slow CPU and little memory than on a cloud server with a slow connection over the internet.

The semaphore sounds like a good idea, but as you mentioned, it may cause deadlocks. If we had information about the limiting factors, we could build a "the next free node is reserved for you" pipe as an input throttle without this limitation. But I'm really unsure what kind of factor we should look at. Free memory does not seem to be a good indicator, as some applications like Influx grab all the memory they can get.

Is there any information about the "active" messages in the system? Let's assume you use a REST API on both ends of your flow (like we frequently do). On the incoming side, I assume there is some input buffer in the HTTP system, so messages will be buffered if the system is busy. But on the output side, there is a node that sends an http request and waits for an acknowledgement. So, the flow will be active until the http request gets a response. This may take some milliseconds, or longer if the target cannot be reached. So, maybe each message is active for 5 seconds.

I have no idea whether, for example, some resources are blocked as long as a message is active. Maybe we could get a kind of memory leak if the message pipe somehow fills up? If only a few bytes are lost each time, maybe it takes weeks until our server runs out of memory?

In any case, knowing how much memory Node-RED is consuming and how many messages are active would be really helpful.
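
As a rough illustration of the two figures asked for here: Node.js exposes process.memoryUsage(), and an "active message" count can be kept by hand. Node-RED does not provide the counter below; inFlight, messageStarted, messageFinished and snapshot are hypothetical names, and the flow itself would have to call them at its entry and exit points:

```javascript
// Hypothetical in-flight counter, maintained by the flow itself.
let inFlight = 0;

function messageStarted() {
  inFlight += 1;
}

function messageFinished() {
  inFlight = Math.max(0, inFlight - 1); // never go below zero
}

// Combine the counter with memory figures reported by Node.js.
function snapshot() {
  const mem = process.memoryUsage(); // values are in bytes
  return {
    inFlight,
    heapUsedMB: Math.round(mem.heapUsed / 1024 / 1024),
    rssMB: Math.round(mem.rss / 1024 / 1024),
  };
}
```

Logging snapshot() at a fixed interval would show whether the in-flight count or the heap keeps growing over time, which is the slow-leak scenario described above.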

There are now a number of nodes that report on node-red and/or system resource usage; @BartButenaers has come up with some good stuff. Have a dig around the forum and the Flows site.

The problem with all of the Q'ing nodes is memory. Unless you can afford to drop messages, there will certainly be a limit to how many messages you can hold in a Q. This is made worse of course by the fact that you have to hold all of the code and context variables as well. These factors aren't memory leaks, they are expected operation. Memory leaks would be on top of all that and some nodes do exhibit them, thankfully not many as Node.js seems generally pretty good at handling things.
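
The trade-off between memory and dropped messages can be made explicit with a bounded queue. The following is only a sketch and not the behaviour of any particular Node-RED node; BoundedQueue and maxSize are hypothetical names. Once the limit is reached, the oldest message is discarded and counted:

```javascript
// Hypothetical bounded queue: memory use is capped at maxSize messages,
// at the price of dropping the oldest message when the queue is full.
class BoundedQueue {
  constructor(maxSize) {
    this.maxSize = maxSize;
    this.items = [];
    this.dropped = 0; // how many messages were discarded so far
  }

  push(msg) {
    this.items.push(msg);
    if (this.items.length > this.maxSize) {
      this.items.shift();
      this.dropped += 1;
    }
  }

  shift() {
    return this.items.shift();
  }

  get length() {
    return this.items.length;
  }
}
```

If dropping is not acceptable, the alternative is to spill to disk or an external message queue, since an unbounded in-memory queue eventually exhausts the heap.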

To understand in-depth about these things, you need to read up on Node.js which is the underlying platform. Understanding a bit about how Node.js (in reality the underlying V8 JavaScript engine) handles garbage collection can go a long way to understanding what may happen in your case.

Just remember that, in Node-RED, a "message" isn't really that at all! It is an allocated chunk of memory that is mostly shared from one node to the next. However, there are a few cases where a message will get cloned. The most common being when you have two wires coming out of the same node output port. Once a specific connected set of nodes has "passed" the message to the last node in the sequence, the message object is no longer referenced and its memory can be reclaimed by the garbage collector (GC). When GC kicks in, that heap memory is consolidated and so fully recovered. GC can take an appreciable amount of time BTW and this can add to performance woes in some cases as it stops the main Node.js thread.
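
The difference between a shared and a cloned message can be shown in a few lines of plain JavaScript. The deepCopy helper below is a hypothetical stand-in that uses a JSON round trip; it is a simplification for illustration and not the cloning routine Node-RED itself uses:

```javascript
// Hypothetical stand-in for message cloning (JSON round trip; this
// simplification loses things like Buffers, Dates and functions).
function deepCopy(msg) {
  return JSON.parse(JSON.stringify(msg));
}

const original = { topic: "sensor/1", payload: { value: 21 } };

const shared = original;           // single wire: the same object is passed on
const cloned = deepCopy(original); // second wire: an independent copy

// A change made through the shared reference is visible in the original,
// but the clone keeps its own data (and its own memory).
shared.payload.value = 99;
```

After the last line, original.payload.value is 99 while cloned.payload.value is still 21. Each clone costs extra heap, which is why minimizing cloning matters for large payloads.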

Personally, except when developing, I prefer to monitor the device OS as a whole rather than Node-RED separately since that is generally the more important measure. To do that on my home automation system, I run Telegraf with the data going both to InfluxDB for charting views (via Grafana) and to MQTT in case I want to set up alerting in Node-RED. Telegraf also has other alerting outputs.

If you are setting up something for commercial use, I would recommend that you use your normal monitoring tooling and have alerts both for the system and for Node-RED.

The Node-RED nodes, however, can certainly be useful when doing development and testing.

Node.js uses a single-threaded event loop to provide high throughput while minimizing context switching -- understanding how that works is probably where you need to start, as Julian suggested. This article has some nice visualizations that helped me "see" what is happening.

Individual units of work can be either synchronous (e.g. function code, in-memory processing, etc) or asynchronous (writing to files or databases, network calls, etc). The node.js event loop generally runs as fast as possible until an async call is made -- then that unit of work goes back into the event loop until the other waiting units are handled. The beauty of this is that it's fairly easy to keep the system processor from waiting, without the need to manage threads and locks inside our code. Each node gets handled as soon as it can, with just the resources it needs, which are returned to the heap as soon as each flow is complete and the msg object is no longer needed. No need to worry about open file handles, database connections, or stuck threads... just focus on optimizing your nodes and/or JS code to minimize object cloning, and let the node-red runtime do what it does.

Of course, this does not mean that some bad JS code will not affect the whole system -- in fact, a tight loop without any async calls can stop the whole event loop from proceeding (at least in the old days). But that is no different than any hand-coded software. Using a simple event loop with discrete nodes to do single tasks makes it easier to "configure" and maintain a working solution (flowgram), rather than traditional programming -- which is why we love it!
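
One common way to keep a long synchronous loop from stalling the event loop is to split it into chunks and yield between them with setImmediate, which is a standard Node.js function. The sketch below assumes the work is a simple sum; sumInChunks and chunkSize are hypothetical names for illustration:

```javascript
// Hypothetical example: sum a large array without hogging the event loop.
// After each chunk, control returns to the event loop via setImmediate,
// so other waiting messages and I/O callbacks get a turn.
function sumInChunks(items, chunkSize, done) {
  let index = 0;
  let total = 0;

  function step() {
    const end = Math.min(index + chunkSize, items.length);
    for (; index < end; index += 1) {
      total += items[index];
    }
    if (index < items.length) {
      setImmediate(step); // yield, then continue with the next chunk
    } else {
      done(total);
    }
  }

  step();
}
```

A plain for loop over the whole array would produce the same total, but nothing else in the process could run until it finished.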

Many of the async nodes are written to reuse shared resources by utilizing a config node. The msgs are just the data that is used by the node's logic, so things like database connection pools, file handles, and websockets are not carried with the msg object. The node-red core nodes are highly efficient at handling lots of data with low overhead. Many of the "contributed" nodes are also well written, but there have been some of these nodes that have caused issues like memory leaks and hung resources over the years. You will probably want to do some research before including those in any production systems.

Hi Julian,
I have created a performance collection, to make it a bit easier to search...
Bart
