Scalability of Node-RED

We have been working with Node-RED for some time and have wired up tens of devices. This has worked like a charm, but we're going to deploy more devices now. We're deploying thousands of devices, and we're in doubt whether Node-RED can handle this. We may even deploy hundreds of thousands of devices. What are the pros and cons? Will Node-RED have the resources and power for this? Is Node-RED scalable?

Thanks in advance!

More details are needed to better judge your situation.
Are your devices running Node-RED, or are you using Node-RED at the backend level to manage those devices?

While I agree with the other comment, I would also say that if you are making that kind of investment, I'm not sure that Node-RED is really the right kind of tool to use as a basis.

While Node-RED performance is pretty good, it is an open-ended prototyping tool that just happens to be both really good at letting people with limited (and not limited) skills develop processing systems and also free.

But if you are building a large-scale business out of it, personally, I would look at ways of utilising its strengths (RAD) and avoiding potential cul-de-sacs such as performance and possible lack of direct investment.

If you have hundreds of thousands of similar devices deployed, I would invest some time and money in optimising data capture and data flow. Then you could use NR as a tool giving less experienced developers or analysts access to data in a more controlled way.

I think it also depends on how Node-RED is deployed. Using systems like Balena greatly eases large scale deployments and makes the language more irrelevant. All depends on how things are deployed.

Hi, just joined. I'm wondering how you all got on with this? I'm thinking of using Node-RED to handle about 10,000 sensors for an industrial site, each producing one data point per second. It will run on a powerful quad-core server.

Most commercial applications are able to support this high cardinality, so I'm wondering if Node-RED can too.

It is impossible to say without a lot more information on how the sensors are read and what you want Node-RED to do with the data. In fact, even then it will not be possible to give much of an answer, as there will still be too many imponderables. Note, though, that Node-RED is single-threaded, which will limit how useful the four cores are. Again, however, that may not matter, as a lot of what you want to do may be handled by other processes, which will run in other threads. That includes network and file access, database access, etc. The only way to know is to mock up something representative and try it.

Thanks for the reply. I intend Node-RED to do the following:

  1. Read Modbus data points from a special block every second. This block holds the 10,000 sensor tags, where each sensor will correspond to 'msg.'.
  2. Combine all data into a single string.
  3. Send all data to another service via TCP output.

It's a relatively simple process. The only heavy processing part is probably reading the data points via the Modbus block.

So do you mean that every second you want to do a single (large) modbus read, interpret the data and build a string from it then send that via a TCP connection? If so then I don't think you will need a powerful server, I imagine a Pi 4 could do that and barely notice. The most intensive part will most likely be extracting the data and formatting the string.
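
To give a feel for how small the "extract and format" step is, here is a minimal Python sketch of it outside Node-RED. It assumes the Modbus response body is a run of big-endian unsigned 16-bit registers and that the receiving service accepts one comma-separated line per scan. The function names, the separator and the newline terminator are all assumptions made for illustration, not anything from the thread.

```python
import socket
import struct


def decode_registers(raw: bytes) -> list[int]:
    """Interpret a byte string as big-endian unsigned 16-bit register values."""
    count = len(raw) // 2
    return list(struct.unpack(f">{count}H", raw[: count * 2]))


def build_payload(values: list[int], sep: str = ",") -> bytes:
    """Join all values into a single newline-terminated ASCII line."""
    return (sep.join(str(v) for v in values) + "\n").encode("ascii")


def send_payload(host: str, port: int, payload: bytes) -> None:
    """Open a TCP connection, send the whole payload, then close it."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(payload)


# Stand-in for the body of a Modbus response (assumed layout, no real device).
raw = struct.pack(">4H", 10, 20, 30, 65535)
print(build_payload(decode_registers(raw)))
```

In a real deployment you would probably keep the TCP connection open between scans rather than reconnecting every second; `send_payload` reconnects each time only to keep the sketch short.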

How do you read 10,000 sensor tags per second in Node-RED?

What type of data are these? The Modbus TCP node can read a maximum of 100 words (16-bit) per operation. So if you need 32-bit data, you can halve that (e.g. 50 DWORDs per operation).

This in turn means you would be attempting 100 reads (for 16-bit values) or 200 reads (for 32-bit values) in one second.
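
The arithmetic behind those figures can be checked with a few lines of Python. This is only a sketch: the function names are made up, and the 100-register ceiling is the figure quoted in this post, passed in as a parameter so you can try other limits.

```python
def reads_needed(n_values: int, registers_per_value: int,
                 max_registers_per_read: int = 100) -> int:
    """Number of read requests needed to fetch every value once."""
    total_registers = n_values * registers_per_value
    # Integer ceiling division: a partly filled final read still costs a request.
    return (total_registers + max_registers_per_read - 1) // max_registers_per_read


def budget_ms(reads: int, period_ms: float = 1000.0) -> float:
    """Time available per read if all reads must finish within one period."""
    return period_ms / reads


for label, regs in (("16-bit", 1), ("32-bit", 2)):
    n = reads_needed(10_000, regs)
    print(f"{label}: {n} reads per second, {budget_ms(n):.1f} ms each")
```

So 10,000 32-bit values at a one-second scan leaves roughly 5 ms per request, including the network round trip, if the reads are issued one after another.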

Based on a PLC node I wrote/maintain, I can just about achieve 150 reads per second (UDP), and that's pushing it. Then of course you need to process/package these into a string and transmit it to a TCP endpoint.

There is of course no reason you couldn't have multiple node-red instances and scale out.

I would say it is possible, but I question the need to blindly log values every second.

For CBM-type data collection, we typically use ladder logic to "sanitise" things. For example, through a cycle, collect data like min, max and average, and post a "good / sanitised" result at the end of a production cycle. This has multiple benefits: not polling and storing data for 48h over a weekend or 8h every night when the line is not running (only capturing data when the machine is actually running, i.e. sane data), a smaller database, and a more manageable volume of data.
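
The per-cycle summarising described here would normally live in the PLC's ladder logic, but the idea is easy to show in Python. The sketch below keeps running min, max and average values and emits one summary when a cycle ends. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CycleSummary:
    count: int
    minimum: float
    maximum: float
    average: float


class CycleAggregator:
    """Accumulates samples during a cycle and emits one summary at the end."""

    def __init__(self) -> None:
        self._reset()

    def _reset(self) -> None:
        self._count = 0
        self._total = 0.0
        self._min = float("inf")
        self._max = float("-inf")

    def add(self, value: float) -> None:
        """Fold one live sample into the running statistics."""
        self._count += 1
        self._total += value
        self._min = min(self._min, value)
        self._max = max(self._max, value)

    def finish(self) -> Optional[CycleSummary]:
        """Return the summary for the cycle, or None if nothing was sampled."""
        if self._count == 0:
            return None  # machine idle: nothing to store
        summary = CycleSummary(self._count, self._min, self._max,
                               self._total / self._count)
        self._reset()
        return summary
```

Because `finish()` returns `None` for an empty cycle, an idle line over a weekend produces no rows at all, which is the database-size benefit mentioned above.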

That puts a completely different complexion on the problem. I did not realise that modbus TCP has such a limiting constraint.

Thank you Steve for the reply. Much appreciated. I have never heard of this 100-word limitation,
because we're already hammering our PLCs with 2,000 data points per second for SCADA communication via Modbus TCP.

With Node-RED, I intend to connect several PLCs and ingest data at 10,000 data points per second as a TCP string.

Regardless, the above experience you mention does concern me now. What hardware are you using? I intend to use the Modbus Library

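Actually, the protocol limit is a little higher: the Modbus specification allows up to 125 registers per read request (123 for a multi-register write), but I seem to remember the Node-RED implementation is fixed at 100 for some reason. ref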

This limit is hidden by the driver implementation. What is happening under the hood is that multiple requests/polls are occurring, even on your SCADA. PS: 2,000 data points per second is (probably) easily achievable in Node-RED too.

If you are unaware of this, you can (and will) eventually hit issues with inconsistent data (i.e. the 1st item of 10,000 would probably have been read 100 polls before the last item, and could have changed in the meantime). You should ALWAYS group related items into single poll blocks.
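
One way to picture "group related items into single poll blocks" is a small planner that packs whole groups of registers into requests and never splits a group across two polls. This is a sketch under assumptions: the group names and addresses are invented, adjacent groups are merged only when they are contiguous, and the 100-register ceiling is the figure quoted earlier in the thread.

```python
def plan_polls(groups, max_registers=100):
    """Pack (name, start_address, register_count) groups into poll requests.

    A group is never split, so values that belong together are always
    read in the same request and are consistent with one another.
    """
    polls = []
    for name, start, count in sorted(groups, key=lambda g: g[1]):
        if count > max_registers:
            raise ValueError(
                f"group {name!r} needs {count} registers and cannot fit in one poll"
            )
        last = polls[-1] if polls else None
        contiguous = last is not None and start == last["start"] + last["count"]
        if contiguous and last["count"] + count <= max_registers:
            # Extend the previous request to cover this group as well.
            last["count"] += count
            last["groups"].append(name)
        else:
            polls.append({"start": start, "count": count, "groups": [name]})
    return polls


# Hypothetical register map, for illustration only.
example = [
    ("temps", 0, 60),
    ("pressures", 60, 30),
    ("flows", 90, 40),
    ("status", 200, 10),
]
for poll in plan_polls(example):
    print(poll)
```

Here "temps" and "pressures" share one request, while "flows" gets its own request because adding it would exceed the limit and splitting it would risk mixing old and new values.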

This is why, when I do data collection, I do it with the aid of the PLC, as it "knows when" it is time for collection (e.g. end of a cycle, after 10 cycles, after 1 hr running, or whatever the desired criterion is). The PLC moves the LIVE values into a staged "collection" area and sets a "trigger" so the application recognises it is time to capture and store the SANE/consistent values.
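
The collector side of that trigger handshake might look roughly like the Python sketch below. Everything concrete in it is an assumption for illustration: the register addresses, the size of the staging area, the convention that the collector clears the trigger after reading, and the `FakePlc` class that stands in for a real connection.

```python
TRIGGER_ADDR = 0    # hypothetical: PLC sets this to 1 when staged data is ready
STAGING_START = 1   # hypothetical: first register of the staged "collection" area
STAGING_COUNT = 4   # hypothetical: size of the staged area, small enough for one poll


def collect_if_triggered(read, write):
    """Return the staged snapshot if the PLC has raised the trigger, else None."""
    if read(TRIGGER_ADDR, 1)[0] != 1:
        return None
    snapshot = read(STAGING_START, STAGING_COUNT)  # one poll, so values are consistent
    write(TRIGGER_ADDR, 0)                         # acknowledge so the PLC can stage again
    return snapshot


class FakePlc:
    """In-memory stand-in for a PLC, used only to exercise the sketch."""

    def __init__(self):
        self.registers = [0] * (STAGING_START + STAGING_COUNT)

    def read(self, start, count):
        return self.registers[start:start + count]

    def write(self, address, value):
        self.registers[address] = value

    def stage(self, values):
        """What the PLC program would do at the end of a cycle."""
        self.registers[STAGING_START:STAGING_START + STAGING_COUNT] = values
        self.registers[TRIGGER_ADDR] = 1


plc = FakePlc()
plc.stage([11, 22, 33, 44])
print(collect_if_triggered(plc.read, plc.write))
```

The collector then only needs to poll one register most of the time, and reads the staged block just once per trigger.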

Also, I will typically buffer values in the PLC with an index, datatype and timestamp (via a subroutine).
This has several benefits:

  • Any loss of connectivity does not necessarily mean loss of production data
  • Any lag or fluctuation between PLC ~ network ~ database becomes irrelevant (since the timestamps are all relative to each other, you can depend on the time between 2 entries as being gospel, unless the clock was changed)
  • It creates a stream of data, minimising the need for thousands of areas to be polled
  • It creates a simple framework for users to add more data (simply push your data with a datatype onto the stack and it will be collected and put into the database, without a single modification to the database/SCADA etc.)
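
As a rough illustration of reading such a buffer on the collecting side, here is a Python sketch that decodes fixed-size records carrying an index, a datatype code, a timestamp and a value. The 12-byte record layout, the big-endian byte order and the datatype codes are all invented for the example. A real PLC subroutine would define its own layout.

```python
import struct
from typing import NamedTuple, Union

# Assumed record layout: uint16 index, uint16 datatype, uint32 timestamp, 4 raw value bytes.
RECORD = struct.Struct(">HHI4s")
TYPE_INT32 = 1    # hypothetical datatype code
TYPE_FLOAT32 = 2  # hypothetical datatype code


class Record(NamedTuple):
    index: int
    timestamp: int
    value: Union[int, float]


def decode_records(buffer: bytes) -> list[Record]:
    """Decode every complete record in the buffer, ignoring a trailing partial one."""
    records = []
    usable = len(buffer) - len(buffer) % RECORD.size
    for offset in range(0, usable, RECORD.size):
        index, datatype, timestamp, raw = RECORD.unpack_from(buffer, offset)
        if datatype == TYPE_INT32:
            value = struct.unpack(">i", raw)[0]
        elif datatype == TYPE_FLOAT32:
            value = struct.unpack(">f", raw)[0]
        else:
            raise ValueError(f"unknown datatype code {datatype} at index {index}")
        records.append(Record(index, timestamp, value))
    return records
```

Because each record carries its own datatype, supporting a new kind of value means adding one branch here, while the collection and storage path stays unchanged. The timestamps come from the PLC, so the interval between two records does not depend on network or database lag.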

A virtual machine on Hyper-V. Not a terribly powerful setup, but probably more performant than an RPi (though you never know till you try).

The problem isn't really with Node-RED. The problem is understanding what is going on under the hood and making your solution do the right thing.

Why don't you just give it a go and see where you get to?

Remember: unlike a SCADA, this doesn't cost you anything but time.

@Colin, many PLC protocols (Modbus included) have similar limitations on the number of values they can request in 1 poll. It is often around 250 bytes (or less due to packet overhead / CRC etc.).