Few instances with many flows or many instances with few flows?

Greetings people.

Background:
At my company, we are playing with the idea of creating an instance for every machine we have. At the moment we have divided the machines across 3 instances, and each machine has its own flow within one of them. I'm trying to figure out if there are any downsides to giving each machine its own instance, and what the benefits would be.
I suspect I lack the practical know-how to answer a question like this myself.

Question:
Few instances with many flows or many instances with few flows?
(What are the advantages and disadvantages?)

Thank you in advance!

It depends on many things that all need to be considered.

  • Are the servers overloaded ("maxing out"), or are your flows unstable due to locking/blocking etc.?
    • YES? - scale out with multiple instances and if necessary, multiple servers
    • NO? - continue
  • Are the flow files becoming very large (close to 5MB) and slowing down deploys, making debug sidebar too busy etc?
    • YES? - scale out with multiple instances and if necessary, multiple servers
    • NO? - continue
  • Is it a major problem for "MACHINE 2" if Node-RED crashes because of something related to "MACHINE 1"?
    • YES? - separate them to separate instances
    • NO? - continue
  • Do you expect users to be editing flows on several "machines" at the same time?
    • YES? - Use multiple instances to reduce the chance of people editing at the same time
    • NO? - continue

I am not even scratching the surface of the considerations that should be made - it all "depends". That answer will come with experience.

If you scale out with multiple instances, you would see improvements, but you would also have a larger installation base to maintain and the inevitable headaches that come with that (updating versions, backups, etc.). This can of course be eased with something like FlowForge, which is built for this very case (and I am certain there are other "Node-RED runners" out there that can spin up and manage Node-RED instances in a couple of seconds in a similar manner). For full disclosure, I work for FlowForge.

TBH, if you are able to provide a little bit more detail (like the number of machines, the type of things you are doing, PLC? Database? MQTT? how much data is passing for each machine, etc.) we could probably advise better. If you are uncomfortable discussing this in the forum, please feel free to DM me.


In addition to Steve's reply, I'd personally start by looking at what I was trying to achieve. Performance isn't important until it becomes so. But flexibility might be, or cost, or resource usage, ... We don't really have any way of judging such things as we can't have the knowledge.

Specifically, multiple instances might increase network traffic or might reduce it, depending on the data and the amount of local processing possible. Or, as Steve says, resilience might be reduced with lots of local instances - but maybe not, since each device might be considered a commodity that is easily swapped out. Those are just two of the many things that Steve alludes to.


Thank you for that feedback.

Details:
We currently have 4 instances, with 64 active flows in total.

  1. For testing - 54 flows (not active)
  2. For machines - 37 active flows
  3. Other machines - 18 active flows
  4. Not machines - 12 active flows

The main task of the flows is to get the information from the machine's process, prepare it in the right format and load it into our database.
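For a "get, reshape, load" pipeline like that, much of the per-machine work typically sits in a function node. As a hedged illustration only (the field names `machineId`, `value`, and `ts` are invented here, not the poster's actual schema), such a node body might look like:

```javascript
// Sketch of a Node-RED "function" node body: reshape a raw machine
// reading into the row format a downstream database node expects.
// Field names are assumptions for illustration.
function toDbRow(msg) {
    msg.payload = {
        machine_id: msg.payload.machineId,
        reading: Number(msg.payload.value),
        recorded_at: new Date(msg.payload.ts).toISOString()
    };
    return msg;
}

// Example of a raw reading as it might arrive from a machine:
const out = toDbRow({ payload: { machineId: "M-01", value: "42.5", ts: 1700000000000 } });
console.log(out.payload.machine_id); // M-01
console.log(out.payload.reading);    // 42.5
```

Whether logic like this lives in one instance or many doesn't change the code itself, only how the instances are deployed and maintained.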

(Side note: currently we have a problem where some files won't be uploaded; instead, another file gets uploaded multiple times. We were thinking of splitting into more instances because we've had this problem before, and the first split of the flows across multiple instances helped. But at the moment, I just want to understand the differences between instances and flows.)

Thank you for your answers so far.

When you create a new instance, you have to load all the baggage that goes with it: the core Node.js runtime, all the required libraries, the ExpressJS server, etc. Then you have to load the Node-RED core, then any additional nodes. Then you still have to add your flows on top of that - understanding how much CPU and memory they consume isn't the easiest of tasks, though there are some nice helpers that have been created now.

So that is quite a lot of baggage. However, it still all fits into a Raspberry Pi nicely as long as the data you are keeping in memory isn't too massive and your flows not too complex and your message throughput isn't crazy.
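One rough way to see that per-instance baseline for yourself: any Node.js process can report its own memory footprint, which gives a feel for the fixed cost each extra instance carries before your flows even load. A minimal sketch (runnable in plain `node`):

```javascript
// Print this Node.js process's own memory footprint.
// rss is total resident memory; heapUsed is the JS heap currently in use.
// Run inside (or alongside) a fresh Node-RED instance to gauge its
// baseline "baggage" cost before any flows are deployed.
const mu = process.memoryUsage();
const mb = (bytes) => (bytes / 1024 / 1024).toFixed(1) + " MB";
console.log("rss:     ", mb(mu.rss));
console.log("heapUsed:", mb(mu.heapUsed));
```

Multiply that baseline by the number of instances you plan to run and you quickly see why "one instance per machine" costs more than "one flow per machine".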

You also need to think about potential bottlenecks - especially on single-board computers such as the Pi. That typically uses an SD card, and those are very slow compared to even a spinning hard drive. This is particularly noticeable when dealing with large files, or if one of your processes starts to page (where in-memory data has to be written, or "paged", to storage temporarily and then read back later).

On a single instance, flow complexity can lead to some unexpected things happening, such as embedded data being retained longer than you think and so bloating the memory use. On the other hand, if you only need a relatively simple flow to process some sensor data, you might be able to use that to whittle the data down to a minimum and pass it on to one of several central instances working round-robin. That might be super-efficient. But not necessarily, as there are far too many unknowns here.
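The round-robin idea can be sketched in a few lines. In Node-RED this logic could sit in a function node in front of an MQTT-out node, with the counter kept in flow context; the topic names below are made up for illustration:

```javascript
// Alternate slimmed-down readings between two central ingest topics.
// In a real flow, `counter` would come from flow context; topic names
// here are assumptions, not a known deployment.
function routeRoundRobin(msg, counter, targets) {
    msg.topic = targets[counter % targets.length];
    return msg;
}

const targets = ["central-a/ingest", "central-b/ingest"];
const topics = [0, 1, 2].map(i => routeRoundRobin({ payload: i }, i, targets).topic);
console.log(topics); // [ 'central-a/ingest', 'central-b/ingest', 'central-a/ingest' ]
```

The edge instances stay tiny (filter and forward), while the heavier processing is shared across the central instances.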


This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.