Modularization?!

Dear community

I'm a stakeholder in a recently kicked-off project where Node-RED will be used for machine connectivity and data flows between the machines. Developers on the project team suggest creating a virtual machine for every instance/installation of Node-RED; in the future, most likely 50 instances will be needed, with several flows per instance. According to the developers, this is modularization at a high level, which will make life easier and also decrease dependency between instances. I am not an architect, but I can immediately see both pros and cons with the suggested setup. So I would like to ask the community: do you have any experience and advice you could share regarding the architecture when running many Node-RED instances? Do you see any pros and cons with the suggested setup?
My POV:

PROs:

  1. Modular
  2. Less dependency between the instances
  3. ..

CONs:

  1. Performance issues when having so many instances
  2. A lot more maintenance and work
  3. ..?

I would appreciate any relevant input you would like to share.
Thanks in advance
Best regards
PG

Are the flows there to communicate between machines?

Each machine can have its own flow communicating with one central virtual machine, and the virtual machine communicating back to each machine.

It all depends on how you set up your flows and messages, and who needs to communicate with whom.
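
One way to picture the "each machine talks to one central instance and back" layout is a hub-and-spoke topic scheme. The sketch below assumes MQTT as the transport (only one option) and uses hypothetical topic names `machines/<id>/telemetry` (upward) and `machines/<id>/commands` (downward). It implements MQTT's `+` / `#` wildcard matching so you can check which subscriptions would see which messages.

```python
"""Hub-and-spoke MQTT topic layout sketch (topic names are hypothetical)."""


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter matches a concrete topic.

    '+' matches exactly one level; '#' matches the remaining levels
    (including the parent level) and must be the last level.
    The special rule for topics starting with '$' is not handled here.
    """
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, f in enumerate(f_levels):
        if f == "#":
            # valid only as the final level of the filter
            return i == len(f_levels) - 1
        if i >= len(t_levels):
            return False
        if f != "+" and f != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)


# The central instance subscribes once and receives telemetry from every machine.
CENTRAL_SUBSCRIPTION = "machines/+/telemetry"


def telemetry_topic(machine_id: str) -> str:
    # each machine publishes upward on its own topic
    return f"machines/{machine_id}/telemetry"


def command_topic(machine_id: str) -> str:
    # the central instance sends commands back down to one specific machine
    return f"machines/{machine_id}/commands"


if __name__ == "__main__":
    print(topic_matches(CENTRAL_SUBSCRIPTION, telemetry_topic("press-07")))  # True
    print(topic_matches(CENTRAL_SUBSCRIPTION, command_topic("press-07")))    # False
```

Keeping the machine ID as its own topic level means the central instance needs a single wildcard subscription, while each machine only subscribes to its own command topic.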

Based on what, though?

What traffic load is expected, and what throughput?
Are there any SLAs/SLOs defined for availability?
Does it depend on local hardware, or does it just talk over the network for input/output?
Why virtual machines? Why not Kubernetes/Docker orchestration? A VM has a lot of overhead, let alone the maintenance of all of them.
You can run 50 instances beside each other on a single node; just configure a different port for each instance.

Questions, questions.
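
A minimal sketch of the "different port per instance" approach, assuming Node-RED's standard `--port` and `--userDir` command-line options and a hypothetical base directory `/srv/node-red`. Each instance also gets its own user directory so that flows, credentials and installed nodes stay separate. The script only builds the command lines; how you actually run them (systemd, a process manager, containers) is left open.

```python
from pathlib import PurePosixPath
from typing import List

BASE_PORT = 1880  # Node-RED's default port; instance 0 keeps it
BASE_DIR = PurePosixPath("/srv/node-red")  # hypothetical root for per-instance user dirs


def instance_commands(count: int, base_port: int = BASE_PORT) -> List[List[str]]:
    """Build one Node-RED launch command per instance, each with a unique port and userDir."""
    commands = []
    for i in range(count):
        user_dir = BASE_DIR / f"instance-{i:02d}"
        commands.append([
            "node-red",
            "--port", str(base_port + i),
            "--userDir", str(user_dir),
        ])
    return commands


if __name__ == "__main__":
    for cmd in instance_commands(50):
        print(" ".join(cmd))
```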


This is one of the use cases FlowForge handles: both the roll-out of flows to instances and the communication between them. With a deployment of the size you're talking about, you could consider either the open-source edition of FlowForge or the premium version. If you're interested in learning more about FlowForge, feel free to email zj@flowforge.com to set up a call.

Full disclosure: I'm the CEO at FlowForge.


There are a lot more questions to be answered here. How valuable will the end system be? How sensitive is it (what happens if there is an attack)?

What is your preferred virtualisation architecture? Will you be running the instances in a cloud or locally?

All of these things will feed into potential answers. For example, for a low-value, non-sensitive service, I'd use Node.js's own native modularisation and simply run multiple instances on a VM with slightly larger specs. At the other end of the scale, I would probably consider running multiple cloud VMs. In that case, I'd also be looking at high availability and load balancing. That will be costly, but for a high-value service it is worth the effort and cost.

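Regarding performance issues with so many instances: that totally depends on the OS and hardware configuration.

Regarding a lot more maintenance and work: not necessarily. The use of automation and system templating (infrastructure as code) would deal with that.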
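
Infrastructure-as-code tooling generally works by comparing a declared desired state with what actually exists and computing a plan to converge the two. The sketch below shows that idea in miniature for a fleet of instances generated from one template; the instance names, the `image` field and its values are hypothetical, and real tooling does far more (ordering, dependencies, rollback).

```python
from typing import Dict, List


def plan(desired: Dict[str, dict], actual: Dict[str, dict]) -> Dict[str, List[str]]:
    """Compare desired vs. actual instance configs and list the changes needed."""
    create = sorted(name for name in desired if name not in actual)
    delete = sorted(name for name in actual if name not in desired)
    update = sorted(
        name for name in desired
        if name in actual and desired[name] != actual[name]
    )
    return {"create": create, "update": update, "delete": delete}


# Desired state comes from one template, so instances differ only by name/port.
TEMPLATE = {"image": "nodered-base-v2"}  # hypothetical base image name
desired = {f"nr-{i:02d}": {**TEMPLATE, "port": 1880 + i} for i in range(3)}
actual = {
    "nr-00": {"image": "nodered-base-v2", "port": 1880},
    "nr-01": {"image": "nodered-base-v1", "port": 1881},  # stale image -> update
    "nr-99": {"image": "nodered-base-v1", "port": 1979},  # no longer wanted -> delete
}

if __name__ == "__main__":
    print(plan(desired, actual))
    # {'create': ['nr-02'], 'update': ['nr-01'], 'delete': ['nr-99']}
```

Because the plan is derived rather than hand-written, going from 3 to 50 instances only changes the `range` in the template, not the maintenance effort.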

A lot of people here have already pointed out plenty of questions to ask. But I'll throw a curveball at all of this and see if I can add some food for thought.

Node-RED is a JavaScript-based system, and JavaScript itself is built around client-server relationships. Generally speaking, you have a server, which does some processing and storing of data, and a client, which pulls and processes data. It isn't specifically designed for standalone operation the way C++, C# and others are, though it is capable of it. But it offers a lot of simplicity and a native ability to handle interactions between setups.

The developers in this case seem to think that scalability is all about how much power and diversity you can put on one system: the more VMs and instances you run, the more you can handle. To be fair, that's true, because JavaScript in Node.js natively runs on a single thread, and therefore on a single CPU core (though features such as worker threads have changed that). Trying to run everything in a single instance would put a huge load on one CPU core and bottleneck the entire setup.

What I would look at is what the server needs to process and which workloads can be shifted to clients. That is essentially making a multi-threaded, multi-CPU setup, right? If the server handles data storage and transmission, the clients can handle any rendering, local processing (think of data coming off the machines) and data routing. I don't see why you would need so much running on one single system. Modularizing the setup doesn't necessarily mean sectioning off and modularizing the running instances; it can mean modularizing the layout and where things are processed. Keep the data processing close to where the data is acquired, and keep the data traffic control and storage close to those locations as well. Same effect, but much simpler.
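
The "keep processing close to where the data is acquired" idea can be illustrated with a small edge-side aggregation step: instead of forwarding every raw reading to the central system, each edge instance condenses a window of readings into one summary per machine. The reading format and the choice of count/min/max/mean are assumptions for illustration, not something prescribed by Node-RED.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Dict, Iterable, List


@dataclass
class Reading:
    machine_id: str
    value: float


@dataclass
class Summary:
    machine_id: str
    count: int
    minimum: float
    maximum: float
    mean: float


def summarize(readings: Iterable[Reading]) -> List[Summary]:
    """Group raw readings by machine and reduce each group to one summary record."""
    groups: Dict[str, List[float]] = {}
    for r in readings:
        groups.setdefault(r.machine_id, []).append(r.value)
    return [
        Summary(mid, len(vals), min(vals), max(vals), fmean(vals))
        for mid, vals in sorted(groups.items())
    ]


if __name__ == "__main__":
    raw = [Reading("press-01", 10.0), Reading("press-01", 14.0), Reading("lathe-02", 3.5)]
    for s in summarize(raw):
        print(s)
```

Only the summaries travel to the central instance, so network traffic and central CPU load scale with the number of machines per window rather than with the sensors' sample rate.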


I have a production application running across multiple instances, somewhat similar to what you describe but not exactly the same (4 instances, each in its own VM), though in our case the separation is for isolating security and failure domains.

One of the most important considerations is figuring out how you want those separate instances to communicate. For the sake of security, the narrower that channel can be, the better. If you're using something like MQTT, or even just some sort of RESTful API, you should be using TLS. If you aren't using TLS, start. This brings with it the challenge of handling certs. If you're using 50+ instances, that can become cumbersome, so get ahead of that by coming up with a mass update strategy before you are looking down the barrel of a cert expiration.
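
As one possible building block for such a strategy, here is a sketch of a fleet-wide expiry check. It connects to an instance's TLS endpoint, reads the certificate's `notAfter` field with Python's standard `ssl` module, and flags anything expiring within a threshold. The 30-day threshold is an arbitrary assumption, and it assumes the certificates validate against the default trust store (for a private CA you would load it into the context with `load_verify_locations`).

```python
import socket
import ssl
import time
from typing import Dict, List, Optional, Tuple


def fetch_not_after(host: str, port: int, timeout: float = 5.0) -> str:
    """Return the 'notAfter' field of the certificate presented by host:port."""
    ctx = ssl.create_default_context()  # verifies the chain against the system trust store
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_left(not_after: str, now: Optional[float] = None) -> float:
    """Days until a certificate's notAfter timestamp (negative if already expired)."""
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def expiring_soon(
    not_after_by_instance: Dict[str, str],
    threshold_days: float = 30,
    now: Optional[float] = None,
) -> List[Tuple[str, float]]:
    """List (instance, days_left) for certs expiring within threshold_days, soonest first."""
    remaining = [(name, days_left(na, now)) for name, na in not_after_by_instance.items()]
    return sorted((x for x in remaining if x[1] <= threshold_days), key=lambda x: x[1])
```

Usage would be along the lines of building `{host: fetch_not_after(host, port) for host, port in fleet.items()}` from your own (hypothetical) inventory of instance endpoints and passing that dictionary to `expiring_soon`, for example from a scheduled job that alerts well before any renewal deadline.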


A lot of this would depend on what the developers are comfortable with and who is going to support it.

It would also further depend on your IT infrastructure and what you have in place there.

For example, if you are running VMware ESXi, this would be very easy to do and would give lots of benefits: you could create a template VM (or a linked clone) that could be quickly duplicated and rolled out.

All VM data could be stored in a central location (on fault-tolerant storage), and whenever OS updates etc. were needed, they could be rolled out across the whole farm (once tested) by rolling out a change to the base image.
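
A minimal sketch of "roll out once tested" across a farm of VMs: split the instances into a small canary wave followed by fixed-size waves, so a bad base image is caught before it reaches every instance. The wave sizes are arbitrary assumptions, and the actual update and health-check step for each wave is left to whatever tooling you use.

```python
from typing import List, Sequence


def rollout_waves(instances: Sequence[str], canary: int = 1, batch_size: int = 10) -> List[List[str]]:
    """Split instances into a canary wave followed by waves of at most batch_size."""
    if canary < 0 or batch_size < 1:
        raise ValueError("canary must be >= 0 and batch_size must be >= 1")
    items = list(instances)
    waves = [items[:canary]] if canary and items else []
    rest = items[canary:]
    waves += [rest[i:i + batch_size] for i in range(0, len(rest), batch_size)]
    return waves


if __name__ == "__main__":
    vms = [f"nr-{i:02d}" for i in range(50)]  # hypothetical VM names
    for n, wave in enumerate(rollout_waves(vms)):
        # in practice: update this wave, run health checks, and stop on failure
        print(n, len(wave))
```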

This would make it very easy to troubleshoot communication to begin with. Once the developers/support people were comfortable, and performance metrics were better known (something VMware is good at recording), you could decide whether there were benefits to moving to a lighter-weight Docker environment etc.

I would say the deciding factor comes down to developers and what they can support and know.

Craig