Introducing node-red-cluster - a complete clustering solution for Node-RED (feedback and alternatives welcome)

Yer, but that's what Kafka offers: it remembers the message a consumer has consumed, and when the consumer comes back up, it can start receiving messages from there. At least that's what Kafka used to do, who knows what it does now (without Zookeeper).

So a consumer acknowledges messages and Kafka takes note of that. The assumption is that a consumer does not acknowledge unless the message has been fully consumed.
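
For illustration, that consume-then-acknowledge contract looks roughly like this with a client library such as kafkajs (a minimal sketch - the broker address, topic, group ID and handleMessage are assumptions, and top-level await assumes an ES module):

import { Kafka } from 'kafkajs';

const kafka = new Kafka({ brokers: ['localhost:9092'] });
const consumer = kafka.consumer({ groupId: 'workers' });

await consumer.connect();
await consumer.subscribe({ topics: ['events'] });

await consumer.run({
  autoCommit: false,                      // acknowledge explicitly, not on receipt
  eachMessage: async ({ topic, partition, message }) => {
    await handleMessage(message);         // fully consume first... (hypothetical handler)
    await consumer.commitOffsets([{       // ...then acknowledge, so a crashed
      topic,                              // consumer resumes from the last
      partition,                          // committed offset on restart
      offset: (Number(message.offset) + 1).toString(),
    }]);
  },
});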

I like to be controversial. I once wrote a piece on why it isn't that bad if Node-RED packages aren't maintained. Basically, because they live in a stable environment[1], they might not need maintenance. Think of Unix tools - when was the last time cat or sed or awk or tee ... were updated? They just work and are complete. And so it is with Node-RED packages, because they provide very specific functionality. Do one thing and one thing only, and do it well.

Another case in point: what if a node implements a well-defined and stable protocol - something like FTP? Is the node broken, bad, or not to be used if it hasn't been updated in 6 days/weeks/months/years? No. If it just implements the FTP protocol, then it probably won't be updated. And actually that's a good thing. And because NR hasn't changed, the node will probably continue to just work.

Those kinds of node packages also exist, so it's not the same as an npm package where the use-by date is the last time it was updated. Which is a wholly false assumption: just because something gets constantly updated doesn't mean it's good. I come from an age where things could just be done and didn't need constant updating.

Indeed, no NodeJS required nor included.

And yes, there aren't that many libraries out there that make it easy to port nodes to Erlang, but that's not really the point of the exercise. It's to see what is required to implement flow-based programming (FBP) - remember, that's the influence on NR. If you have not heard of FBP, then look it up. Erlang is far better suited to FBP than NodeJS, and part of the point of Erlang-Red is to show that.

Secondly, being able to copy & paste flow code and have it run in either NodeJS or Erlang, or both, is simply very useful - if slightly left-field. Thirdly, what are the concepts of Erlang that could be ported back to Node-RED - Supervisors? Behaviours? Processes?

I mean, it would be really cool to have a series of X-Reds - Rust-Red, Ruby-Red, etc. Of course, these only make sense if you can also interface with the underlying programming language. That is something that many Erlang folks want to see in Erlang-Red - that you can interact with the existing processes and not only the defined nodes. Then Erlang-Red becomes this introspection vehicle into an Erlang system - might be useful, who knows.

Sorry for being a bit of a dick with my comments - I think what you are doing is good but I also like to know why and whether there is another approach.

Then you'd be using TLS over an HTTP connection. Or, if you want, encrypt the data. Basically what you're saying is what every good mobile app has to do - secure its traffic and prevent replay attacks.

Things like SSL certificate pinning then come into play. But much beyond secure protocols via SSL (MQTT - at least EMQX - supports SSL encryption), there really isn't much that can be done. So you have a secure wire to the worker, and with a shared key, that worker can encrypt the data before sending it over the secure wire.
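
A minimal Node.js sketch of that last step (illustrative only - the function name and key handling aren't from any particular package): AES-256-GCM with a pre-shared key encrypts and authenticates the payload before it goes over the already-TLS-secured wire.

const crypto = require('crypto');

// sharedKey must be 32 bytes for aes-256-gcm (e.g. from crypto.randomBytes(32),
// distributed out-of-band to admin and worker)
function encryptPayload(sharedKey, payload) {
  const iv = crypto.randomBytes(12);      // fresh IV per message, never reused
  const cipher = crypto.createCipheriv('aes-256-gcm', sharedKey, iv);
  const ciphertext = Buffer.concat([
    cipher.update(JSON.stringify(payload), 'utf8'),
    cipher.final(),
  ]);
  // the auth tag detects tampering; to resist replay, also include a
  // timestamp or sequence number inside the encrypted payload itself
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]);
}

The receiver splits off the 12-byte IV and 16-byte auth tag and mirrors this with crypto.createDecipheriv.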

[1] NR is very stable in the sense that the runtime and nodes that ran with version 1 can potentially be executed with version 4 - most certainly the flows can still be loaded.

Hello,

This is indeed a really great job that you have done.

I've been using Node-RED in k8s with multiple replicas, Redis for caching, Kafka for messaging, and Postgres to store flows. It works very well, but we didn't manage to have a single editor managing multiple Node-RED instances.

With your solution, can you see the state of nodes and logs in the editor for multiple workers, or only one?
And how did you manage it?

Edit:
In my solution we disabled the ability to add node packages in the palette; all replicas share the same Postgres database, and when we update a flow we kill the other replicas (this doesn't happen often yet). We create flows with scaling in mind - no use of global context, for example - and we save things with the Redis node with maxAge. For Samba polling we use a custom node for k8s leader management.

3 Likes

Hello Sullivan,

Thanks for your feedback! :slightly_smiling_face: I read your message this morning and it made me think that it would indeed be really useful to see the cluster state from the admin and view debug messages from all workers.

So I've just implemented a new Cluster tab in the admin editor that shows all connected workers. (node-red-cluster v2.0.0 is out)

For debug message visualization, I've set it up so that everything published on the "debug" topic by Node-RED (RED.comms.publish("debug", msg)) - which is used by the debug node and by node.warn/node.error - gets forwarded to the admin and published on the same topic.
node.log messages remain in the console: for those logs I prefer to leave the job to Grafana Loki.
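
Conceptually, that forwarding can be sketched like this (not node-red-cluster's actual code - the Redis channel name, client objects, and WORKER_ID variable are made up for illustration):

// Worker side: wrap RED.comms.publish and relay debug traffic via Redis pub/sub
const original = RED.comms.publish;
RED.comms.publish = function (topic, data, retain) {
  if (String(topic).startsWith('debug')) {
    redisClient.publish('cluster:debug',
      JSON.stringify({ workerId: process.env.WORKER_ID, topic, data }));
  }
  return original.call(RED.comms, topic, data, retain);
};

// Admin side: republish on the same topic so the editor's debug
// sidebar shows messages coming from every worker
subscriber.subscribe('cluster:debug', (raw) => {
  const { topic, data } = JSON.parse(raw);
  RED.comms.publish(topic, data);
});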

BTW the Admin node is a fully functional Node-RED instance that can execute flows (in fact, during flow development I normally test on the admin first), but for live monitoring, seeing all messages from all workers is definitely super useful!

Initially, my approach was also to kill a node to restart it with the new state, but in the end I found that updating the state while maintaining uptime wasn't particularly difficult.

With node-red-cluster:

  • Global/flow contexts are fully shared across all instances
  • Setting values is atomic (using Lua scripts in Redis; see the sketch below)
  • You can build flows without thinking about clustering, it's transparent
  • The only special cases are singleton flows which can be handled with the "cluster-leader" and "release-cluster-leader" nodes

For example, scheduled jobs (inject nodes with intervals) will run on ALL replicas by default but wrapping them with a cluster-leader node ensures they run only once across the cluster.
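
To illustrate the atomicity point above: a plain GET followed by SET from two workers can interleave and lose an update, whereas a Lua script executes as a single server-side step. A minimal sketch with node-redis (not the plugin's actual script - the key name is made up):

import { createClient } from 'redis';

const client = createClient({ url: 'redis://localhost:6379' }); // assumed local Redis
await client.connect();

// The whole script runs atomically on the Redis server, so concurrent
// workers cannot interleave between the GET and the SET.
const INCR_SCRIPT = `
  local v = tonumber(redis.call('GET', KEYS[1]) or '0') + tonumber(ARGV[1])
  redis.call('SET', KEYS[1], v)
  return v
`;

const next = await client.eval(INCR_SCRIPT, {
  keys: ['context:global:counter'],   // hypothetical key layout
  arguments: ['1'],
});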

4 Likes

Wow, great new functionality :star_struck:

Thanks for your reply.

Does your leader mechanism allow running some flows only in the admin (without running them on other workers) for prototyping and debugging purposes?

When you ask workers to reload flows, do you reload everything, or do you load only the modified flows/nodes depending on which option the user has chosen in the Node-RED deploy menu (see below)?

In my company the Kubernetes cluster doesn't allow using volumes, which is why I began developing a Postgres Node-RED flow storage plugin. I saw in your Redis storage plugin that you assume the filesystem storage will always be used and be the source of truth. I think I would need to implement some kind of reloading from the database in my Postgres storage plugin, like you did in your Redis storage plugin.

Some feedback after a quick look at your GitHub repository:
I think you would need to update the diagram in your documentation to show that the admin instance executes flows, because right now only the workers have the "execute flows" label.

1 Like

Does your leader mechanism allow running some flows only in the admin (without running them on other workers) for prototyping and debugging purposes?

Currently, there's no built-in mechanism to restrict flows to admin-only.
I'm planning to expose a cluster API available in function nodes that would allow you to check:

  • Whether the current execution is on admin or worker
  • Which specific worker is executing (hostname/ID)

Something like:

// In a function node
if (cluster.role === 'admin') {
    node.warn("Running on admin for testing");
} else {
    node.warn(`Running on worker: ${cluster.workerId}`);
}

if (cluster.workerId === 'nodered-worker-1') {
    // Run only on specific worker
}

When you ask workers to reload flows, do you reload everything, or do you load only the modified flows/nodes depending on which option the user has chosen in the Node-RED deploy menu (see below)?

Currently, when the admin saves flows, workers perform a full reload (runtime.nodes.loadFlows()). I haven't implemented differential reloading based on the deploy type (full/modified nodes/modified flows). It could be an enhancement for the future.
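
For reference, those deploy types map onto the Node-RED Admin API's Node-RED-Deployment-Type header, and a full reload from storage can also be requested that way - one possible wiring for a worker-side trigger (sketch assumes an unsecured admin endpoint on localhost and Node 18+ for fetch):

// POST /flows with Node-RED-Deployment-Type: reload asks the runtime
// to reload the flow configuration from its storage plugin
await fetch('http://localhost:1880/flows', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Node-RED-Deployment-Type': 'reload',
  },
  body: JSON.stringify({}),   // flow data in the body is not used for "reload"
});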

I saw in your Redis storage plugin that you assume the filesystem storage will always be used and be the source of truth. I think I would need to implement some kind of reloading from the database in my Postgres storage plugin, like you did in your Redis storage plugin.

I initially started developing node-red-cluster completely without volumes (pure Redis/Valkey storage), but then I realized I would lose the integration with Node-RED's Projects feature unless I did a complete refactor.

To be more confident in being "production-ready", the current implementation uses a wrapper pattern around localfilesystem:

  • Admin: disk is source of truth (supports projects, git integration, etc.)
  • Workers: lazy restore from Redis on startup, then use local cache
  • Redis: acts as the sync layer between admin and workers

For your Postgres plugin, you could implement a similar pattern: use Postgres as the source of truth and sync from there, without requiring filesystem access.
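
As a starting point, such a plugin only needs to implement the Node-RED storage API and be registered via storageModule in settings.js. A minimal sketch with Postgres as the source of truth (the table layout and the postgresUrl setting are assumptions, not an existing plugin):

const { Pool } = require('pg');

let pool;

module.exports = {
  async init(settings) {
    pool = new Pool({ connectionString: settings.postgresUrl }); // hypothetical setting
    await pool.query(
      'CREATE TABLE IF NOT EXISTS nr_storage (key TEXT PRIMARY KEY, value JSONB)'
    );
  },
  async getFlows() {
    const res = await pool.query("SELECT value FROM nr_storage WHERE key = 'flows'");
    return res.rows.length ? res.rows[0].value : [];
  },
  async saveFlows(flows) {
    await pool.query(
      `INSERT INTO nr_storage (key, value) VALUES ('flows', $1)
       ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value`,
      [JSON.stringify(flows)]
    );
  },
  // getCredentials/saveCredentials, getSettings/saveSettings, getSessions/
  // saveSessions and the library methods follow the same pattern
};

// settings.js:  storageModule: require('./postgres-storage')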

Some feedback after a quick look at your GitHub repository:
I think you would need to update the diagram in your documentation to show that the admin instance executes flows, because right now only the workers have the "execute flows" label.

Yeah, you're right. I didn't include it because conceptually I think of the admin as a "development environment" - a place where you edit and configure things. However, you're correct that it's important to clarify: the admin is a fully functional NR instance that can both edit AND execute flows.
This means you can also include the admin in your load balancing setup if needed.
I'll update the README to make this clearer. Thanks for pointing it out! :slight_smile:

2 Likes

This is great! My application seems a bit different from what you had in mind originally, but it seems to be a natural fit. I'm dying to try this out in my home setup, which is a large array of IoT devices. Maybe this is what IBM had in mind when they invented NR in the first place?

In my home, each wall switch, HMI, and controller/sensor endpoint is an IoT device capable of running NR or NR for MCUs. They all use PoE for power and ethernet for communications, right down to the individual wall switch. Home Assistant serves as the SCADA.

I would like to be able to design/maintain the entire system as a single flow (with many tabs and subflows), but that hasn't been possible until (hopefully) now. I'd like to use tab metadata, decorations, or subflow instance parameters to allocate the execution threads or subflow instances to specific IoT endpoints, with anything not explicitly assigned allocated to a "pool" of CPUs for execution. I'll probably want to use multilevel hierarchies pretty quickly, but I've not tried that in NR yet. They're moving closer to that with things like subflow instance parameters.

A lot of the endpoints will need to use NR for MCUs since they are RP2040s or ESP32 based. I had originally planned for each switch, light fixture, sensor/controller endpoint, etc. to have its own separately managed NR (for MCUs) instance, communicating with Home Assistant via webhooks (rather than MQTT, to reduce latency). I'd have to assign IP/Webhook addresses manually to accomplish message routing. Your scaling approach could really simplify design and management while making it more self-documenting by making the message interconnect explicit in the flow and automating deployment.

Do you have some thoughts on how to make this work with NR for MCUs? Perhaps I could "manually" get a NR instance running on each endpoint, have it register with the main system used for editing the flow, and have the endpoint listen for instructions, functioning like a "bootloader" in some sense?

Just a hint:

  • There is a compatible Node-RED written in Rust too!
    Read more here:

I know

Operating with NR for MCUs, after you've deployed the flow to a device, there's no NR "instance" running anywhere - nor is one even necessary.
In fact, only when you're working with the node-red-mcu-plugin and have a powered MCU connected to your host system with NR may this local NR instance be used as an interface to simulate the flow running on the MCU - for debugging purposes.
The true value of NR for MCUs manifests in the option to easily design dedicated & complex(er) flows / algorithms for a tiny MCU.
For standard use cases like a trigger, a sensor or whatever, I'd rather go with something like ESPHome.

1 Like