Hi everyone,
I’ve recently developed and open-sourced a project called node-red-cluster - a full clustering solution for Node-RED, built around Redis/Valkey.
It aims to make Node-RED horizontally scalable while keeping a familiar workflow for developers.
What it provides
Clustered storage - one admin instance (with editor) and multiple workers that automatically sync flows from the admin.
Distributed context store - global, flow, and node contexts shared across all nodes via Redis/Valkey, with atomic ops and compression.
Leader election - ensures that scheduled jobs or singleton tasks run only once across the cluster, with automatic failover.
Package sync - keeps all Node-RED nodes/modules consistent across admin and workers.
Docker/Kubernetes examples
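To make the topology concrete, here is a minimal docker-compose sketch of one admin plus scaled workers sharing a Redis instance. The environment variable names (CLUSTER_ROLE, REDIS_URL) are illustrative placeholders, not node-red-cluster's documented configuration:

```yaml
services:
  redis:
    image: redis:7-alpine
  admin:
    image: nodered/node-red        # with node-red-cluster installed
    ports: ["1880:1880"]           # editor exposed only on the admin
    environment:
      CLUSTER_ROLE: admin          # hypothetical variable name
      REDIS_URL: redis://redis:6379
  worker:
    image: nodered/node-red
    deploy:
      replicas: 3                  # scale workers horizontally
    environment:
      CLUSTER_ROLE: worker         # hypothetical variable name
      REDIS_URL: redis://redis:6379
```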
I built this because I couldn’t find a clear or complete clustering solution for Node-RED.
Most discussions or examples focus on partial setups (like shared contexts or replicated flows) but not a full architecture with leader election and plugin sync.
Now that the project is working well in my own tests, I’d love to hear the community’s thoughts:
Open questions / feedback I’m looking for
Have you ever needed to scale Node-RED horizontally? How did you approach it?
Do you see any better or simpler alternatives to this approach?
Any feedback, criticism, or alternative designs are very welcome.
I’d really like this to be a discussion about how Node-RED could scale more natively in distributed environments.
What happens if the admin instance dies, or receives so many requests that it starts to drop messages or throttle? I'm asking because my understanding is that the admin runs a flow containing a node that routes messages to the workers. Is my understanding right?

FlowFuse has a cluster solution that enables HA. I've never tried it, but maybe you should look at it. I think they solved message distribution across all instances at the network layer using some Kubernetes-native feature. And the sync of flows between instances may have been done as you did, with custom storage and context plugins, but using Postgres instead of Redis.
No, the FlowFuse HA mode isn't that sophisticated today; each instance is its own thing, with shared context and load-balancing of incoming HTTP traffic. You still have to build your flows knowing that multiple copies are running in parallel - so making sure to use things like shared subscriptions in the MQTT nodes etc.
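For anyone unfamiliar with the shared-subscription idea mentioned above: with MQTT 5 shared subscriptions, the broker delivers each message to only one member of a subscription group, so parallel copies of the same flow don't all process it. The group name below is just an example:

```
sensors/temperature                  # normal: every instance receives every message
$share/nodered/sensors/temperature   # shared: each message goes to one instance in group "nodered"
```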
To clarify this for anyone reading along who, like me, can be confused by the terminology: HA mode is presumably High Availability mode, not Home Assistant.
To clarify - the admin doesn’t route messages to workers. The architecture works differently:
Admin role: Only responsible for the flow editor UI and saving flows to Redis. When you save a flow in the editor, it gets written to Redis and a pub/sub notification is sent to all workers.
Worker role: Each worker runs its own complete flow execution engine. They all execute the same flows independently - there’s no message routing from admin to workers.
The admin instance is essentially a “control plane” - if it goes down, workers continue executing flows without interruption. The only thing you lose temporarily is the ability to edit flows. Once the admin comes back up, you can resume editing.
For incoming requests (HTTP, MQTT, etc.), you’d typically use a standard load balancer (nginx, k8s service, etc.) to distribute traffic across workers. Each worker handles its own messages independently.
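As an example of the standard load-balancer setup described above, a minimal nginx config that round-robins HTTP traffic across workers (hostnames and ports are placeholders):

```nginx
upstream nodered_workers {
    server worker-1:1880;
    server worker-2:1880;
    server worker-3:1880;
}

server {
    listen 80;
    location / {
        proxy_pass http://nodered_workers;  # round-robin by default
    }
}
```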
The leader election feature is specifically for scenarios where you need singleton behavior - like scheduled jobs (inject nodes with intervals) or tasks that should run exactly once across the cluster, not once per worker.
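The singleton behavior described above is commonly built on an atomic Redis lock with a TTL, so leadership fails over automatically when the leader stops renewing. The sketch below shows that classic pattern, not node-red-cluster's actual implementation; the key name and TTL are assumptions, and an in-memory MockRedis stands in for a real client so the example is self-contained:

```javascript
// In-memory stand-in for a Redis client, mimicking SET key value NX PX ttl.
class MockRedis {
  constructor() { this.store = new Map(); }
  // Returns "OK" if the key was set, null if an unexpired key exists (NX semantics).
  set(key, value, ttlMs, now = Date.now()) {
    const entry = this.store.get(key);
    if (entry && entry.expiresAt > now) return null; // lock still held
    this.store.set(key, { value, expiresAt: now + ttlMs });
    return "OK";
  }
}

const LOCK_KEY = "node-red-cluster:leader"; // hypothetical key name
const LOCK_TTL_MS = 10000;                  // leader must renew within 10 s

function tryBecomeLeader(redis, workerId) {
  // Only one worker's atomic SET succeeds; the others keep executing flows
  // but skip singleton work (interval injects, scheduled jobs, etc.).
  return redis.set(LOCK_KEY, workerId, LOCK_TTL_MS) === "OK";
}

const redis = new MockRedis();
console.log(tryBecomeLeader(redis, "worker-1")); // true: first worker wins
console.log(tryBecomeLeader(redis, "worker-2")); // false: lock already held
```

If the leader crashes and stops renewing, the key expires after the TTL and the next worker's attempt succeeds, which is the automatic failover behavior described above.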
But the main reason I went with the heuristic is to keep the synced message as small as possible, so workers can reach the correct state as quickly as possible. The smaller the message, the faster it arrives.
I know it's probably negligible in most cases, but I thought there might be projects with very large package.json files where this could make a difference.
Looking at it more carefully, the package.json managed by Node-RED already contains only the packages needed for flows (no devDependencies or extraneous stuff). If I wanted admin-specific packages, I wouldn't install them through the Node-RED editor anyway... I'd install them at the system level, outside of Node-RED's package management.
The heuristic could actually cause problems by filtering out legitimate packages that don't match the node-red-contrib-* or @* patterns. For example, a custom package named mycustom-nodes would be excluded.
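To make that failure mode concrete, here is a sketch of such a name-based heuristic; the exact pattern node-red-cluster uses may differ, this just assumes "keep node-red-contrib-* and scoped packages" as described:

```javascript
// Assumed heuristic: keep only conventionally-named Node-RED packages.
const looksLikeNodePackage = (name) =>
  name.startsWith("node-red-contrib-") || name.startsWith("@");

const deps = [
  "node-red-contrib-influxdb",   // matches the node-red-contrib-* pattern
  "@flowfuse/node-red-dashboard", // matches the @scope/* pattern
  "mycustom-nodes",               // a legitimate custom package
];

console.log(deps.filter(looksLikeNodePackage));
// → ["node-red-contrib-influxdb", "@flowfuse/node-red-dashboard"]
// "mycustom-nodes" is silently dropped: exactly the problem described above.
```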
You're right that simply syncing the entire package.json is simpler, more reliable, and doesn't exclude any valid packages. The message size difference is negligible since we're only sending package names, not the actual package contents.
I'll refactor this to sync the entire package.json instead.
Right, that’s what I meant. When I said “message distribution,” I wasn’t referring to routing Node-RED’s internal runtime messages between instances — that would break object references and class instances. I meant the load balancing of incoming HTTP requests across multiple Node-RED runtimes, which Kubernetes handles at the network layer using round-robin or similar algorithms.