Would it interest you to implement a way of per-node CPU pinning?
The idea is that per-node CPU pinning would bring a notion of "true concurrency" to Node-RED's dataflow programming model. In a theoretical scenario, you would have some node A and some node B, each processing a complex task, followed by a collector that reduces the results produced by both A and B. It would then be interesting to delegate tasks A and B to different CPUs, cores or logical CPUs, depending on the task to be carried out.
+---+
| A +--------+
+---+        |
             +----| reduce
+---+        |
| B +--------+
+---+
It would require implementing a low-level affinity-setting shim in C/C++ that would then pin a method (or a Node-RED node) onto a selected CPU. The shim would do nothing more than use pthreads to set the affinity of a callback, and it would be loaded as a Node-API addon registered with 'NODE_API_MODULE'.
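The prototype's source isn't shown in the thread, so the following is only a minimal sketch, assuming Linux and glibc, of what the core of such a shim might do: pin the calling thread to one CPU with `pthread_setaffinity_np`. The function names are hypothetical, and the `NODE_API_MODULE` wrapper that would expose them to JavaScript is left out. Note that every call here acts on a thread, not on a callback.

```c
#define _GNU_SOURCE /* cpu_set_t and the *_np affinity calls are Linux/glibc extensions */
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Lowest CPU index the calling thread may currently run on, or -1 on error.
 * Using an allowed CPU rather than hard-coding 0 keeps this working in containers. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof set, &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Pin the *calling thread* to one CPU. Returns 0 or an error number
 * (for example EINVAL if the CPU is offline or outside the allowed cpuset). */
int pin_current_thread(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof set, &set);
}

/* 1 if the calling thread is restricted to exactly `cpu`, else 0. */
int is_pinned_to(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof set, &set) != 0)
        return 0;
    return CPU_COUNT(&set) == 1 && CPU_ISSET(cpu, &set);
}
```

Called from inside a native addon, `pin_current_thread` would pin whichever thread invoked it, which in Node.js is the JavaScript thread that made the call (usually the main event-loop thread).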
I can manage this on my own, so it should be even easier for you, but I am asking whether there is any interest in doing so in order for this to become a permanent feature. Maybe every node in Node-RED could have a setting to select its affinity, so that the node can be pinned to a certain CPU. It makes even more sense if you run on an industrial architecture or, say, even AI workloads, where you could have dedicated CPUs for certain tasks.
For example, I do some heavy video processing on a NUMA-partitioned SBC whose architecture combines two different ARM processors: one processor set is specifically tailored for video processing, while the other is only made to carry out simple operations. It would be massively cool to use only the heavy-load CPU sets for the video processing and delegate simple tasks to the rest.
Your idea is interesting, but I am not sure this would be possible without either scaling it back or seriously rewriting the runtime.
For example, every node's on('input',…) handler runs inside the same Node.js event loop. There is no per-node thread, no per-node execution context and no per-node scheduler.
In essence, I am pretty sure you cannot take a JS function and tell the OS, “Run this callback on CPU 3”.
I mean, sure, if every node were a worker thread or a child process, then there would be a possibility.
I am not knocking your idea, but I am curious to understand how you would achieve this. As I see it, your shim suggestion would affect the entire thread (i.e., the whole runtime and all nodes), not individual callbacks (which is essentially what Node-RED nodes are).
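To illustrate the point about scope, here is a small C analogy (not Node.js internals): a single "event loop" thread runs the input handlers of two stand-in nodes. Pinning is applied once, to the loop thread, and every handler it dispatches inherits that placement; there is nothing per-callback to set. Linux/glibc is assumed and all names are hypothetical.

```c
#define _GNU_SOURCE /* Linux/glibc affinity extensions */
#include <pthread.h>
#include <sched.h>

#define NUM_HANDLERS 2

typedef int (*input_handler)(void);

/* How many CPUs the calling thread is allowed to run on, or -1 on error. */
static int allowed_cpu_count(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof set, &set) != 0)
        return -1;
    return CPU_COUNT(&set);
}

/* Stand-ins for two nodes' on('input') handlers: each just reports the
 * affinity of whatever thread happens to be running it. */
static int node_a_on_input(void) { return allowed_cpu_count(); }
static int node_b_on_input(void) { return allowed_cpu_count(); }

struct event_loop {
    input_handler handlers[NUM_HANDLERS];
    int observed[NUM_HANDLERS];
    int pin_cpu;
    int pin_status;
};

/* The loop thread pins itself once, then runs every queued handler in turn. */
static void *loop_main(void *arg)
{
    struct event_loop *loop = arg;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(loop->pin_cpu, &set);
    loop->pin_status = pthread_setaffinity_np(pthread_self(), sizeof set, &set);
    for (int i = 0; i < NUM_HANDLERS; i++)
        loop->observed[i] = loop->handlers[i]();
    return NULL;
}

/* Lowest CPU the calling thread may use, or -1. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof set, &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Run both handlers on one loop thread pinned to `cpu`.
 * Returns the pin status (0 on success); observations are copied to observed[]. */
int run_pinned_loop(int cpu, int observed[NUM_HANDLERS])
{
    struct event_loop loop = { { node_a_on_input, node_b_on_input }, { 0, 0 }, cpu, -1 };
    pthread_t thread;
    if (cpu < 0 || cpu >= CPU_SETSIZE ||
        pthread_create(&thread, NULL, loop_main, &loop) != 0)
        return -1;
    pthread_join(thread, NULL);
    for (int i = 0; i < NUM_HANDLERS; i++)
        observed[i] = loop.observed[i];
    return loop.pin_status;
}
```

Both handlers report a single allowed CPU because they share the loop thread. Putting node B somewhere else would mean running it on a different thread, which is exactly where worker threads and their costs come in.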
Well, you don't have to knock my idea, because I have already built it as a functional prototype for a single node, but I don't want to maintain a fork just for this. I suppose I wasn't asking about the know-how to achieve this but rather whether the Node-RED project is interested in implementing the feature as part of the main distribution.
What in particular are you linking to here ↑ (that is, the worker threads docs)? And as I said before...
Anyhow, your proposal is obviously interesting, and I am curious as to how you would get each node to run as a thread, handle thread safety, serialise/deserialise messages between threads, and handle context, status, logging, etc.
I suspect it depends on the scope of the changes: whether the implementation is transparent to contrib nodes (e.g. a common core UI change for selecting affinity), and whether folk have to do anything special to their installation or contrib nodes.
For the record, when heavy processing is required, I typically spread the load in a clustered manner (e.g. multiple Node-RED instances) or use a worker thread or child process (all achievable today with varying degrees of difficulty). That said, I am still intrigued by your idea; you have clearly put thought into this. If you have a fork available for people to download and try out, that would be a start.
Strangely, this is something that you get for free if you use a language like Erlang, which provides true concurrent processing, i.e., multi-threading and multi-processor execution. Erlang is built around independent processes communicating via messages, with each process having a "mailbox" where messages are queued for processing, so that each message is handled as a single event. In other words, Erlang provides concurrency amongst processes and single-threadedness within each process's computation: the best of both worlds, really.
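For readers unfamiliar with the mailbox model, here is a rough C analogy: senders append messages to a FIFO guarded by a mutex and condition variables, and a single owner thread takes them one at a time, so its own state never needs a lock. This only illustrates the pattern with POSIX threads; Erlang's processes are scheduled by its VM rather than being OS threads, and all names are hypothetical. The wiring mirrors the A/B/reduce diagram at the top of the thread.

```c
#include <pthread.h>

#define MAILBOX_CAP 16
#define MSG_STOP (-1)   /* sentinel that tells the owning process to exit */

/* Bounded FIFO mailbox: many senders, one owner that receives. */
struct mailbox {
    int msgs[MAILBOX_CAP];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
};

static void mailbox_init(struct mailbox *mb)
{
    mb->head = mb->tail = mb->count = 0;
    pthread_mutex_init(&mb->lock, NULL);
    pthread_cond_init(&mb->not_empty, NULL);
    pthread_cond_init(&mb->not_full, NULL);
}

static void mailbox_destroy(struct mailbox *mb)
{
    pthread_mutex_destroy(&mb->lock);
    pthread_cond_destroy(&mb->not_empty);
    pthread_cond_destroy(&mb->not_full);
}

/* Any thread may send; blocks while the mailbox is full. */
static void mailbox_send(struct mailbox *mb, int msg)
{
    pthread_mutex_lock(&mb->lock);
    while (mb->count == MAILBOX_CAP)
        pthread_cond_wait(&mb->not_full, &mb->lock);
    mb->msgs[mb->tail] = msg;
    mb->tail = (mb->tail + 1) % MAILBOX_CAP;
    mb->count++;
    pthread_cond_signal(&mb->not_empty);
    pthread_mutex_unlock(&mb->lock);
}

/* Only the owner receives; blocks until a message is queued. */
static int mailbox_receive(struct mailbox *mb)
{
    pthread_mutex_lock(&mb->lock);
    while (mb->count == 0)
        pthread_cond_wait(&mb->not_empty, &mb->lock);
    int msg = mb->msgs[mb->head];
    mb->head = (mb->head + 1) % MAILBOX_CAP;
    mb->count--;
    pthread_cond_signal(&mb->not_full);
    pthread_mutex_unlock(&mb->lock);
    return msg;
}

/* The "reduce" process handles one message at a time, so `total` needs no lock. */
struct reducer { struct mailbox mb; long total; };

static void *reducer_main(void *arg)
{
    struct reducer *r = arg;
    for (int msg; (msg = mailbox_receive(&r->mb)) != MSG_STOP; )
        r->total += msg;
    return NULL;
}

/* Processes "A" and "B": each sends a range of numbers to the reducer. */
struct producer { struct mailbox *to; int from, upto; };

static void *producer_main(void *arg)
{
    struct producer *p = arg;
    for (int i = p->from; i <= p->upto; i++)
        mailbox_send(p->to, i);
    return NULL;
}

/* A and B run in parallel and both feed one reducer (error checks omitted for brevity). */
long run_flow(void)
{
    struct reducer r = { .total = 0 };
    struct producer a = { &r.mb, 1, 50 }, b = { &r.mb, 51, 100 };
    pthread_t tr, ta, tb;

    mailbox_init(&r.mb);
    pthread_create(&tr, NULL, reducer_main, &r);
    pthread_create(&ta, NULL, producer_main, &a);
    pthread_create(&tb, NULL, producer_main, &b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    mailbox_send(&r.mb, MSG_STOP);   /* only after both producers are done */
    pthread_join(tr, NULL);
    mailbox_destroy(&r.mb);
    return r.total;
}
```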
Sound familiar? Yes, it's exactly what a flow in Node-RED could be if each node were an independent process and the lines were message pathways. Then you would get true concurrent processing just by adding nodes and wiring them in parallel. Pretty neat, really.
That was one reason I implemented Erlang-RED: to take advantage of the concurrent/parallel computing support of Erlang and the nice UI of Node-RED. Erlang syntax is rather complicated to get one's head around, hence things like Elixir and Gleam that provide better syntax and abstraction layers. Or putting something like the NR editor on top of Erlang makes things even easier (I would argue ;))
It's a fascinating topic of how to interpret the NR UI for other things, such as parallel computing but that would be off-topic here.
Which is great and all. But JavaScript and particularly Node.js is not Erlang. Node.js workers have not insignificant overheads and limitations as I understand things.
Erlang does not use OS threads for its processes but rather a lightweight process scheduler. It is a totally different architecture from Node.js and really isn't relevant to this discussion, is it?
I don't think this is a good idea at all I'm afraid.
Each worker thread runs in its own instance of V8 - each instance consuming something like 10MB of RAM I believe.
Each thread also requires a startup time. Whilst only around 10ms or so, this can soon mount up.
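To put rough numbers on this, here is a back-of-the-envelope estimate using the approximate figures above (about 10 MB and about 10 ms per worker) and a hypothetical flow of N = 50 nodes, each given its own worker thread and started one after another:

$$
M \approx N \times 10\ \text{MB} = 50 \times 10\ \text{MB} = 500\ \text{MB},
\qquad
T_{\text{start}} \approx N \times 10\ \text{ms} = 50 \times 10\ \text{ms} = 500\ \text{ms}
$$

For comparison, a Raspberry Pi Zero 2 W has 512 MB of RAM in total, so the workers alone would use nearly all of it before a single message had been handled.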
There is also no direct memory sharing between threads, which would certainly break a large number of contributed (and possibly core) nodes. Sharing data means serialising/deserialising it, which adds further overhead.
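The serialisation point can be made concrete with a C analogy (this is not how V8 implements structured cloning): once two execution contexts share no memory, a message must be flattened into bytes on one side and rebuilt on the other. The receiver gets an independent copy, and the cost grows with message size. The message layout and function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical message: a topic string plus a binary payload. */
struct msg {
    char *topic;
    uint8_t *payload;
    size_t payload_len;
};

/* Flatten to [topic_len][topic][payload_len][payload]. Caller frees the buffer.
 * A pointer to code or to shared state (a callback, a socket) has no
 * meaningful byte form, so it could not be carried across this boundary. */
uint8_t *msg_serialize(const struct msg *m, size_t *out_len)
{
    size_t tlen = strlen(m->topic);
    *out_len = sizeof tlen + tlen + sizeof m->payload_len + m->payload_len;
    uint8_t *buf = malloc(*out_len);
    if (!buf)
        return NULL;
    uint8_t *p = buf;
    memcpy(p, &tlen, sizeof tlen);                      p += sizeof tlen;
    memcpy(p, m->topic, tlen);                          p += tlen;
    memcpy(p, &m->payload_len, sizeof m->payload_len);  p += sizeof m->payload_len;
    memcpy(p, m->payload, m->payload_len);
    return buf;
}

/* Rebuild an independent copy on the receiving side. Returns 0 on success. */
int msg_deserialize(const uint8_t *buf, struct msg *out)
{
    size_t tlen;
    memcpy(&tlen, buf, sizeof tlen);
    buf += sizeof tlen;
    out->topic = malloc(tlen + 1);
    if (!out->topic)
        return -1;
    memcpy(out->topic, buf, tlen);
    out->topic[tlen] = '\0';
    buf += tlen;
    memcpy(&out->payload_len, buf, sizeof out->payload_len);
    buf += sizeof out->payload_len;
    out->payload = malloc(out->payload_len ? out->payload_len : 1);
    if (!out->payload) {
        free(out->topic);
        return -1;
    }
    memcpy(out->payload, buf, out->payload_len);
    return 0;
}

/* Free a message produced by msg_deserialize. */
void msg_free(struct msg *m)
{
    free(m->topic);
    free(m->payload);
}
```

Each send pays for two full copies of the message (into the buffer and out again), and changes made by the sender afterwards are not seen by the receiver.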
Then there is the total number of CPU cores available. Switching Node-RED to use separate threads for each node would seriously disadvantage people working on architectures with limited available cores.
And I won't even bother to go into the issues around error handling.
So even were it feasible to add this to core, it would break loads of things and add infeasible overheads to Node-RED.
Worker threads are useful. But it is down to the node author(s) to implement them in the right place for the right reason.
I run Node-red on a Raspberry Pi Zero 2, which has 4 cores. Node-red uses up to 100% of one of them, presumably limiting productivity. (Of course it more often runs out of RAM than CPU)
A consequence is that if the Node-red editor becomes unresponsive I can still ssh in to the device to fix it.
It also leaves three processors free for other work such as an RDBMS, file serving, etc.
Yes I could run two NR instances and pass tasks between them.
I could offload heavy jobs to external scripts or C via the exec node.
In fact, my most demanding tasks tend to be database-related, and I generally use more complex SQL queries rather than processing large datasets in Node-red, therefore [probably] using a different CPU core.
So it would be nice if I could double productivity on this tiny computer by Node-red itself using 2 or 3 cores, as long as I don't have to manually assign nodes to cores.
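For comparison, the usual way to get a CPU-heavy job onto several cores without assigning anything by hand is to split it across a few threads and let the OS scheduler place them. Below is a minimal C sketch with a toy workload (summing squares), using one thread per online CPU; the function names are hypothetical, and no affinity is set at all.

```c
#include <pthread.h>
#include <unistd.h>

#define MAX_WORKERS 16

struct chunk {
    long from, to;          /* inclusive range handled by one thread */
    long long partial;
};

/* Toy CPU-bound job: sum of k*k over the chunk's range. */
static void *sum_squares(void *arg)
{
    struct chunk *c = arg;
    c->partial = 0;
    for (long k = c->from; k <= c->to; k++)
        c->partial += (long long)k * k;
    return NULL;
}

/* Sum k*k for k = 1..n, split across one thread per online CPU (capped).
 * No affinity is set: the OS scheduler decides where each thread runs. */
long long parallel_sum_squares(long n)
{
    long ncpu = sysconf(_SC_NPROCESSORS_ONLN);
    int workers = ncpu < 1 ? 1 : (ncpu > MAX_WORKERS ? MAX_WORKERS : (int)ncpu);
    pthread_t tids[MAX_WORKERS];
    struct chunk chunks[MAX_WORKERS];
    int started[MAX_WORKERS];
    long step = n / workers;

    for (int i = 0; i < workers; i++) {
        chunks[i].from = i * step + 1;
        chunks[i].to = (i == workers - 1) ? n : (i + 1) * step;
        started[i] = pthread_create(&tids[i], NULL, sum_squares, &chunks[i]) == 0;
        if (!started[i])
            sum_squares(&chunks[i]);   /* fall back to doing this chunk inline */
    }

    long long total = 0;
    for (int i = 0; i < workers; i++) {
        if (started[i])
            pthread_join(tids[i], NULL);
        total += chunks[i].partial;
    }
    return total;
}
```

On a four-core board such as the Pi Zero 2 this starts four threads, and the scheduler is free to move them around alongside the database and other processes.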
If Node.js can launch CPU-bound workers, then clearly that is part of its API, and setting the affinity is already accomplished by other modules like nodeaffinity. It is easy to create a C/C++ proxy as a custom solution that uses POSIX threads to set the affinity of the worker itself. Just because you use an event-driven framework does not mean you should never "fork" a process or never implement any form of parallel execution.
Not all nodes would be affected, because CPU pinning should be optional and all other nodes would just follow their default behaviors. It would not matter for trivial stuff like function nodes doing string concatenation, but it would matter if you write a small module within a function node that performs intensive computations that you could delegate to a core. Node-RED has contributions like GPU-bound nodes (and even semantically that is alright) that are able to delegate mathematical operations to a GPU, such as 'node-red-contrib-gpu', so this would be something similar, but for CPU affinity rather than GPU tasks. Spin-up time of threads, IPC and error handling are just a matter of implementation, and a shim that just performs CPU pinning is about 100 lines of code.
It gets even more interesting on something like a Rockchip SoC that has two ARM processor clusters running at different speeds, one of them more energy-efficient than the other, so you can offload tasks depending on their requirements in order to better partition resources. With NPUs and AI, it becomes even more important to delegate tasks correctly rather than relying on round-robin scheduling...
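One way such a split might be derived on Linux, sketched here as an assumption rather than as what the prototype does: read each CPU's maximum frequency from cpufreq sysfs and put the fastest CPUs in one mask and the rest in another. It assumes glibc and a kernel with a cpufreq driver; the function names are hypothetical.

```c
#define _GNU_SOURCE /* cpu_set_t and CPU_* macros are glibc extensions */
#include <sched.h>
#include <stdio.h>

/* Max frequency of one CPU in kHz, read from Linux cpufreq sysfs.
 * Returns -1 when the file is absent (no cpufreq driver, some containers, VMs). */
long read_max_freq_khz(int cpu)
{
    char path[96];
    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu%d/cpufreq/cpuinfo_max_freq", cpu);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    long khz = -1;
    if (fscanf(f, "%ld", &khz) != 1)
        khz = -1;
    fclose(f);
    return khz;
}

/* Put every CPU whose max frequency equals the highest one into `big`
 * and every other CPU with a known frequency into `little`.
 * Returns the number of "big" CPUs, or 0 if no frequencies are known. */
int split_by_max_freq(const long *max_khz, int ncpu, cpu_set_t *big, cpu_set_t *little)
{
    long best = -1;
    for (int i = 0; i < ncpu; i++)
        if (max_khz[i] > best)
            best = max_khz[i];
    CPU_ZERO(big);
    CPU_ZERO(little);
    if (best <= 0)
        return 0;
    for (int i = 0; i < ncpu && i < CPU_SETSIZE; i++) {
        if (max_khz[i] == best)
            CPU_SET(i, big);
        else if (max_khz[i] > 0)
            CPU_SET(i, little);
    }
    return CPU_COUNT(big);
}
```

For example, with four CPUs reporting 1800000 kHz followed by four reporting 2400000 kHz (illustrative values for a two-cluster board), CPUs 4 to 7 end up in `big` and 0 to 3 in `little`. The `big` mask could then be passed to `pthread_setaffinity_np` for the video-processing thread.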
It is! ... and besides all that, the "visual parallelism" implied by flows that run parallel to each other at design time could be turned into a practical reality. Nice functional style combined with dataflow. I like it. I should really check your project out, thank you!
Oh, I misunderstood; I thought you were saying that the RDBMS work was what was clogging things up. What is it that you are doing that is using up CPU in Node-RED, then?
I've looked at worker threads a few times in the history of Node-RED to explore this type of approach.
The main challenge that I hit was how to handle the lack of shared memory access between the worker threads - everything has to be marshalled between the threads (via SharedArrayBuffer) - which comes as an overhead to the throughput.
It isn't a topic I've looked back at recently, so don't know if there are more options available in more recent Node.js versions.
The proposal really wouldn't help you. It would swamp the cores and make everything less responsive.
The proper answer is for nodes that need to do heavy compute or other synchronous activities to create their own worker threads, not to try and make the whole of Node-RED take over all the cores.
If that has happened, I suspect it is mostly a bug, a loop of some kind. In truth, other than silly endless loops I might have created, the only times my Pis ever became unresponsive were due to a DB table getting too large (this happened to me with both InfluxDB and MongoDB) or some other process going rogue.
Which is, of course, the correct way to handle complex data queries when you have a DB engine hosting the data since they will almost always be more efficient than using JavaScript to process the data.
It is nice in theory. But unlikely in practice. I've already said that worker threads come with serious overheads in memory if nothing else. And, as said, if the original idea were simply dumped on all of Node-RED's nodes, the system overheads would be horrid and almost certain to bring your Pi Zero to a crashing halt.
JavaScript threads have to be handled with considerable care if they are to help rather than hinder.
Yup.
No, it would make it a lot worse.
Certainly nobody has said otherwise. It absolutely IS possible. But if you read up on thread handling in Node.js, you will quickly see lots of cautionary tales about not using it except for specific use-cases.
That is beside the point. I've not talked about the wisdom of trying to override CPU core affinity. Personally, experience tells me that this too would not be a good idea to overuse. Operating systems and core services such as Node.js spend a lot of effort optimising things. Overriding that optimisation may be useful in certain specific cases but is rarely a good idea in general.
And nobody has said otherwise. The key word here being "never". Of course there are cases where this is advantageous. And node authors are absolutely able to do so. And some have.
But why does this need something baked into Node-RED itself? What am I missing? I could write a node right now that made use of this, I don't need any help from the Node-RED platform for that, it is already easy to do with Node.js code.
So shouldn't the suggested feature request be to add a "delegation" option to function nodes, one that delegates the function to a worker and opens an option to pin it to a specific core?
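A rough C analogue of that narrower suggestion: one function is handed to its own worker thread, which can optionally be created already pinned to a core via `pthread_attr_setaffinity_np` (Linux/glibc), while everything else is left to the scheduler. In Node.js this would presumably be a `worker_threads` worker plus a native addon for the pinning; the sketch only shows the threading and affinity mechanics, and all names are hypothetical.

```c
#define _GNU_SOURCE /* pthread_attr_setaffinity_np and cpu_set_t are glibc extensions */
#include <pthread.h>
#include <sched.h>

typedef long (*job_fn)(long input);

struct job {
    job_fn fn;
    long input, output;
    int single_cpu;          /* 1 if the worker found itself restricted to one CPU */
};

/* Hypothetical CPU-heavy "function node body". */
long sum_to(long n)
{
    long total = 0;
    for (long k = 1; k <= n; k++)
        total += k;
    return total;
}

static void *worker_main(void *arg)
{
    struct job *j = arg;
    cpu_set_t set;
    CPU_ZERO(&set);
    j->single_cpu = pthread_getaffinity_np(pthread_self(), sizeof set, &set) == 0
                    && CPU_COUNT(&set) == 1;
    j->output = j->fn(j->input);
    return NULL;
}

/* Lowest CPU the calling thread may use, or -1. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof set, &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Run fn(input) on a new worker thread and wait for the result.
 * cpu >= 0: the worker is created already pinned to that CPU.
 * cpu <  0: no pinning; the scheduler places the worker.
 * Returns 0 on success or an error number from the pthread calls. */
int run_delegated(job_fn fn, long input, int cpu, long *result, int *single_cpu)
{
    struct job j = { fn, input, 0, 0 };
    pthread_attr_t attr;
    pthread_t worker;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    if (cpu >= 0) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        rc = pthread_attr_setaffinity_np(&attr, sizeof set, &set);
    }
    if (rc == 0)
        rc = pthread_create(&worker, &attr, worker_main, &j);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        return rc;
    pthread_join(worker, NULL);
    *result = j.output;
    *single_cpu = j.single_cpu;
    return 0;
}
```

The pinning is opt-in per call, so only the delegated function is affected; the caller's own thread keeps its default placement.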
I'm not seeing a reason to make this a big thing that would require reworking parts of core and potentially breaking things.
Personally, I would rather see this as a contributed node that allows function-node type coding that runs in a worker thread. I think that the temptation to turn on such a feature in a standard function node may well cause people far more issues than it solves (given that most people will not understand the potential consequences).
This would be true of a custom app. It is certainly not true for a platform like Node-RED which provides generic and effectively open-ended compute capabilities. And, from my reading, they are far from "just" anything. If they were, not so many people would be writing articles about being careful with them. Just the inter-thread data transfer alone is potentially enough to catch people out when working in Node-RED considering that messages can be quite/very large on occasions and (see the recent thread on the subject) that there are plenty of JavaScript specific data types that are very hard to serialise and un-serialise. Indeed, some simply cannot be without data fidelity loss.
This is not relevant at all. Other than it enhances the case to let node authors do it when they need it. The number of lines of code is not related to the overheads and other issues.