Only clone messages sent to 2nd or higher wire on same output

Hi,

inspecting the way then Node.send works (node-red/Node.js at 34cb93794cd7c814f3ff7bd4fe036aa2fd7e29c9 · node-red/node-red · GitHub) it seems that all messages but the first sent on the first wire are cloned.

While cloning a message sent to multiple wires from the same output is a desirable behaviour, I would expect that the cloning flag is reset for messages sent on different outputs for the node and it would be the node responsibility to clone if necessary. Also if there is an array of messages sent to a single output on a single wire, none should be cloned. Currently, only the first message is not cloned, that is if I have send([m1, m2]) and a single wire, m2 is cloned.

Example of desired beahavior
In my node with 2 outputs I write send([[m1, m2], [m3, m4]])

output 1 is connected to wire1 and wire2 and output2 to wire3 and wire4.
I would then expect the following behavior:

m1 is not cloned on w1
m1 is cloned on w2
m2 is not cloned on w1
m2 is cloned on w2
m3 is not cloned on w3
m3 is cloned on w4
m4 is not cloned on w3
m4 is cloned on w4

What are your thoughts?

Seems sensible.

I'm slightly wary of picking at this as we've been caught out by mistakes in this area before. See Cloning messages in a flow : Node-RED for example. It has been some time since I've held all of the nuances of the cloning logic in my head.

On the face of it, what you describe sounds reasonable - but there are a couple problems that would need addressing.

In your example, you assume m1, m2, m3 and m4 are all different messages. But that cannot be assumed. The node could be doing:

send([[m1, m2], [m1, m2]])

So any code would also need to track what objects had already been 'seen' and make a further decision as to whether it needs cloning or not. Looking at it now, it might be feasible to do that with a WeakSet with a minimum of overhead.

The other scenario to consider is whether a node that does multiple calls to node.send() whilst reusing the same message object would get affected.

The critical thing would be to not introduce a new edge case whether the behaviour changes to the extent flows break as an object is no longer being cloned when it was previously.

1 Like

I am rather new to Node Red and your input is really valuable. I will try to consider additional edge cases, but the issue I am facing is that I have some unclonable messages (tensorflowjs tensors) via lodash.clone which mean that with the current behaviour I cannot wire a node that outputs a tensor to 2 or more nodes. Tensors can be cloned using their own "clone" method (TensorFlow.js API) ,

I agree that care should be taken no to break existing code. What about keeping track of which messages have already been sent in a given send invocation in order to know which to clone? This would ensure all existing code would continue to work.

A better option would be to enhance the clone function to use the clone() method from the object if the object exposes it.

Hi @massi-ang,
I had a rather similar issue some time ago. When using OpenCv objects (images, matrices, ...), you need to clone those objects via the OpenCv cloning functions.

As you can see in this old discussion, I have been experimenting with custom cloning. However due to free time constraints, I had to quit this. Keep in mind that it was a quick test, so it might be a bad design!!

It would be nice if cloning could somehow been customized in the future. But I assume this will probably not that easy, because it has to work in a lot of different scenarios. And it should be fast, because this code is being called all over the place.
Bart

I've not looked into any of the Node-RED code on this but an option to use an objects prototype.clone() function sounds feasible? If the object had a clone that was called something different, you could always add a clone method to the object and get it to call the native clone?

Yes indeed. But I assume there are libraries that e.g. use copy instead of clone. Or they use a factory instance to create copies, or whatever...
Therefore I thought - at the time being - that Node-RED would need to offer custom clone plugins to be plugged in. That way you could register e.g. a custom cloner that tries to find a prototype.clone. But if I start to install extra clone plugins (for other libraries), they would have to be called in sequence until one of them is able to create a clone. But that might slow down things. Or not. Don't know.
And then I got other headaches: what if I import a flow from somebody else, who uses a contrib cloner plugin which I haven't installed. Then that flow wouldn't work for me, and me don't knowing why...
And so on...
Therefore I quitted my experiment :exploding_head:

Found a workaround for my node code that does not require any changes to core functionality: i created a cloner node that has 1 input and 2 outputs. In my cloner node, I clone my object with its own cloning method, and then invoke send twice, like

send([[orig], [null]]);
send([[null], [clone]]);
done();

This works for me now, and could be applicable to any custom payload that cannot be deep cloned by Node-RED.

The drawback is that I cannot use the idiomatic flow design of attaching multiple wires to the same output as this would still use Node-RED cloning.

I have also experimented with the original idea of avoiding unnecessary message cloning for the first time a message is sent to a wire (either on 1 or multiple outputs). The solution I came up with is to use a Map to keep track of which messages have been already sent and set the cloneMessage field accordingly. Map stores keys as references to the original object, hence it should not have any significant performance or memory impact.

Thoughts?

I have now done a really simple experiment which works:

In RED.utils.cloneMessage I replaced var m = cloneDeep(msg) with:

var m = cloneDeepWith(msg, function(v) { const cloner = customCloner.get(Object.getPrototypeOf(v));
return cloner ? cloner(v) : undefined; })

I also added:

const customCloner = new Map();

function registerCustomCloner(proto, cloner) {
    customCloner.set(proto, cloner);
}

In my nodes, where I need the custom cloning I call:

RED.util.registerCustomCloner(myPrototype, function (v) { v.clone(); }) // or whatever cloning method works for your object

This leverages lodash.clonedeep customizers and relies on the assumptions that outside of the standard deep cloning provided by lodash, custom cloning will be required for specific objects that can be discriminated via their prototype.

What I see as advantages of this approach is that cloner functions are registered to a specific Object prototype and should therefore be compatible across nodes contributed by different modules. So if someone else writes an extension that uses tf.Tensor and register a cloner function for that object prototype, it will work also for me.

There are edge cases:

  1. Same prototype but different objects and different cloning -> flows stop working as expected. How likely would this be?
  2. Nodes replacing a cloning function with a sniffing/hack piece of code -> warning when a new cloner function for the same prototype is set

Would like to hear your thoughts

1 Like

@massi-ang,
Nice to see your experiment! Because this would be a very useful feature...

Your approach seems to make sense to me. Because a cloner that simply checks whether a clone function is available is imho a bit too general...

That seems indeed like an advantage.

To be honest, I have not really experience in that...

I was thinking: suppose there is a separate npm module for such a Tensor cloner hook. When I develop a Tensor based custum node, and I add that cloner hook as a dependency (and I register the cloner hook in my node). Another developer also creates his own Tensor based custom node, and he adds that same cloner as a dependency. That way you are always sure your custom node will work correctly. There will only one cloner in the map...

Second thought about this while typing... Of course there will be a problem (I think) when two developers will create their own cloner for the same prototype, because only the last registered one will be in the map. Suppose a cloner hook has become obsolete, and the author of that cloner hook doesn't respond to my issue. Then I want to develop my own cloner hook for that prototype. However I am not sure that mine will be overwritten by the obsolete one. So perhaps automatically registering cloner hooks is perhaps not a good idea after all....

But that "might" be adisadvantage of using the prototype as key in the map. Not sure at the moment...

Here is my experimental code: GitHub - massi-ang/node-red at custom-cloner

There are no restrictions on how the cloner functions can be deployed, and using npm packages seems a sensible one, but I do not think that using npm modules should be mandatory. Most cloner functions would be one liners using the deep cloning functionalities of the underlying object they apply to.

Regarding the "someone else overrides the cloner functions for a given prototype I use": my assumption is that if the prototype is the same, the object is the same and therefore the cloner function, whoever has written it, should work. This should be the normal case.

Now, suppose another node overrides my cloner function for a given prototype with a broken/malicous one: this could happen and I propose that every time a different cloner for the same prototype is registered, Node red emits a warning. This will allow me, as the user of the nodes to know what is happening and make an informed decision on which nodes to use or avoid using.

It could also be better move the registerCloner() method on the Node prototype so that it is possible to log which node is performing the registration.

Indeed that is correct. The problem that I had described probably will almost never occur I assume.

I am not sure if that is correct... I "think" that cloning of messages might be a new kind of hook, but I may be completely wrong!!!! The message hooks have been introduced via this pull request. But as you can see here it is not decided at the time being how/if custom cloning would fit into this picture...

BTW you can read here that cloning happens between the preRoute and onSend hook.

I try to avoid mentioning @knolleary as much as possible, but I'm afraid he is about the only one that can judge whether you are heading in the right direction...

Thanks for your feedback.

The issue I see in using hooks for this functionality is the following:

For how hooks are designed, I would need to add an handler for each object type that need custom cloning.

Say I have 3 handlers registered for types A, B and C respectively. I get a message containing, at some level in the message structure an object of type C.
This message is sent to handler for type A. How can this handler determine that it should clone or not clone the message and stop or not stop the cloning chain?
The same applies to the other 2 handlers, the one for type B and the one for type C.

My reasoning was that cloning must anyway be performed based on the flag cloneMessage and the issue lies in the extensibility of the cloning function. Luckily the deepCloneWith method already provides a way to customize the cloning. My suggestion is to make a generic custom cloning function using an hash table indexed on the object prototype.

And you are right about extending the Node.prototype. This is not needed as I saw this other way for obtaining the node that is invoking the registerCloner function for logging purposes:

const stack = new Error().stack;
const callModule = stack.split("\n")[2].split("(")[1].slice(0,-1);
Log.debug(`Adding hook '${hookId}' from ${callModule}`);

I admit that my testing has been limited to my tensor objects for now, and I might be missing something.

Your expert advice is so much welcome.

Don't worry. I'm always lurking - especially anything cropping up in Core Development

But I'm also knee deep in some refactoring at the moment that's occupying enough of my head that I don't yet have a feel for this discussion. I'm almost done on this work then will take a look at this with hopefully fresher eyes.

2 Likes

I did not have time yet to have a close look at the hooks. But I proposed to use a new cloning hook, just because hooks are an existing way to handle message processing. And it would be nice if we could make cloning customizable via a mechanism that already exists. To avoid reinventing the hot water :wink:

Isn't that a bit similar to how your prototype mechanism works (without the Map)?
Suppose your tensor clone hook handler does something like this:

if ( tf.is_tensor( obj) ) { // Or use a prototype equality check here ...
   return obj.clone();
}
return null;

When null is returned then the next clone handler in the sequence will be called.

Yes indeed I'm also a fan of that route, which is why I had been experimenting with that in the past. So I absolutely agree with you to keep looking into that direction. Perhaps I was not clear about that. I was only trying to use hooks somehow inside our cloneDeepWith function implementation, to customize that implementation via the standard hooks mechanism:

var m = cloneDeepWith(msg, function(v) { 
   // Loop the sequence of clone hook handlers to try to find a handler that can create a clone
})

And yes you are right that the hash table will be much faster compared to my sequence of handlers that have to be looped. And performance is crucial in a code snippet that is being called for every (nested) property of a message object...

So then it indeed looks like hooks might not be the best option. We only need a way to allow developers to create a custom cloner, that can be registered easily - based on a prototype - into the hash table. I have no real life experience with hooks and plugins, so cannot help with that...

Good to know. From now on I will start all my discussions in the Core Development :wink:

Hi @massi-ang,
Last week there were some discussions about Tensorflow and OpenCv for video surveillance in Node-RED. A higher level of integration of both libraries in Node-RED could largely benefit from your proposal above (by transferring/cloning tensors and matrices correctly between nodes in a flow).

Your experiment was already quite advanced, and you have gained already quite some knowledge in this area.
So it would be a pitty if all of that would get lost...

Do you have any plans to finalize your design proposal? Please say "yes" :wink:

Thanks!!!
Bart

Hi Bart,

I have been busy with some other stuff lately, but I am definitively interested in finalizing my proposal. As I am a newbie in Node-RED, can you guide me through the process to follow?
Massimiliano

I looked again to the code I had written and I am pretty confident that the design should work fine.

The lodash.cloneDeep gets replaced with lodash.cloneDeepWith function. The custom cloning function which is passed does the following:

function customCloner(v) {
    const f = customClonerMap.get(Object.getPrototypeOf(v))
    return f ? f(v) : undefined;
}

The nice thing is that if the function returns undefined the standard clone function is used.

A node that wants to add a custom cloner for an object it needs and which is not supported natively by Javascript would call registerCloner(Object.prototype) during the node registration process. For security, it should be only allowed to register prototypes that are not already supported by standard cloneDeep functionality: do we have a good way to verify this?.

Multiple nodes could set the cloner for the same prototype which works fine under the assumption that the goal is to provide a deep cloning of that specific type and all nodes are well behaved. Is there a need to prevent a malicious node to set a malicious cloner code?

Hmm, good question. Normally you should discuss new functionality first here on Discourse, and afterwards you create a pull request. In some cases not all functionality is 100% discussed here, but then the discussion continues in the pull request.

Very nice of you to summary the entire feature again. Thanks!!

Very good remark indeed. When looking at the lodash documentation, I see this:

supports cloning arrays, array buffers, booleans, date objects, maps, numbers, Object objects, regexes, sets, strings, symbols, and typed arrays. The own enumerable properties of arguments objects are cloned as plain objects. An empty object is returned for uncloneable values such as error objects, functions, DOM nodes, and WeakMaps.

Checking those types could be done once in the registerCloner function, to make sure we don't cause a performance issue (by checking it over and over again). But indeed it would be best to avoid those types being cloned in any other way...

Perhaps others might have another idea about this. But imho all nodes at the moment can already execute malicious code. So I don't think this is in any way different...