Split node to output complete:true when finished

When using the split node on a bunch of URLs, I want to combine the outputs - but I don't know how long it will take, as the number of URLs could vary. The split node does not emit a {complete: true}, so the complete node cannot be used.

Request: have the split node emit the complete flag, or perhaps add a secondary output that emits {complete: true}, so that the join node can be used in automatic mode.

The msg.parts property should have an index and a count property - which indicate which item of the set you are currently processing and how many there are in total.

(But yes - it would be sensible for it to set msg.complete as well - I'm slightly surprised it doesn't.)
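For reference, after splitting a four-element array, each message carries a parts object roughly like this (a sketch based on the split node's documented behaviour; the id value here is made up):

```javascript
// Approximate shape of msg.parts after splitting a 4-element array.
// parts.id groups all parts of one split; index is 0-based; count is the total.
const msg = {
  payload: "https://google.com",
  parts: {
    id: "some-unique-id", // hypothetical value; generated per split
    index: 0,             // position of this part in the sequence
    count: 4,             // total number of parts
    type: "array"
  }
};

// A join node in automatic mode completes once it has received
// parts.count messages sharing the same parts.id.
const isLast = msg.parts.index === msg.parts.count - 1;
console.log(isLast); // false for the first part
```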

That is correct; however, some URLs do not respond and end up in error, causing the received parts never to reach their total count.

If a join does not receive all parts, it can't complete. Each part has to reach the join node, so there needs to be a catch node or something similar to pass the msg on to the join.

Does the parts property get dropped from the message as it goes through the request node? I don't think it does... so you should still get a msg for every request.

The parts property does not get dropped, but URLs in error do not emit a msg at all, only an error.

Hence you'll need a catch to pass the msg to the join node.

Or have the request node, join node or split node capture it, so that the user does not have to deal with it.
This is a feature request topic, after all.

do you mean that html and/or join nodes should be doing what the catch node does?

is the html node causing the error or the http request node?

Because of the Unix principle of doing one thing and one thing only: the html node is responsible for parsing valid HTML, the join node is responsible for combining messages and the catch node is responsible for capturing errors. That's it. There is no overlap in functionality, by design.

The user is responsible for combining these individual nodes into a working flow. If the user is not capable of doing that then that's a different problem.

It's like saying the power plug should ensure that my lights get dimmed, my washing is done and the dishes are clean - the power plug delivers electricity - that's all it does. A dimmer dims the lights. A washing machine cleans the washing and a dishwasher does the dishes. The user is responsible for combining these things into a household.

The user could always live in a hotel if the user does not want to deal with all that!

I am asking to have the split node emit a complete msg, that's it.

I don't know what path you are off on, but the split+join combination will never complete if a URL is not reachable.
Sure, the error can be caught, but the join node could still complete the sequence; right now it just waits until eternity.

This flow demonstrates what I mean

[{"id":"72457078ac85ea26","type":"inject","z":"1cf772ae2066495e","name":"Urls","props":[{"p":"payload"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"[\"https://google.com\", \"https://apple.com\", \"https://does.not.exist\", \"invalid url\"]","payloadType":"json","x":192,"y":748,"wires":[["ae01fa137efd7a98"]]},{"id":"ae01fa137efd7a98","type":"split","z":"1cf772ae2066495e","name":"","splt":"\\n","spltType":"str","arraySplt":1,"arraySpltType":"len","stream":false,"addname":"","x":349,"y":810,"wires":[["2ea234f77808a397"]]},{"id":"2ea234f77808a397","type":"change","z":"1cf772ae2066495e","name":"","rules":[{"t":"set","p":"requestTimeout","pt":"msg","to":"1500","tot":"num"},{"t":"set","p":"url","pt":"msg","to":"payload","tot":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":435,"y":887,"wires":[["65efa75adb7fe5d0"]]},{"id":"65efa75adb7fe5d0","type":"http request","z":"1cf772ae2066495e","name":"","method":"GET","ret":"txt","paytoqs":"ignore","url":"","tls":"","persist":false,"proxy":"","insecureHTTPParser":false,"authType":"","senderr":false,"headers":[],"x":669,"y":887,"wires":[["a2738911dadf41d6","3b61188e9d5edf56"]]},{"id":"a2738911dadf41d6","type":"join","z":"1cf772ae2066495e","name":"","mode":"auto","build":"object","property":"payload","propertyType":"msg","key":"topic","joiner":"\\n","joinerType":"str","accumulate":"false","timeout":"","count":"","reduceRight":false,"x":1104,"y":880,"wires":[["ae199e2f8004c3b1"]]},{"id":"3b61188e9d5edf56","type":"debug","z":"1cf772ae2066495e","name":"success count","active":true,"tosidebar":false,"console":false,"tostatus":true,"complete":"payload","targetType":"msg","statusVal":"","statusType":"counter","x":1036,"y":941,"wires":[]},{"id":"34b7a9385953989d","type":"catch","z":"1cf772ae2066495e","name":"","scope":["65efa75adb7fe5d0"],"uncaught":false,"x":692,"y":846.5,"wires":[["c4f8ad0e9bb3c8a5","a2738911dadf41d6"]]},{"id":"ae199e2f8004c3b1","type":"debug","z":"1cf772ae2066495e","name":"complete count","active":true,"tosidebar":false,"console":false,"tostatus":true,"complete":"payload","targetType":"msg","statusVal":"","statusType":"counter","x":1324,"y":879,"wires":[]},{"id":"c4f8ad0e9bb3c8a5","type":"debug","z":"1cf772ae2066495e","name":"error count","active":true,"tosidebar":false,"console":false,"tostatus":true,"complete":"true","targetType":"full","statusVal":"","statusType":"counter","x":1029,"y":813.5,"wires":[]}]

Of the four URLs, two fail, yet the join receives all four messages and completes.

And that's what I'm saying is not true: use a catch node and the join obtains the missing msg.

And using a complete:true msg does not mean that all messages have been handled, since technically[1] order isn't guaranteed - so the messages could arrive out of order, with the complete:true msg arriving first.

[1] NodeJS being single-threaded means it is, but one should not depend on that.

EDIT: Having gone through the same learning (getting split & join to work together) a while back, I did describe my learning - I've now added the http request use case to that description.


I know what you mean; it is the way I have 'worked around' it. But you see: the net result is the same, except you now have an array with lots of crap in it that you need to deal with next.

My point is that I don't want the join node to wait until eternity and then have to rework the flow to make it work by catching errors. I am not interested in URLs that are not reachable. Sure, it is the 'proper' way from a technical/development perspective - I am looking for a quality-of-life enhancement: make it simple for the user.

And using a complete:true msg does not mean that all messages have been handled, since technically[1] order isn't guaranteed - so the messages could arrive out of order, with the complete:true msg arriving first.

Yes, this may be an issue; I don't know the solution.

Ideally we all want a flow like this:

[Screenshot 2025-01-09 at 10.52.07]

Obviously this isn't how the world works (for most of us) - not yet, but it's just around the corner.

Using the catch node is, for me, the simplest and most elegant way of dealing with the problem; as a user this is a no-brainer. But this is in the context of Node-RED, and Node-RED is a flow-based programming environment within which the Unix principle of do-one-thing-and-do-it-well is applied.

Within this context, the catch node is the correct solution, not a workaround. The one thing the catch node does is catch exceptions that other nodes generate when those nodes reach a point where they cannot handle something.

Of course, this being Node-RED, it IS possible to get the flow simpler if that is the desired output. But it requires more work on the part of the flow author OR the use of a suitable custom node. Both are possible.

Personally, I think that @gregorius's response is absolutely reasonable, and I would certainly expect to have to handle exceptions when checking a bunch of websites, since there are many factors involved regarding whether you will get a response, what that response will be and in what order things will return. Under the covers, doing a web request isn't a simple task. Node-RED already hides most of the complexity for us.

However, I can easily see that either a function node or a custom node could be given a set of URLs to check and process, and return whatever output is required.

I don't think you've actually told us what you want to do with the outputs though, perhaps if you can share that, we could collectively find a better approach?

If you incorporate a switch node to check the status code returned (or the catch error returned) and set the switch node to reconstruct message sequences, it should adjust the count property, allowing the count and index to meet.
e.g.

[{"id":"6404f5eb0d159af9","type":"switch","z":"667cec54c048503c","name":"","property":"payload","propertyType":"msg","rules":[{"t":"neq","v":"2","vt":"num"}],"checkall":"true","repair":false,"outputs":1,"x":330,"y":1080,"wires":[["ee6ea42105774050"]]},{"id":"9fa43bc9381e9411","type":"inject","z":"667cec54c048503c","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"[1,2,3]","payloadType":"json","x":170,"y":1180,"wires":[["ffd11eb3410f1f3c"]]},{"id":"ffd11eb3410f1f3c","type":"split","z":"667cec54c048503c","name":"","splt":"\\n","spltType":"str","arraySplt":1,"arraySpltType":"len","stream":false,"addname":"","property":"payload","x":310,"y":1180,"wires":[["1b3a4fba69963aef"]]},{"id":"1b3a4fba69963aef","type":"link call","z":"667cec54c048503c","name":"","links":["77e090d12365b86d"],"linkType":"static","timeout":"5","x":460,"y":1180,"wires":[["e4b67e9df4d9b1a7"]]},{"id":"e4b67e9df4d9b1a7","type":"switch","z":"667cec54c048503c","name":"","property":"statusCode","propertyType":"msg","rules":[{"t":"eq","v":"200","vt":"num"}],"checkall":"true","repair":true,"outputs":1,"x":610,"y":1180,"wires":[["e9fd2986a9de861a"]]},{"id":"b331e9d4517ed189","type":"catch","z":"667cec54c048503c","name":"","scope":["1b3a4fba69963aef"],"uncaught":false,"x":450,"y":1240,"wires":[["e4b67e9df4d9b1a7"]]},{"id":"e9fd2986a9de861a","type":"join","z":"667cec54c048503c","name":"","mode":"auto","build":"object","property":"payload","propertyType":"msg","key":"topic","joiner":"\\n","joinerType":"str","useparts":false,"accumulate":"false","timeout":"","count":"","reduceRight":false,"x":770,"y":1180,"wires":[["7d1e5c9416b6bbd3"]]},{"id":"7d1e5c9416b6bbd3","type":"debug","z":"667cec54c048503c","name":"debug 2579","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":930,"y":1180,"wires":[]},{"id":"77e090d12365b86d","type":"link in","z":"667cec54c048503c","name":"link in 22","links":[],"x":205,"y":1080,"wires":[["6404f5eb0d159af9"]]},{"id":"ee6ea42105774050","type":"change","z":"667cec54c048503c","name":"","rules":[{"t":"set","p":"statusCode","pt":"msg","to":"200","tot":"num"}],"action":"","property":"","from":"","to":"","reg":false,"x":540,"y":1080,"wires":[["4ff016361e13828a"]]},{"id":"4ff016361e13828a","type":"link out","z":"667cec54c048503c","name":"link out 11","mode":"return","links":[],"x":705,"y":1080,"wires":[]}]

As requests can return asynchronously, you could receive the complete before the last request returns.

[edit] You can force a msg.complete onto the last message, if this is really required.
e.g.

[{"id":"f30907e267f77845","type":"inject","z":"667cec54c048503c","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"[1,2,3]","payloadType":"json","x":130,"y":1260,"wires":[["c986177b5fcf1c30"]]},{"id":"c986177b5fcf1c30","type":"split","z":"667cec54c048503c","name":"","splt":"\\n","spltType":"str","arraySplt":1,"arraySpltType":"len","stream":false,"addname":"","property":"payload","x":310,"y":1260,"wires":[["223056f14d699689"]]},{"id":"223056f14d699689","type":"change","z":"667cec54c048503c","name":"","rules":[{"t":"set","p":"complete","pt":"msg","to":"$$.parts.count = $$.parts.index + 1 ? true","tot":"jsonata"}],"action":"","property":"","from":"","to":"","reg":false,"x":470,"y":1260,"wires":[["11e4b99f38408537"]]},{"id":"11e4b99f38408537","type":"debug","z":"667cec54c048503c","name":"debug 2579","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","targetType":"full","statusVal":"","statusType":"auto","x":730,"y":1260,"wires":[]}]
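For comparison, here is a function-node-style version of the same idea (a sketch; it assumes the messages still carry the original parts object from the split):

```javascript
// Sketch: set msg.complete on the final part of a split sequence,
// mirroring the JSONata expression in the change node above.
function markComplete(msg) {
  if (msg.parts && msg.parts.index === msg.parts.count - 1) {
    msg.complete = true;
  }
  return msg;
}

// Example: three parts produced by splitting [1,2,3]
const msgs = [1, 2, 3].map((v, i) => ({
  payload: v,
  parts: { index: i, count: 3 }
}));
const flagged = msgs.map(markComplete);
console.log(flagged.map(m => m.complete === true)); // [ false, false, true ]
```

The same caveat applies: if messages can overtake each other downstream, the flagged message is not guaranteed to arrive last.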

Totally agree with this. If you want to create the "computer/machine does something" node in the diagram above, then Node-RED totally allows you to do that, even if that node goes against the Unix principle of doing one thing well.

But caveat emptor - if the custom node fails, then you have to deal with that yourself.

I prefer to use the Node-RED way if possible, since I know the individual nodes have been tried and tested and failure is less likely. Occasionally, if there is functionality missing, I would create a custom node, but that custom node would not be the computer-does-everything node; rather, it would provide the missing functionality and could then be combined with existing nodes to create a complete flow.

Example: if there were a new format - let's say ExtendedJSON (eJSON) - and a website that generated eJSON, I would build a custom node that parses eJSON, which I would then combine with an http request node. I would not build a new http request node that knows how to parse eJSON, because this would restrict parsing of eJSON to only content that comes from the web, i.e., parsing a local file with eJSON would not be possible.

Having said that, I noticed that the http request node does do json parsing:

Oh dear .... I've just contradicted myself.

For the original request: the problem with the split node setting a complete flag on the last message in the sequence is that it then requires the messages to arrive at the join node in the same order, so that the complete message is the last one in. If there is any async work in the flow (such as an HTTP Request node) then it would be possible for the messages to get out of order.

You also have the issue of what happens if the message with the complete flag is one of the ones that doesn't get through.

This is something that I mentioned in my transfer encoding chunked topic:

This could be avoided by having something that checks the parts and ensures that messages are passed on in order - something like an order-guarantee node that buffers only those messages that arrive out of order. The node would maintain the current part number, i.e., if the last part sent was X then the next part to be sent will be X+1, and all other messages are buffered until X+1 arrives - is there something like that?

Of course, the simplest answer would be: use a join node. But in the context of that post it makes no sense, since I'm splitting large files into small messages and joining them all together again would be counterproductive. Having a node that partially buffered messages and released them in order would - hopefully - avoid excessive memory usage.

How would this work?

For example ten messages sent in the following order:

4 5 0 1 8 2 3 7 6 9 

an order node would do the following:

Receive: 4
Internal Buffer: 4
Send: <nothing>

Receive: 5
Internal Buffer: 4 5
Send: <nothing>

Receive: 0
Internal Buffer: 4 5
Send: 0

Receive: 1
Internal Buffer: 4 5
Send: 1

Receive: 8
Internal Buffer: 4 5 8
Send: <nothing>

Receive: 2
Internal Buffer: 4 5 8
Send: 2

Receive: 3
Internal Buffer: 8
Send: 3
Send: 4
Send: 5

Receive: 7
Internal Buffer: 8 7
Send: <nothing>

Receive: 6
Internal Buffer: <empty>
Send: 6
Send: 7
Send: 8

Receive: 9
Internal Buffer: <empty>
Send: 9

So the node would maintain a buffer and an index to know which messages it has sent. Of course this would need to be grouped by some id, but I believe the file read node already does this by grouping all data blocks it generates with an id.
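The buffering logic described above can be sketched in a few lines of JavaScript (a minimal sketch for a single sequence; a real node would also need to key the buffer by parts.id):

```javascript
// Sketch of an "order guarantee" buffer: messages tagged with parts.index
// are released strictly in index order; out-of-order messages are held
// until the gap before them is filled.
function createReorderer() {
  const buffer = new Map(); // index -> message
  let next = 0;             // next index to release
  return function receive(msg) {
    const out = [];
    buffer.set(msg.parts.index, msg);
    // flush every consecutive message starting at `next`
    while (buffer.has(next)) {
      out.push(buffer.get(next));
      buffer.delete(next);
      next += 1;
    }
    return out; // empty if msg arrived out of order
  };
}

// Replaying the example arrival order 4 5 0 1 8 2 3 7 6 9:
const reorder = createReorderer();
const arrival = [4, 5, 0, 1, 8, 2, 3, 7, 6, 9];
const sent = [];
for (const i of arrival) {
  sent.push(...reorder({ parts: { index: i } }).map(m => m.parts.index));
}
console.log(sent.join(" ")); // 0 1 2 3 4 5 6 7 8 9
```

Running this reproduces the send/buffer trace above exactly.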

Is there any node that does this already?

The switch node does something similar (vaguely similar): it can reconstruct the message sequence if you feed the catch error into it too - see the example above.

Ok, I don't understand how. However, here's the challenge:

[{"id":"a2f84f8e1af415eb","type":"inject","z":"af52cade0eb7c896","name":"","props":[{"p":"payload"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"[0,0,0,0,0,0,0,0,0,0]","payloadType":"json","x":358,"y":1750,"wires":[["a59add7d2e0a9722"]]},{"id":"a59add7d2e0a9722","type":"split","z":"af52cade0eb7c896","name":"","splt":"\\n","spltType":"str","arraySplt":1,"arraySpltType":"len","stream":false,"addname":"","x":692,"y":1750,"wires":[["dbd119392773d60e"]]},{"id":"dbd119392773d60e","type":"change","z":"af52cade0eb7c896","name":"","rules":[{"t":"set","p":"payload","pt":"msg","to":"parts.index","tot":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":849,"y":1820,"wires":[["ccd84ebd0bbf91a8"]]},{"id":"ccd84ebd0bbf91a8","type":"delay","z":"af52cade0eb7c896","name":"","pauseType":"random","timeout":"5","timeoutUnits":"seconds","rate":"1","nbRateUnits":"1","rateUnits":"second","randomFirst":"1","randomLast":"3","randomUnits":"seconds","drop":false,"allowrate":false,"outputs":1,"x":957,"y":1882,"wires":[["87c5210b7c4a2f33"]]},{"id":"87c5210b7c4a2f33","type":"switch","z":"af52cade0eb7c896","name":"","property":"payload","propertyType":"msg","rules":[{"t":"else"}],"checkall":"true","repair":false,"outputs":1,"x":1117,"y":1750,"wires":[["af7341f20efd69c7"]]},{"id":"af7341f20efd69c7","type":"join","z":"af52cade0eb7c896","name":"","mode":"custom","build":"array","property":"payload","propertyType":"msg","key":"topic","joiner":"\\n","joinerType":"str","accumulate":false,"timeout":"","count":"","reduceRight":false,"reduceExp":"","reduceInit":"","reduceInitType":"","reduceFixup":"","x":1319,"y":1750,"wires":[["9571acd0e67d2f83"]]},{"id":"9571acd0e67d2f83","type":"debug","z":"af52cade0eb7c896","name":"debug 285","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","statusVal":"","statusType":"auto","x":1556,"y":1750,"wires":[]}]

Take that flow and do something in the switch node that will cause the debug output to always be [0,1,2,3,4,5,6,7,8,9]! At the moment it's completely random:

What the output represents is the index value in the parts object on the msg after the split node. Then there is a delay, and the join collects all the messages until all ten have been received. The result shows the order in which the join received the messages. What I would like is something that ensures that the order is always the same, i.e., 0,1,2,3,... in order.
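For what it's worth, since in this particular challenge each payload is its own parts.index, one trivial sketch of a fix (outside the switch node, e.g. in a function node after the join) is simply to sort the joined array:

```javascript
// Sketch: restore order after the join by sorting on the carried index.
// Assumes each element is the original parts.index, as in the flow above;
// the arrival order shown is just one possible random outcome.
const received = [4, 5, 0, 1, 8, 2, 3, 7, 6, 9];
const ordered = [...received].sort((a, b) => a - b);
console.log(ordered.join(",")); // 0,1,2,3,4,5,6,7,8,9
```

This only works because the payloads happen to encode their own position; the general case still needs something like the order-guarantee buffer sketched earlier.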