A new type of flow

While not a complete answer, perhaps the fs node should be updated to include msg.parts as needed - I think it pre-dates the use of msg.parts.

And, of course, any flow can utilise the same metadata schema and so make use of the features.

Like most high-level "languages", Node-RED does sometimes require some lateral thinking.

I certainly understand what you are asking for. But it is telling that nobody has ever - as far as I know/remember - asked for this. In general, we don't care that a flow is never "finished" if its input is known to be singular. If the input is not singular then it is "correct" that the flow never finishes - from a Node-RED perspective - and it is the responsibility of the flow to mark the logical end of one invocation of itself should that be needed.

Try the exec node to list files and then split.
e.g.

ls <path_to_directory> -p | grep -v /

-p adds / to directories
grep -v / returns only lines without /

grep -E .jpg will return all files containing .jpg

Thx, this is probably the best solution for that example. The downside is that it trades the simplicity of the fs node for the complexity of shell commands. Then it's a matter of adding a 'last' flag to the last item in the array, then splitting into individual messages. I guess that's my gripe for today. Node-RED makes a lot of things easy, but this isn't one of them!

With unlimited power :smiling_imp:, I would add a new type of flow to Node-RED that does batch processing instead. And I'm confident it would be useful for a lot more people than just me. Whether that's feasible or not is not up to me to decide, but it's my idea and I'm throwing it out there!

No need to add an array item, as the last message is the one the debug node shows after the join. Or detect it when msg.parts.index == msg.parts.count - 1.
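
For example, a minimal function node sketch that forwards only the final message of a split sequence (assuming an upstream split node has populated msg.parts):

// Forward only the last message of a split sequence.
// Assumes msg.parts (index/count) was set by an upstream split node.
if (msg.parts && msg.parts.index === msg.parts.count - 1) {
    return msg; // the final message of the batch
}
return null; // drop all earlier messages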

p.s. @TotallyInformation just mentioned that he may add msg.parts to the fs node. You can probably wait for an fs update, which will solve the issue.

I certainly understand what you are asking for. But it is telling that nobody has ever - as far as I know/remember - asked for this. In general, we don't care that a flow is never "finished" if its input is known to be singular. If the input is not singular then it is "correct" that the flow never finishes - from a Node-RED perspective - and it is the responsibility of the flow to mark the logical end of one invocation of itself should that be needed.

Great! In many use cases, it is critical to care about these things. But those tasks probably rarely overlap with what Node-RED is used for. I can only speculate why nobody else ever asked for it. In my experience, most "languages" or "frameworks" (using the terms loosely here) have no problem doing exactly this - for example, handling a bunch of messages and then doing something after the last one has finished, or aggregating, etc. So most people needing this behaviour probably use a completely different tool. Now that I know multiple systems, I can easily see the need Node-RED fills elsewhere. So it's probably about using multiple tools, but then it is kinda nice having everything in one place too.

Thanks again, that's even easier! And looking forward to any update to the fs node.

Still, even with that, it's kinda fragile in the sense that if a message fails in the delete node, the flow might stop there and the message be passed to a catch node instead? Not a big deal, as it should be extremely unlikely. Just in general, for a more important task you'd need to cover all of these cases, including I/O and network errors.

I understand your position. However, I don't necessarily agree with all of it. Node-RED is perfectly able to run a flow as a one-off - as long as you don't mind the flow hanging around afterwards. It is also possible to auto-remove a flow, as it happens - not easy, certainly, but definitely possible. But generally, people doing something one-off will simply delete the flow manually and redeploy.

Otherwise, most people keep flows to enable them to be reused again in the future.

There really isn't much need for this feature.

If I were to say where I thought there WAS a need for something, it would be a more dynamic approach to "projects". They are pretty static right now and it would be a really nice feature to have the ability to have a folder of flows that could be run at any time.

This still requires the runtime to be completely separated from the Editor though - something that has been on the backlog nearly as long as Node-RED has existed.

As Nick said, this ain't easy and there are generally a lot more pressing needs to be dealt with.

But isn't the elephant in the room that your incoming data/batch data is actually fragile instead?

Perhaps I am not following your issue completely - batch processing in Node-RED works quite well. But if data consistency and/or backpressure and whatnot while aggregating are issues you are concerned about, I would investigate whether something like Apache NiFi would be a better solution instead - but note, that is another realm of complexity, and from the surface your problem does not sound very complex.

There is the catch node to catch errors from nodes, and there is an example of a flow that guarantees delivery, which you could adapt to check the file was deleted before continuing. [Announce] Improved Guaranteed Delivery flow

In Node-RED erroring nodes are supposed to tidy up after themselves. Crashing nodes is considered "rude" :slight_smile: . So in this case, a failed delete should still indicate the end of the process.

To be honest, the fs nodes haven't had ANY TLC in a very long time as some silly bu&&er got carried away with UIBUILDER. :wink:

If anyone wanted to do a PR or adopt them, I'd be more than happy.

That is possible, but unless you really need to do this serially then I see no need.

The node-red-contrib-fs-ops nodes are really well designed and provide array functionality so that you can achieve the following:

Inject node (sends 1 message once per day) --> Log start --> Get files in folder (generates 1 message) --> Delete all files (1 final message) --> Log deleted files (1 message)

[screenshot of the flow]

Demo flow (import using CTRL-I)

[{"id":"a8b41ecabdff9fcb","type":"fs-ops-dir","z":"afaa6118a7d8fb69","name":"get files","path":"path","pathType":"msg","filter":"filter","filterType":"msg","dir":"files","dirType":"msg","x":140,"y":280,"wires":[["94b282e95202854c","5febb380df7edb26"]]},{"id":"0c3d88fe82198d16","type":"cronplus","z":"afaa6118a7d8fb69","name":"0 0 13 * * * * (1pm, daily)","outputField":"payload","timeZone":"","storeName":"","commandResponseMsgOutput":"output1","defaultLocation":"","defaultLocationType":"default","outputs":1,"options":[{"name":"dir","topic":"dir","payloadType":"str","payload":"c:/temp/test","expressionType":"cron","expression":"0 0 13 * * * *","location":"","offset":"0","solarType":"all","solarEvents":"sunrise,sunset"}],"x":150,"y":200,"wires":[["babef6397f2590d1"]]},{"id":"94b282e95202854c","type":"debug","z":"afaa6118a7d8fb69","name":"starting: log it","active":true,"tosidebar":true,"console":true,"tostatus":true,"complete":"\"Deleting files \" & path & \"/\" & filter ","targetType":"jsonata","statusVal":"\"Deleting files \" & path & \"/\" & filter","statusType":"jsonata","x":420,"y":280,"wires":[]},{"id":"5febb380df7edb26","type":"fs-ops-delete","z":"afaa6118a7d8fb69","name":"delete files","path":"path","pathType":"msg","filename":"files","filenameType":"msg","x":190,"y":360,"wires":[["754412ba68952407","68bddcd5f651e991"]]},{"id":"babef6397f2590d1","type":"change","z":"afaa6118a7d8fb69","name":"set path and filter","rules":[{"t":"set","p":"path","pt":"msg","to":"c:/temp/test1","tot":"str"},{"t":"set","p":"filter","pt":"msg","to":"*","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":410,"y":200,"wires":[["a8b41ecabdff9fcb"]]},{"id":"754412ba68952407","type":"debug","z":"afaa6118a7d8fb69","name":"done: log it","active":true,"tosidebar":true,"console":true,"tostatus":true,"complete":"\"Deleted files \" & path & \"/\" & filter","targetType":"jsonata","statusVal":"\"Deleted files \" & path & \"/\" & filter","statusType":"jsonata","x":410,"y":360,"wires":[]},{"id":"68bddcd5f651e991","type":"debug","z":"afaa6118a7d8fb69","name":"done: (details)","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","targetType":"full","statusVal":"","statusType":"auto","x":420,"y":420,"wires":[]}]

But if you must do things as per your comment above, then that is easily achieved too:
[screenshot of the flow]

Demo flow (import using CTRL-I)

[{"id":"6f655bc7839f8653","type":"cronplus","z":"afaa6118a7d8fb69","name":"0 0 13 * * * * (1pm, daily)","outputField":"payload","timeZone":"","storeName":"","commandResponseMsgOutput":"output1","defaultLocation":"","defaultLocationType":"default","outputs":1,"options":[{"name":"dir","topic":"dir","payloadType":"str","payload":"c:/temp/test","expressionType":"cron","expression":"0 0 13 * * * *","location":"","offset":"0","solarType":"all","solarEvents":"sunrise,sunset"}],"x":150,"y":980,"wires":[["2204334f94a2272b"]]},{"id":"2204334f94a2272b","type":"change","z":"afaa6118a7d8fb69","name":"set path and filter","rules":[{"t":"set","p":"path","pt":"msg","to":"c:/temp/test2","tot":"str"},{"t":"set","p":"filter","pt":"msg","to":"*","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":390,"y":980,"wires":[["61f9c1a1fc16f6cc"]]},{"id":"61f9c1a1fc16f6cc","type":"fs-ops-dir","z":"afaa6118a7d8fb69","name":"get files","path":"path","pathType":"msg","filter":"filter","filterType":"msg","dir":"files","dirType":"msg","x":140,"y":1060,"wires":[["dc5e41af6d682db5"]]},{"id":"8df70a1afc8fafcb","type":"fs-ops-delete","z":"afaa6118a7d8fb69","name":"delete file","path":"path","pathType":"msg","filename":"files","filenameType":"msg","x":180,"y":1140,"wires":[["b15c3cdece8f20c0","2ddb27cbc72ea43c"]]},{"id":"dc5e41af6d682db5","type":"split","z":"afaa6118a7d8fb69","name":"","splt":"\\n","spltType":"str","arraySplt":1,"arraySpltType":"len","stream":false,"addname":"","property":"files","x":290,"y":1060,"wires":[["8df70a1afc8fafcb","4f3c109a6ecabb82"]]},{"id":"b15c3cdece8f20c0","type":"join","z":"afaa6118a7d8fb69","name":"","mode":"auto","build":"object","property":"payload","propertyType":"msg","key":"topic","joiner":"\\n","joinerType":"str","useparts":false,"accumulate":true,"timeout":"","count":"","reduceRight":false,"reduceExp":"","reduceInit":"","reduceInitType":"","reduceFixup":"","x":330,"y":1220,"wires":[["76ad9e589b07c4c8"]]},{"id":"4f3c109a6ecabb82","type":"debug","z":"afaa6118a7d8fb69","name":"Deleting file log","active":true,"tosidebar":true,"console":true,"tostatus":true,"complete":"\"Deleting file \" & files","targetType":"jsonata","statusVal":"\"Deleting file \" & files","statusType":"jsonata","x":510,"y":1060,"wires":[]},{"id":"2ddb27cbc72ea43c","type":"debug","z":"afaa6118a7d8fb69","name":"Deleted file log","active":true,"tosidebar":true,"console":true,"tostatus":true,"complete":"\"Deleted file \" & files","targetType":"jsonata","statusVal":"\"Deleted file \" & files","statusType":"jsonata","x":510,"y":1140,"wires":[]},{"id":"76ad9e589b07c4c8","type":"debug","z":"afaa6118a7d8fb69","name":"done: log it","active":true,"tosidebar":true,"console":true,"tostatus":true,"complete":"true","targetType":"full","statusVal":"\"Deleted files \" & path & \"/\" & filter","statusType":"jsonata","x":490,"y":1220,"wires":[]}]

Alright, thanks, I'll check it out.

I'm not the best at explaining myself and feel a bit misunderstood. When you say "if I really need to do this" - it's not that I need to; I gave up once already. It's just that it's nice to see when a process has stopped. Otherwise you can't really know whether something crashed or stopped without notice. This is very easy in virtually all other frameworks.

Then some will say it is possible - just make sure to choose the right palette packages, add some parts metadata beforehand in a split and later join, or whatever. Yeah, I get that it's possible. My point is that it's cumbersome. With other systems, all you'd have to do is add a function at the end, or count the messages in the list, or whatever. No parts metadata added anywhere. No split. No join.

Node-RED's big strength is being easy to use in many cases. I don't see this as one of them. There are other alternatives that work better for some use cases. I'm sure there are other, more important priorities, and it may not be easy. Just throwing it out there, as this is something that could work a lot more easily - it does in other systems.

What exactly is the problem that makes it 'cumbersome'? Create the flow once and enjoy it forever.

Messages are passed from node to node; nodes are not 'aware' of what they are connected to, and they handle each message that gets passed along individually. This is where context becomes useful. If you want to see/know where/when a process stopped, you will have to log it somewhere at/after each node and track it.
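
For example, a small function node placed after each important stage could record progress in flow context. A minimal sketch (the stage name "delete" is just an example):

// Track when each stage last ran so you can see where a process stopped.
const progress = flow.get("progress") || {};
progress["delete"] = { at: Date.now(), msgId: msg._msgid }; // one entry per stage
flow.set("progress", progress);
return msg;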

Other systems may be dedicated to single tasks and may do this in a less 'cumbersome' way, but they cannot do what Node-RED can do. Node-RED is a jack of all trades (and the master of all or none - depending on the flow creator :smile: ), which comes with certain concessions that may require a bit more work in some cases.

I think Node-RED can do anything and it's nearly limitless (I love it for this reason), but yes, it may not always be the right choice for the task. In the end, though, working with Node-RED is more fun than any other software I've used.

Try these nodes:

The node-red-contrib-fs-ops nodes are really well designed and provide array functionality so that you can achieve the following:

Doesn't read files recursively in multiple folders... But it looks like node-red-contrib-fs has a checkbox for spitting out an array after all, haha!

Anyway, I shared my thoughts and learned some new things, so I'll retreat and think outside the box elsewhere now. Thanks all!

Maybe you need something like Dagster or Airflow.

I do use Node-RED for running ETL processes, but I take a different approach than the one you're suggesting. I've developed a few custom Node-RED nodes to represent my data sources (various SaaS applications) and data destinations (primarily a Postgres database). When a message is received on the input port of a data source node, an extraction process is triggered for the configured entity. The output message then contains a payload with three key properties:

  • startTime, extraction start timestamp
  • model, a normalized description of the entity (fields ...) across various sources
  • data, a NodeJS stream of record objects. See Stream | Node.js v22.9.0 Documentation

The key point here is that all the data records extracted from the source are streamed in a single NodeJS object stream within a single Node-RED message. Each Node-RED message represents an extraction run.

The NodeJS object stream is then accessible to downstream nodes, which may consume records by reading message.payload.data. I've also created a Transform node that processes the input object stream and outputs an object stream with transformed records. I use this to rename fields between the source and destination.

This approach was chosen to avoid generating a separate Node-RED message for each record/row obtained from the source. The Extract (Source), Load (Destination), and Transform nodes all follow a simple convention: a message payload containing startTime, model, and data properties.
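
A rough sketch of what that convention can look like inside a custom source node's input handler (the node name and record data here are invented for illustration):

const { Readable } = require("node:stream");

// Sketch of a custom Extract (Source) node following the convention above.
module.exports = function (RED) {
    function ExtractNode(config) {
        RED.nodes.createNode(this, config);
        const node = this;
        node.on("input", (msg, send, done) => {
            msg.payload = {
                startTime: new Date().toISOString(),               // extraction start timestamp
                model: { name: "device", fields: ["id", "name"] }, // normalized entity description
                data: Readable.from([                              // object stream of records
                    { id: 1, name: "sensor-a" },
                    { id: 2, name: "sensor-b" },
                ]),
            };
            send(msg);
            done();
        });
    }
    RED.nodes.registerType("extract-device", ExtractNode); // name is hypothetical
};

A downstream node can then consume the records with for await (const record of msg.payload.data) { ... }.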

There is one downside though. A NodeJS stream cannot be consumed twice, so I cannot have branches in my Node-RED flows after the extraction/source node.

I need to experiment with other approaches, because what I'm doing at the moment is a hassle.

A typical example:

This flow does 4 main things:

  1. get data1 (for example devices)
  2. get data2 (for example attributes per device from data1)
  3. get data3 (for example power consumption per device with specific attribute)
  4. send data3 to database

This job is run once per day and collects and stores data for all 24 hours of yesterday, for each device from data1 that has the correct attribute from data2.

But before/after each http request I need to add a split/join. And some function nodes also add a parts object for the same purpose. Some of the http request nodes are actually a subflow wrapping the http request with built-in retry attempts (due to an unstable remote API), including error handling and logging. Ideally I'd want to just drop any unsuccessful requests and continue with the successful ones, but due to the split/join I need to keep them in order to combine them into a single message again. If one message fails and waits for a retry, all messages have to wait. And after that, I have to add logic to remove the parts that were unsuccessful.

All in all, I find this a hassle and am looking for a better way than split/join or context. I don't want to deal with context because of how fragile it is in terms of missing messages, and in terms of debugging, where you have to jump back and forth between the debug pane and the context pane to get the full picture. I later learned to have 2 browser windows open side by side to avoid this.

An http request node handling input/output arrays would go a long way.
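
In the meantime, one workaround is to make all the requests inside a single function node, so there is no split/join at all. A rough sketch - assuming fetch is available to the function node (an assumption; it depends on your Node-RED/Node.js setup) and with an invented URL:

// Fan out all HTTP calls from one function node and return one message.
// The URL template is invented; adapt it to the real API.
const devices = msg.payload; // array of devices from data1
const results = await Promise.allSettled(
    devices.map(d =>
        fetch(`https://example.invalid/attributes/${d.id}`).then(r => r.json())
    )
);

// Keep only the successful responses; failed requests are dropped
// instead of holding up the whole batch.
msg.payload = results
    .filter(r => r.status === "fulfilled")
    .map(r => r.value);

return msg;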

You can also use a change node to move the payload output to another property after the request node (?) (confused about the 2nd output) and continue the flow. There is no need to 'join', as all the properties will be part of the msg then.

data1 returns an array of devices. Then this array is split into individual messages (with parts) and sent to the http request node for data2. data2 then outputs 1 result per message.

How can a change node combine multiple messages into one single message, thus removing the need for the join node?