File writing blocks NR?

Hi,

While I am aware NR uses Node.js, I would expect that when we write files from NR, other processes would continue. What we see is that other NR processes go on hold when we do a file write.

We have 100k records to be written to a file, and currently we are trying to send all 100k in one payload.
I am aware this is not the best solution and that buffered/chunked writing is recommended, but I want to understand whether it makes sense that, while the file is being written, the entire NR instance becomes unresponsive (the editor is unresponsive; even on the console we see nothing).
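For reference, a minimal sketch of what the chunked/stream-based alternative could look like in plain Node.js (the `records` variable and target path are placeholders, not from our actual flow):

```javascript
const fs = require("fs");

// Stream the records in chunks instead of one monolithic ~31 MB write,
// yielding to the event loop whenever the stream's buffer fills up.
async function writeRecords(records, path) {
    const stream = fs.createWriteStream(path);
    for (const record of records) {
        // write() returns false when the internal buffer is full;
        // wait for 'drain' so memory use stays bounded
        if (!stream.write(record + "\n")) {
            await new Promise((resolve) => stream.once("drain", resolve));
        }
    }
    // close the stream and wait for the final flush
    await new Promise((resolve) => stream.end(resolve));
}
```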

It's surprising that such a volume of data would cause a perceptible delay.

I packed 1 million short, identical lines into msg.payload with a function node, then wrote it to a file.
The whole process took about 2 seconds on a Raspberry Pi with 1 GB of memory.
The editor is accessed with Firefox on Windows.
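Something along these lines (a hypothetical reconstruction of the test function, not the exact code used):

```javascript
// Function node: pack ~1 million short, identical lines into msg.payload
const line = "hello world";                          // any short test line
msg.payload = Array(1000000).fill(line).join("\n");  // ~12 MB string
return msg;
```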

I am not yet looking at performance as such, and I understand that at 2 s the blocking may not be perceptible. But I would assume this should not tie up the main thread of Node.js. Even if it takes time, I am not clear why everything should get stuck.

What "other processes"?

A process is an OS thing (an application; a different Node-RED instance is a different process). Flows inside one Node.js application (i.e. one instance of Node-RED) all run in one process sharing one event loop.
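To make that concrete (a contrived illustration, not anything from your flow): while any synchronous work runs, nothing else in that instance is serviced.

```javascript
// Drop this in a function node and trigger it: for 5 seconds the editor,
// every other flow, and all HTTP endpoints in this instance will freeze,
// because they all share this one thread.
const end = Date.now() + 5000;
while (Date.now() < end) { /* busy-wait on the single thread */ }
return msg;
```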
Please be 100% clear.

  • How big are these records?
  • How big is the final file?
  • How long does the operation take?
  • Written to 1 file? or multiple files?
  • Is there any processing of data before writing to file?
  • Are you writing the content of payload directly to a single file? Is this file being written to an SD card? An FTP server? An NVMe drive?

Lastly, can you provide a minimal flow that demonstrates the issue.

I mean other flows.

100k records, each around 300 bytes; 31 MB in total. The flow took 15 mins end to end.

1 file.

Yes. We have some internal application processing that produces the data and writes it to the DB, then we pull from the DB and write to the file.

Payload goes to write node and is written in 1 step to the file.
Linux mount point.

What is the mount point? A network device? USB device?

I meant, do you process this file data (the payload) in node-red before writing to file?


Last couple of questions:

  • How are you writing the file? File node? Function node?
  • Do you have branches in the flow that pass this large payload down multiple wires? And/or debug nodes?

Taken from the Internet:
A mount point is a directory on a file system that is logically linked to another file system. Mount points are used to make the data on a different physical storage drive easily available in a folder structure. Mount points are fundamental to Unix, Linux and macOS. Windows can use mount points, but it is not common.

I see what you are getting at.
So once received from the DB:

  1. Checks if payload is empty (switch node)
  2. Converts to CSV (csv node)
  3. Replaces some characters using a regexp (function node; see the sketch below)
  4. To the file node.

Using "write file" node.

2 debug nodes, yes.
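For context, step 3 above is presumably something like this (the pattern and replacement are placeholders; the point is that it is a synchronous whole-string operation on ~31 MB):

```javascript
// Function node (step 3): regexp replacement over the whole CSV string.
// On a ~31 MB string this runs synchronously on the main thread and
// blocks everything else while it executes.
msg.payload = msg.payload.replace(/[;|]/g, ",");   // placeholder pattern
return msg;
```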

We know what a mount point is, the question is what it is connected to. For example it might be a local hard disc, a local SD card, a network drive, or a plethora of other possibilities.

As Colin says, we know what a mount point is. The question was "What is the mount point", not what a mount point is.

At a guess, this is a network share. That is a bottleneck for sure.

But I suspect parsing 31 MB of data in the CSV node, running a regex over 31 MB, and the payload being split/duplicated twice (branches) all contribute; there are several places this could be slow.

I recommend you place inline flow measurements along each part of your flow to identify each bottleneck. This will help.
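One simple way to do this (a minimal sketch using two function nodes; the `_t0` property name is arbitrary):

```javascript
// "Start timer" function node -- wire immediately BEFORE the stage to measure
msg._t0 = Date.now();   // stash a start timestamp on the message
return msg;
```

```javascript
// "Stop timer" function node -- wire immediately AFTER the stage
node.warn(`stage took ${Date.now() - msg._t0} ms`);
return msg;
```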

Lastly, can you share your flow (replace your database node with an inject node & populate the payload with some sample data)

You might be able to do the data manipulation in your SQL query and create a CSV file directly?

My bad. Sorry.

If it were the mount point, this should be seen across applications. We don't see it in other applications working on the same mount point.

Will try.

Will do.

Not necessarily. It may actually be an issue with the large payload and the file node, or the filesystem (which you still have not answered), and/or the Node version. This is why I ask the questions.

How have you confirmed that it is the File node that is taking the time?

One other point that is especially important when handling particularly large objects: if you have any node with more than one wire on an output port, you are making an actual copy of the message data at that point. Something which is going to be slow.

While 15 min is certainly excessive, it is important to understand how much of that time is spent writing the file. There are a tremendous number of things that can impact write performance. Certainly I would always want to avoid a monolithic write of that size, especially to a network mount.

Have you checked to see if your DB server can write to a CSV file directly? Many can and this will almost certainly be better than relying on another process. Sometimes it is better to treat Node-RED as a prototyping tool but then convert the process to native handling. A decent SQL db should be able to do data manipulation including reshaping and regex replacements as well as dumping data to a CSV.

I am preparing a flow to be posted here.

My original question remains: should file writing block the editor? Wouldn't Node.js offload it off the main thread?

Based on the responses, I am now trying to work out whether it is indeed the file write node causing the editor to become unresponsive, or one of the preceding nodes (CSV conversion, function node, debug node). Will check and confirm.

The editor is served by the node-red back end so the answer is yes.

But the question by itself is irrelevant. It is far more likely that the other things mentioned, like writing large data to a network share (which you STILL have not confirmed), processing large string data in a single-threaded application, and causing duplicates of data (due to branching), are the things to identify first.

There may well be an issue with the file node HOWEVER if you do the flow timings as suggested and it turns out the file write took 3 seconds (and it was your processing that took 14 mins, 57 secs) then the question is mostly moot.

On the other hand, it may turn out a node is doing something sync when it could/should be async (and would therefore unblock)

Bottom line: we need you to answer the questions, do some flow timings, and post your demo flow so we can assess.

ISTR all calls in the file write node are async, but there is a performance hit if you set the filename via a msg property (rather than fixing it), as we have to check the name hasn't changed for every msg and, if it has, close and reopen the file. However, even that should be minimal on 100k chunks.

Thanks Steve. The missing piece for me was what you mentioned here on sync/async. Otherwise agreed, it will depend on whether that is indeed the bottleneck.

Because, based on the inputs here, it's clear there may be other reasons for the time it takes and the blocking. So I will go node by node to check which one it is. Luckily it's only 3-4 nodes to check.

I will report back once done.

Agreed (I need to take a peek at the code to refresh my memory - but only after we fry the bigger fish ;))


Again, there is no other thread. Node.js is single-threaded. What it does with async actions is avoid holding up the event loop while waiting for slow operating system calls. So assuming the file out node didn't get coded with a sync call (which you can easily check yourself in the node's code), it will be doing the best it can.
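To make the distinction concrete (a generic Node.js illustration, not the actual File node source):

```javascript
const fs = require("fs");
const bigString = "x".repeat(31 * 1024 * 1024);  // ~31 MB, matching the thread

// Synchronous: blocks the single event loop until the OS call returns.
// While this runs, nothing else in the process (editor, other flows) is served.
fs.writeFileSync("/tmp/out.csv", bigString);

// Asynchronous: the work is handed off (to libuv's thread pool); the event
// loop keeps running and the callback fires when the write completes.
fs.writeFile("/tmp/out.csv", bigString, (err) => {
    if (err) console.error(err);
});
```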