Node-RED XML to MSSQL

I have created a Node-RED flow that reads a huge XML file (around 2.5GB), passes each line to a function that looks for a specific tag, and returns the value.

Everything is fine when I try to visualize the values using a chart node.
The flow is like this: timestamp-xml-function-chart

The problem appears when I try to store these values in SQL Server using the mssql-plus node.
The flow is like this: timestamp-xml-function-mssql

Some of the values make it into SQL Server, then the flow stops and I see errors in the debug pane like "Connection Reset" or "Connection Closed".

The file contains 5,277,363 values of type char[4] (like 2019, 2020) to be stored in a SQL Server table with a single column of type nvarchar(4).
Both Node-RED and SQL Server run as Docker containers communicating with each other on the same host.
Using C# on the same system, parsing the file and storing those values in a variable takes about 10 seconds.

So maybe it is a frequency/latency issue on the SQL Server side?
If that's the case, why doesn't the MSSQL node delay the whole flow, instead of just accepting requests until it cannot handle them?

Some Questions...

  • How much of this 2.5GB XML is held in memory at any one time? And how much memory is assigned to the Docker containers?

  • How often is this process to happen? Every minute? Daily? Weekly?

  • Are you generating SQL INSERT strings? Using a stored procedure? Using the parameters UI?

  • Are you generating 1 INSERT per row of data or sending multiple rows per INSERT?

  • How many connections are you making (i.e. how many MSSQL-CN configs are there)?

  • Can you share your flow and a small XML sample file?

    • If not, a screenshot of the flow might help spot something

C# will be using the MS compiled drivers, I imagine.

I will try to answer those good questions:

  • I don't know how much of this file is held in memory, and that's the reason I configured the XML parsing as "line by line": to avoid memory consumption.

  • The process is manual and triggered by me (by pressing the timestamp inject, with no repeat), just to get familiar with Node-RED and Docker and to do speed comparisons between technologies.

  • The import is done with INSERT statements using {{msg.topic}}, with one INSERT per row; I'm not sure how I can gather some rows as an intermediate step and push them as a batch.

  • I have only one MSSQL Node with only one SQL connection.

The reason for mentioning C# is only that I am sure its parsing is faster than the way I am pushing the lines into SQL Server. I believe Node-RED may well be fast at parsing too, and I want to spend time on this technology.
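On the batching question: as a rough sketch (not the mssql-plus node's own API - the table name `years` and column `y` are made up for illustration), rows can be grouped into multi-row INSERT statements before they reach the SQL node. SQL Server accepts up to 1000 rows in a single VALUES list:

```javascript
// Sketch: build multi-row INSERT statements instead of one INSERT per
// value. Table "years" and column "y" are hypothetical names.
function buildBatchInserts(values, batchSize) {
    const statements = [];
    for (let i = 0; i < values.length; i += batchSize) {
        const chunk = values.slice(i, i + batchSize);
        // one row constructor per value, e.g. ('2019'),('2020')
        const rows = chunk.map(v => `('${v}')`).join(",");
        statements.push(`INSERT INTO years (y) VALUES ${rows};`);
    }
    return statements;
}

// 5,277,363 values at 1000 rows per INSERT become ~5,278 statements
// instead of ~5.3 million round trips.
```

In a flow, a function node could buffer incoming `msg.payload` values in context and only emit a combined `msg.topic` every N messages; a join node configured to combine N messages into an array would work too.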

The file is from here:
https://dblp.org/xml/release/dblp-2020-10-01.xml.gz (I have downloaded and uncompressed it on my drive)

The flow is this:

The function is this:

Is the FILE node set to read line by line?

Try putting a rate limit between the function and MSSQL (delay node set to rate limit 10 msg per second)
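For intuition, the delay node's rate-limit mode behaves roughly like the queue-and-tick sketch below (plain JavaScript for illustration only - in the flow you would just configure the built-in delay node):

```javascript
// Sketch: release queued messages at a fixed interval, the way a
// delay node in "rate limit" mode (without dropping) behaves.
function makeRateLimiter(intervalMs, send) {
    const queue = [];
    let timer = null;
    return function enqueue(msg) {
        queue.push(msg);              // every arriving message is queued
        if (timer === null) {
            timer = setInterval(() => {
                if (queue.length === 0) {
                    clearInterval(timer); // nothing left to release
                    timer = null;
                    return;
                }
                send(queue.shift());      // one message per tick
            }, intervalMs);
        }
    };
}
```

The important detail for this thread: the queue is unbounded, so if the producer is much faster than the release rate, the queue (and memory use) just keeps growing.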

Lastly - you are really pushing nodejs / node-red here.

That file is mostly gonna end up in memory (even if you have it set to read one line at a time). Ultimately, the file node is gonna read that 2.5GB line by line as fast as it can & your Node-RED will have many thousands of "en route" messages. Then your function node will be triggered many thousands of times in quick succession. My guess is you are hitting memory issues within both the Node.js V8 env and the Docker setup. Some reading material

As a test, you could try splitting that file into smaller segments.

Other folk may have ideas on how you can read that file in segments (with more control to stop it being fully read into memory).

Regarding the MSSQL node - I use it 24/7 in a production environment & it is very stable - that's why I suspect "other" influences are causing your issue.

Yes, the XML node is set to line by line.
I just tested the delay node with a rate limit, and strange things are happening.
Only a few of the topics are passing into SQL, even though I have deselected the "drop intermediate messages" option... I tested with 100 per second, 1,000 per second, and 10,000 per second; some of the values pass through and then it says done, while the CPU is still at 90-100% after the done indicator.

Pretty sure you are hitting memory limits.

The file node will just keep going. Did you see what number appeared under the rate limit before things stopped? That number is the queue of messages being generated by the file reader.

Hm, that's interesting; yes, the number is always close to 100,000.
So you are saying that MSSQL stops, and the XML file node doesn't have a clue about this?

The SQL node is likely getting saturated (memory, network latency, and connection pool size constraints all come into play)

Correct. You tell the file node to read a 2.5GB file line by line & produce a msg for each line - that's what it's gonna do. Nodes have no awareness of each other unless they are wired together. Even then, I don't think there is a "pause" command for the file node (I might be wrong)

as I said before...