Read large JSON file (to insert to MySQL) causes Node-RED crash due to JavaScript heap out of memory

I have been at this for a solid 7-9 hours, so will try and keep it brief....

The big picture.
I have a 27.3 Mb JSON formatted file that I need to take each key and its data and insert into a MySQL database.
The data roughly looks like this;

"000005": {
"r": "5N-BXF",
"t": "E145",
"f": "00",
"d": "Embraer ERJ-145-LR"
"001000": {
"r": "5B-CLD",
"t": "SIRA",
"f": "00",
"d": "P.2002 Sierra JF"

The database is pretty clean and simple. (I think).

I have a small test file that I use to prove out the flow (but does not cause Node-RED to crash with the memory error).
You can see it here;

My platform is a Windows 10 PC (32gig RAM and 1T SSD).
I am currently on Node-RED v 1.0.6 and Node v10.16.3

Things that I have tried.
I have, as per @knolleary's post here; Reading file line by line Set the file node to output line by line and then use the delay node in rate limit mode to try and slow things down in order to give the garbage collector time to work. Unfortunately for unknown reasons the rate limit does not work and the file is simply read and processed as fast as the flow can manage.

I have also tried using the bigfile node, tested many different combinations of its file chunking etc, all result in the same crash.

I have also tried running with different values of --max-old-space-size=8192 and there is no difference in the results.

I have also tried to break the inserts down into batches.
The flow is a lot more complicated and made zero difference to the crash occurrence.

The crash.
The error Node-RED throws in the same error consistently.

<--- JS stacktrace --->
==== JS stack trace =========================================
    0: ExitFrame [pc: 000002597555C5C1]
Security context: 0x01660249e6e9 <JSObject>
    1: /* anonymous */ [000002194C5ECA39] [C:\Users\bmorc\.node-red\node_modules\node-red-node-mysql\68-mysql.js:~123] [pc=0000025975C2597E](this=0x02194c5db921 <MysqlDBNodeIn map = 0000022D78CBD561>,msg=0x0390de838d79 <Object map = 00000135EC82D231>)
    2: arguments adaptor frame: 3->1
    3: /* anonymous */ [000000A816A746B9] [C:\Users\bmorc\AppData\Roa...
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

My current flow.
Here is the flow as it currently stands.

No debug nodes in the screenshot or exported flow, but I can assure you I have extensively used them to prove out that the data is not getting mangled at any point.
Also I ensure that the debug nodes are turned off when I test the large file other wise the debug tab and terminal that Node-RED is running in just floods with (pointless) messages.

[{"id":"b3b3c39c.49d608","type":"inject","z":"9d3a3718.5bab6","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":260,"y":1360,"wires":[["86f6856c.6f4748"]]},{"id":"86f6856c.6f4748","type":"file in","z":"9d3a3718.5bab6","name":"actest.json","filename":"C:\\Users\\bmorc\\Downloads\\indexedDB\\actest.json","format":"lines","chunk":false,"sendError":false,"encoding":"none","x":480,"y":1360,"wires":[["8643c120.7fe9f8"]]},{"id":"50a19f4.81926e","type":"split","z":"9d3a3718.5bab6","name":"","splt":"},","spltType":"str","arraySplt":1,"arraySpltType":"len","stream":false,"addname":"","x":870,"y":1360,"wires":[["62b2d893.f7f2a"]]},{"id":"d93a7d81.e4dc88","type":"change","z":"9d3a3718.5bab6","name":"remove unassigned","rules":[{"t":"change","p":"payload","pt":"msg","from":":{\"r\":","fromt":"str","to":"","tot":"str"},{"t":"change","p":"payload","pt":"msg","from":",\"t\":","fromt":"str","to":"","tot":"str"},{"t":"change","p":"payload","pt":"msg","from":",\"f\":","fromt":"str","to":"","tot":"str"},{"t":"change","p":"payload","pt":"msg","from":",\"d\":","fromt":"str","to":"","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":1230,"y":1360,"wires":[["4f016729.34ad2"]]},{"id":"62b2d893.f7f2a","type":"function","z":"9d3a3718.5bab6","name":"get colums","func":"msg.icao = \"\";\n    //skip = msg.payload.indexOf('\"');\n    start = msg.payload.indexOf('\"')+1;\n    end = msg.payload.indexOf('\"' ,start);\n    icao = msg.payload.substring(start,end);\n    msg.icao = icao;\nmsg.reg = \"\";\n    skip = msg.payload.indexOf('r');\n    start = msg.payload.indexOf(':', skip)+2;\n    end = msg.payload.indexOf('\"' ,start);\n    reg = msg.payload.substring(start,end);\n    msg.reg = reg;\nmsg.type = \"\";\n    skip = msg.payload.indexOf('t');\n    start = msg.payload.indexOf(':', skip)+2;\n    end = msg.payload.indexOf('\"' ,start);\n    type = msg.payload.substring(start,end);\n    msg.type = type;\nmsg.interest = \"\";\n    skip = msg.payload.indexOf('f');\n    start = msg.payload.indexOf(':', skip)+2;\n    end = msg.payload.indexOf('\"' ,start);\n    interest = msg.payload.substring(start,end);\n    msg.interest = interest;\nmsg.dist = \"\";\n    skip = msg.payload.indexOf('d');\n    start = msg.payload.indexOf(':', skip)+2;\n    end = msg.payload.indexOf('\"' ,start);\n    dist = msg.payload.substring(start,end);\n    msg.dist = dist;\n\nreturn msg;\n","outputs":1,"noerr":0,"x":1030,"y":1360,"wires":[["d93a7d81.e4dc88"]]},{"id":"5080b340.b5f364","type":"mysql","z":"9d3a3718.5bab6","mydb":"69b2048c.dfd654","name":"aircraft","x":1630,"y":1360,"wires":[[]]},{"id":"4f016729.34ad2","type":"function","z":"9d3a3718.5bab6","name":"airframes to mysql","func":"msg.topic = \"INSERT INTO airframes ( icao ,reg, type, interest, description) VALUES('\"+ msg.icao +\"','\"+ msg.reg +\"','\"+ msg.type +\"','\"+ msg.interest +\"','\"+ msg.dist +\"')\";\n\nreturn msg;","outputs":1,"noerr":0,"x":1450,"y":1360,"wires":[["5080b340.b5f364"]]},{"id":"8643c120.7fe9f8","type":"delay","z":"9d3a3718.5bab6","name":"","pauseType":"rate","timeout":"1","timeoutUnits":"seconds","rate":"1","nbRateUnits":"1","rateUnits":"second","randomFirst":"1","randomLast":"5","randomUnits":"seconds","drop":false,"x":710,"y":1360,"wires":[["50a19f4.81926e"]]},{"id":"69b2048c.dfd654","type":"MySQLdatabase","z":"","name":"","host":"","port":"3306","db":"aircraft","tz":""}]

Your time to review and offer suggestions is gratefully appreciated.

In that flow the rate limit should limit the rate messages are passed on, but it will limit the rate at which the file node reads the file, so it will read the whole file and the messages will be buffered up in the rate limit node, which may cause you to run out of memory.
or do you mean that if you put a debug node looking at the output of the delay node you do not see one message/second?
If you break the flow after the delay node and feed it to a debug node, so all it does is read the file and output 1 line per second, does it still crash?

I think you need to use something like bigfile so you can read just a portion of the file at a time.

Thanks for your comment Colin.

Node-RED crashes the same with or without the rate node.
I left it in the exported flow, but have extensively tested it both ways, it simply does nothing (that I can tell).

The debug time stamps coming out of the rate node all have the same timestamp. The debug tab fills up very quickly with all the nodes messages.

I have not tried sending the 27 meg file to the debug tab to see if it still causes the crash, interesting idea.

As I mentioned in my post, I have also extensively tested a LOT of different output options with the bigfile node. Again, no difference was found with any of the crash reports. The bigfile node like the delay node seems to do nothing over the core file node.

Maybe you can convert the json to csv, store it in a file and then import this file in the DB (maybe first in a temporary table and then from temporary table to final table.)

I did not consider that the JSON format may be an issue.... Very interesting comment.

I cant put it into a temporary table since it crashes during the insert, so temporary or not, getting it into the db is the core issue. (Unless I misunderstand your train of thought).

Personally, I would not use Node-RED for this task at all as you suggest. Well, maybe to run the batch file.

Do a bulk upload to MySQL direct from the CSV using MySQL command line tools.

If a direct upload from CSV->MySQL crashes, that is a different problem.

Eliminate Node-RED first.

@TotallyInformation I have considered running it as a batch job.
The challenge is that we have a few of us that need to do the same file -> insert, some are on Windows, some are on Linux and all of us have different skill sets.
The idea of automating the file read -> insert with Node-RED is attractive as I just need to write the flow, share it with my friends and its job done.

Will consider the CSV process.

That is why you might want to use Node-RED to orchestrate the activity.

If you can run the batch file on the MySQL server (probably running Linux?) then the task becomes more about transferring the file. If you can't do that, you will need 2 batch files, one for Windows and one for Linux. Actually, they will likely be almost identical since all you are really doing is running the MySQL batch upload. So translation from one to the other will be trivial even with minimal knowledge.

You could go further and either use PowerShell on both platforms or use the Windows Subsystem for Linux (WSL) on Windows. Either will give you a single execution environment on both platforms.

Just had a quick look at the flow. You are parsing the json file as a big text file.
I am even thinking that the "limit 1 msg/s" will see this json file as a single message of 27.3MB.

I would convert the json text file to a json object first by using the "json" node, then you can apply the "split" node followed by the insert logic to do the insert record by record.

Optionally you can add the rate limit node after the split node.

Bulk inserts might be faster when doing it from a csv file (see previous comments).
You can do this from node-red using exec node.

1 Like

@janvda Nice take on the problem.
I did have the json node in a flow I tested, but not with the rate limit node.

Got some hours to go, but its really looking good so far.

Thanks so much. Looks like we are going to get there!

EDIT. 12 hours latter and all my data is correctly inserted. No Node-RED errors.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.