Converting CSV to MySQL causes error with blank CSV columns

I know this is an old thread, but it has helped me a lot.
I have a large CSV file that I'm reading in and writing to MySQL.

The flow posted works until I get to empty values. If I don't set "include empty strings" in the CSV node, everything runs but I get wrong values whenever an empty string is encountered. If I do set it, Node-RED crashes, I assume from running out of memory.

Thanks, Harry

I've moved this from Memory issues processing and writing a large file - #6, since that topic was over 2 years old, and closed it.

Can you provide a simple version of the flow causing the issue, with some sample data?

I think I found the solution.

FYI, I wanted to paste more graphics & links but as a new user I was limited to 2.

For context, eoliverio's flow from the original thread was a huge help.

eoliverio's flow:

[{"id":"df914794.c9f6e8","type":"file in","z":"2f489ad1.76f266","name":"","filename":"D:\\\\input.csv","format":"lines","chunk":false,"sendError":false,"encoding":"utf8","x":250,"y":460,"wires":[["328ac413.acc99c"]]},{"id":"8df35fa2.dbc47","type":"inject","z":"2f489ad1.76f266","name":"","topic":"","payload":"","payloadType":"str","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":190,"y":380,"wires":[["66f0794c.570df8"]]},{"id":"570fcc0b.84ea84","type":"xml","z":"2f489ad1.76f266","name":"","property":"payload","attr":"","chr":"","x":510,"y":640,"wires":[["6c9c1e41.ae2d6"]]},{"id":"799a9147.53447","type":"file","z":"2f489ad1.76f266","name":"","filename":"D:\\\\output.xml","appendNewline":true,"createDir":false,"overwriteFile":"false","encoding":"none","x":920,"y":640,"wires":[["7108c578.37529c"]]},{"id":"6c9c1e41.ae2d6","type":"function","z":"2f489ad1.76f266","name":"remove XML header","func":"var xmlheader = '<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>';\nmsg.payload = msg.payload.replace(xmlheader, '');\n\nreturn msg;","outputs":1,"noerr":0,"x":700,"y":640,"wires":[["799a9147.53447"]]},{"id":"a36b9cf8.344aa","type":"function","z":"2f489ad1.76f266","name":"... processing ...","func":"// do stuff using objects from CSV data\n\nvar payload = {\n    'data' : msg.payload\n}\nmsg.payload = payload;\n\nreturn msg;","outputs":1,"noerr":0,"x":340,"y":640,"wires":[["570fcc0b.84ea84"]]},{"id":"328ac413.acc99c","type":"function","z":"2f489ad1.76f266","name":"prepend columns","func":"if (msg.parts.index === 0) {\n    flow.set('columns', msg.payload);\n    return;\n}\n\nmsg.payload = flow.get('columns') + '\\r\\n' + msg.payload;\n\nreturn msg;","outputs":1,"noerr":0,"x":450,"y":460,"wires":[["74db8414.0aee8c"]]},{"id":"73a1bfc5.0dbdf","type":"change","z":"2f489ad1.76f266","name":"parts -> temp","rules":[{"t":"set","p":"temp","pt":"msg","to":"parts","tot":"msg"},{"t":"delete","p":"parts","pt":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":290,"y":560,"wires":[["51b5ec4f.137694"]]},{"id":"526fe89.1624a18","type":"change","z":"2f489ad1.76f266","name":"temp -> parts","rules":[{"t":"set","p":"parts","pt":"msg","to":"temp","tot":"msg"},{"t":"delete","p":"temp","pt":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":610,"y":560,"wires":[["a36b9cf8.344aa"]]},{"id":"51b5ec4f.137694","type":"csv","z":"2f489ad1.76f266","name":"","sep":",","hdrin":true,"hdrout":"","multi":"one","ret":"\\n","temp":"","skip":"0","strings":false,"x":450,"y":560,"wires":[["526fe89.1624a18"]]},{"id":"66f0794c.570df8","type":"change","z":"2f489ad1.76f266","name":"add XML header","rules":[{"t":"set","p":"payload","pt":"msg","to":"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><root>","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":360,"y":380,"wires":[["a8f14a7e.c64508"]]},{"id":"a8f14a7e.c64508","type":"file","z":"2f489ad1.76f266","name":"","filename":"D:\\\\output.xml","appendNewline":true,"createDir":false,"overwriteFile":"true","encoding":"none","x":560,"y":380,"wires":[["df914794.c9f6e8"]]},{"id":"cd5bad53.87abc","type":"file","z":"2f489ad1.76f266","name":"","filename":"D:\\\\output.xml","appendNewline":false,"createDir":false,"overwriteFile":"false","encoding":"utf8","x":840,"y":720,"wires":[[]]},{"id":"2bb5a351.28925c","type":"change","z":"2f489ad1.76f266","name":"add XML 
footer","rules":[{"t":"set","p":"payload","pt":"msg","to":"</root>","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":640,"y":720,"wires":[["cd5bad53.87abc"]]},{"id":"7108c578.37529c","type":"switch","z":"2f489ad1.76f266","name":"msg.complete is not null","property":"complete","propertyType":"msg","rules":[{"t":"nnull"}],"checkall":"true","repair":false,"outputs":1,"x":410,"y":720,"wires":[["2bb5a351.28925c"]]},{"id":"450cfa3.d6f2a04","type":"comment","z":"2f489ad1.76f266","name":"save and restore msg.parts because CSV node uses it","info":"","x":420,"y":520,"wires":[]},{"id":"74db8414.0aee8c","type":"function","z":"2f489ad1.76f266","name":"set msg.complete to last item","func":"if (msg.parts.index + 1 === msg.parts.count) {\n    msg.complete = true;\n}\n\nreturn msg;","outputs":1,"noerr":0,"x":700,"y":460,"wires":[["73a1bfc5.0dbdf"]]}]

Now, my issue:
I had huge CSV files: a parts catalogue made up of 3 different CSV files.
All of them were a million-plus rows, and some had 829 columns.

I needed to build a database so I could combine the data from each CSV for relational lookups later.

Some facts:
The CSV files came from a big, industry-leading appliance manufacturer, and they were supplied "as is": you either used them or you didn't. There was no going back to ask for a different format.

I had to use Node-RED; the client insisted on this.

Node-RED was being used with a webhook to provide part-number lookup responses to the client's system.

This example file had 829 columns; I only needed two: the Part Number and the Brand.

The issue: with the CSV node I needed to enable "include null values", but because you end up with 829 columns that would crash Node-RED.

So I made a CSV Clean up flow.

The first node is a Watch node on a hotfolder; inside that folder are a Complete folder, an Error folder, and a nested watched folder called Process.

After the flow is done, it drops the cleaned-up CSV into the "Process" folder, and my CSV import routine (based on eoliverio's flow) runs.
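A minimal sketch of that folder layout, assuming the hotfolder path /data/holley/attributes from the exported flow further down (the layout itself is illustrative, not part of the original flow):

mkdir -p /data/holley/attributes/{Complete,Error,Process}
# /data/holley/attributes/          <- hotfolder watched by this clean-up flow
# /data/holley/attributes/Process/  <- watched by the CSV import flow
# /data/holley/attributes/Complete/ <- originals that converted OK
# /data/holley/attributes/Error/    <- originals that failed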

The next few nodes are simple enough; we check the Watch node trigger to see if the type is a file and that the file ends in .csv.

If it does not, we simply delete the file and ignore it.

If it is a CSV file, we delay for 5 seconds to give it some settlement time, then we assemble a Unix command.

I used cut, which lets you extract the column data we need from the 829 columns very quickly (within seconds).
cut is a Unix shell command; with it you can extract single columns, column ranges, and multiple column ranges in a single command line.
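A few illustrative invocations (hypothetical file name, not from the original post):

cut -d, -f3 parts.csv            # a single column (column 3)
cut -d, -f2-5 parts.csv          # a column range (columns 2 to 5)
cut -d, -f1,6,10-20 parts.csv    # multiple columns and ranges at once

Note that cut splits on every comma, so this only works cleanly when fields don't contain quoted, embedded commas.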

Great article on cut: https://linuxconfig.org/how-to-remove-columns-from-csv-based-on-column-number-using-bash-shell (couldn't paste more links due to newbie status)

msg.cmd = "cut -d, -f1,6 \"" + msg.filename + "\" > \"" + msg.topic + "Process/`date +%F-%H-%M-%S`-" + msg.file + "\""
return msg;

Above, I'm cutting out column 1 (part number) and column 6 (brand), and writing the new cut file into the watched Process folder, prepending the filename with a human-readable timestamp.
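For a hypothetical input file called parts.csv in the watched folder, the assembled msg.cmd would look roughly like this (the shell expands the backticked date when the exec node runs the command, e.g. to 2024-05-01-14-30-05):

cut -d, -f1,6 "/data/holley/attributes/parts.csv" > "/data/holley/attributes/Process/`date +%F-%H-%M-%S`-parts.csv"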

The CSV import flow picks this up and processes it.

The exec node which runs the actual command is then called, and we check for success or failure. The exec node returns the command's exit code in msg.payload.code on its return-code output: 0 means success, anything else means failure.

On 0 (success) I move the input file to the Complete folder; on failure I move it to the Error folder.
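With the same hypothetical parts.csv, the two "Assemble Exec Command" branches in the flow below end up building plain mv commands along these lines:

# On success (msg.payload.code is 0):
mv "/data/holley/attributes/parts.csv" "/data/holley/attributes/Complete/parts.csv"
# On any other exit code:
mv "/data/holley/attributes/parts.csv" "/data/holley/attributes/Error/parts.csv"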

This now works, and my CSV processes into MySQL.

Flow to play with:

[{"id":"2c987f95bab60ccf","type":"tab","label":"CSV Clean up","disabled":false,"info":""},{"id":"1d3a3a94f6ce82aa","type":"watch","z":"2c987f95bab60ccf","name":"","files":"/data/holley/attributes/","recursive":"","x":240,"y":400,"wires":[["cd2840948068f4ff"]]},{"id":"c33cec292428de38","type":"exec","z":"2c987f95bab60ccf","command":"","addpay":"cmd","append":"","useSpawn":"false","timer":"5","winHide":false,"oldrc":false,"name":"","x":1230,"y":380,"wires":[[],[],["de96a7edef080f01"]]},{"id":"5c7bca037e46374e","type":"delay","z":"2c987f95bab60ccf","name":"","pauseType":"delay","timeout":"5","timeoutUnits":"seconds","rate":"1","nbRateUnits":"1","rateUnits":"second","randomFirst":"1","randomLast":"5","randomUnits":"seconds","drop":false,"allowrate":false,"x":800,"y":380,"wires":[["295414a778019deb"]]},{"id":"cd2840948068f4ff","type":"switch","z":"2c987f95bab60ccf","name":"Test if File","property":"type","propertyType":"msg","rules":[{"t":"eq","v":"file","vt":"str"},{"t":"empty"}],"checkall":"true","repair":false,"outputs":2,"x":450,"y":400,"wires":[["95a2ba0415fe404f"],[]]},{"id":"295414a778019deb","type":"function","z":"2c987f95bab60ccf","name":"Assemble Exec Command","func":"\nmsg.cmd = \"cut -d, -f1,6 \\\"\" + msg.filename + \"\\\" > \\\"\" + msg.topic + \"Process/`date +%F-%H-%M-%S`-\" + msg.file + \"\\\"\"\n\nreturn msg;","outputs":1,"noerr":0,"initialize":"","finalize":"","libs":[],"x":1020,"y":380,"wires":[["c33cec292428de38"]]},{"id":"68f29a8b3d06e1be","type":"comment","z":"2c987f95bab60ccf","name":"Attributes Hotfolder Clean up csv","info":"","x":250,"y":360,"wires":[]},{"id":"95a2ba0415fe404f","type":"switch","z":"2c987f95bab60ccf","name":"Check if csv","property":"file","propertyType":"msg","rules":[{"t":"regex","v":"\\.csv$","vt":"str","case":true},{"t":"else"}],"checkall":"true","repair":false,"outputs":2,"x":610,"y":400,"wires":[["5c7bca037e46374e"],["2901f4601b324b22"]]},{"id":"2901f4601b324b22","type":"delay","z":"2c987f95bab60ccf","name":"","pauseType":"delay","timeout":"5","timeoutUnits":"seconds","rate":"1","nbRateUnits":"1","rateUnits":"second","randomFirst":"1","randomLast":"5","randomUnits":"seconds","drop":false,"allowrate":false,"x":720,"y":480,"wires":[["294874ee45e85c1f"]]},{"id":"294874ee45e85c1f","type":"exec","z":"2c987f95bab60ccf","command":"rm -f","addpay":"filename","append":"","useSpawn":"false","timer":"3","winHide":false,"oldrc":false,"name":"Not a csv so Delete File","x":930,"y":480,"wires":[[],[],[]]},{"id":"de96a7edef080f01","type":"switch","z":"2c987f95bab60ccf","name":"Success of failure","property":"payload.code","propertyType":"msg","rules":[{"t":"eq","v":"0","vt":"str"},{"t":"else"}],"checkall":"true","repair":false,"outputs":2,"x":1410,"y":380,"wires":[["f7ec4799c732374f"],["85a63e782436ae54"]]},{"id":"f7ec4799c732374f","type":"function","z":"2c987f95bab60ccf","name":"Assemble Exec Command","func":"msg.cmd = \"mv \\\"\" + msg.filename + \"\\\"  \\\"\" + msg.topic + \"Complete/\" + msg.file + \"\\\"\";\nreturn msg;","outputs":1,"noerr":0,"initialize":"","finalize":"","libs":[],"x":1500,"y":320,"wires":[["9c3f9bc7e1889577"]]},{"id":"9c3f9bc7e1889577","type":"exec","z":"2c987f95bab60ccf","command":"","addpay":"cmd","append":"","useSpawn":"false","timer":"5","winHide":false,"oldrc":false,"name":"Move file to Complete folder","x":1780,"y":320,"wires":[[],[],[]]},{"id":"85a63e782436ae54","type":"function","z":"2c987f95bab60ccf","name":"Assemble Exec Command","func":"msg.cmd = \"mv \\\"\" + msg.filename + \"\\\"  \\\"\" + msg.topic + \"Error/\" + msg.file + 
\"\\\"\";\n\nreturn msg;","outputs":1,"noerr":0,"initialize":"","finalize":"","libs":[],"x":1500,"y":460,"wires":[["7d6fe9a2b5853d92"]]},{"id":"7d6fe9a2b5853d92","type":"exec","z":"2c987f95bab60ccf","command":"","addpay":"cmd","append":"","useSpawn":"false","timer":"5","winHide":false,"oldrc":false,"name":"Move file to Error folder","x":1770,"y":460,"wires":[[],[],[]]},{"id":"37c0570de931f406","type":"comment","z":"2c987f95bab60ccf","name":"Check if file and ends in .csv","info":"","x":540,"y":360,"wires":[]},{"id":"deb5c39413d2c098","type":"comment","z":"2c987f95bab60ccf","name":"fix csv and rename file","info":"","x":1020,"y":340,"wires":[]},{"id":"fd4948ff9b51e71c","type":"comment","z":"2c987f95bab60ccf","name":"Move file in to csv processing hotfolder","info":"","x":1630,"y":280,"wires":[]},{"id":"20dcd686cb6233ab","type":"comment","z":"2c987f95bab60ccf","name":"Move file in to csv error folder","info":"","x":1680,"y":420,"wires":[]},{"id":"a354bb5dd1325b8b","type":"comment","z":"2c987f95bab60ccf","name":"Not a csv just delete and ignore","info":"","x":870,"y":440,"wires":[]}]

Things to do:

Monitor the Complete folder so I only store the last 5 CSV files, in case I need to look at them (see the sketch below).
Monitor the Error folder and send an email to the admin on failure.
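For the first item, a rough, untested shell sketch (assumes the timestamped filenames contain no spaces or newlines):

# Delete all but the 5 newest CSVs in the Complete folder
ls -t /data/holley/attributes/Complete/*.csv | tail -n +6 | xargs -r rm --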

I also plan to add a bash script to break the CSV rows up into chunks of, say, 10K at a time.

Example: still testing, but this should split the big CSV into chunks of 9,999 records, preserving the header row at the top of each one (so 9,999 records + 1 header = 10,000 rows):

cat bigFile.csv | parallel --header : --pipe -N9999 'cat >file_{#}.csv'

I hope this helps someone trying to do similar things with Node-RED.

Harry

