New nodes for mass upload of e.g. CSV

Hello,
I am glad to share a couple of nodes I have made to allow uploading very large datasets (gigabytes of e.g. CSV) to a back-end (e.g. SQL database) through Node-RED, with a UI for Node-RED Dashboard.

Full example https://flows.nodered.org/flow/687918dd5cb66a3bfc2a661e15ef4237

  • The first node is a widget UI for Node-RED dashboard, which streams the data to Node-RED as chunks (text or binary) with proper back-pressure (i.e. waiting for the back-end to be ready before sending more data, to avoid memory exhaustion) https://github.com/alexandrainst/node-red-contrib-ui-upload

  • The second node is a normal Node-RED node, similar to the default "split" node but more powerful: it accepts a stream of binary or text chunks as input, supports back-pressure (waiting for the rest of the downstream flow to be ready before telling the upstream flow to send more data), and outputs lines of text, either one by one or in batches of lines for higher performance (e.g. when inserting into an SQL database). https://github.com/alexandrainst/node-red-contrib-chunks-to-lines
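To illustrate why batched output helps with SQL inserts, here is a minimal sketch of the logic one might put in a Function node downstream of node-red-contrib-chunks-to-lines. The table name, column layout, and the flat parameter array are assumptions for illustration, not the nodes' actual API, and the naive `split(',')` does not handle quoted CSV fields:

```javascript
// Build a single multi-row INSERT from a batch of CSV lines
// (hypothetical table/columns; naive CSV parsing for brevity).
function buildBulkInsert(table, columns, lines) {
  const rows = lines.map(line => line.split(','));
  // One "(?,?,...)" group per row, so the whole batch is one statement.
  const placeholders = rows
    .map(row => '(' + row.map(() => '?').join(',') + ')')
    .join(',');
  return {
    topic: `INSERT INTO ${table} (${columns.join(',')}) VALUES ${placeholders}`,
    payload: rows.flat() // flat list of bind parameters (an assumption)
  };
}

// Example: a batch of two CSV lines becomes one INSERT statement.
const msg = buildBulkInsert('measurements', ['sensor', 'value'],
  ['kitchen,21.5', 'attic,18.2']);
// msg.topic: "INSERT INTO measurements (sensor,value) VALUES (?,?),(?,?)"
// msg.payload: ['kitchen', '21.5', 'attic', '18.2']
```

Sending one statement per batch instead of one per line is what makes the batched mode worthwhile for database inserts.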

I hope some of you will find that useful. Feedback welcome!
Best regards,
Alkarex

Screenshots

I doubt if I'll have use of these myself but the working principle (and the code) seems well thought out and professional. :slightly_smiling_face:


I like the gui idea.

I was just wondering: MySQL and PostgreSQL have the LOAD DATA INFILE and COPY ... FROM ... CSV options (respectively), which insert CSV data in bulk and are extremely fast (at least in MySQL) compared to separate SQL INSERTs.

300,000+ records per second should be doable (depending on the database engine type). You should still split a large CSV into smaller files.
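The "split into smaller files" step above can be sketched with a small helper that cuts a set of CSV lines into fixed-size batches before bulk loading. The batch size here is an arbitrary example value:

```javascript
// Yield fixed-size batches of lines, so each bulk load stays bounded.
function* batches(lines, size) {
  for (let i = 0; i < lines.length; i += size) {
    yield lines.slice(i, i + size);
  }
}

// Example: 10 lines in batches of 4 gives chunks of 4, 4, and 2 lines.
const lines = Array.from({ length: 10 }, (_, i) => `row${i}`);
const chunks = [...batches(lines, 4)];
```

Each chunk could then be written to its own temporary file (or fed to one bulk statement) so the database never has to ingest the whole dataset in one go.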

A word of warning however, looking at the code for these nodes, I can see they access the internal runtime data structures. These data structures are not part of any published API for nodes to use and might change from time to time.

@bakman2 The database tools you mention are indeed good, and I use them myself, but they first require that you either have the file uploaded to the server somehow (also doubling the amount of disk space used), and/or some kind of server access (e.g. SSH). Furthermore, the option I propose allows on-the-fly processing of the data, e.g. custom transformations or filtering before insertion. So those tools serve a different type of workflow.

@knolleary Indeed! As far as I can see, there is no "official" mechanism (yet?) in Node-RED for handling back-pressure in a (semi-)automated way (e.g. like piping in Bash/Unix). I believe that with simple conventions (similar to what I have done in those two nodes), such a mechanism could be made "official" and would cope out-of-the-box with larger payloads. I can gladly provide more information, a more precise proposal, and some help if you are interested :slight_smile:
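To make the back-pressure convention concrete, here is a minimal in-memory model of the idea, outside Node-RED. The names and wiring ("tick" messages looped back from the consumer to the producer) are my assumptions for illustration, not the actual protocol these nodes or Node-RED define:

```javascript
// Producer: only emits the next chunk when asked (the back-pressure signal).
function makeUpstream(chunks) {
  let i = 0;
  return {
    onTick(send) {
      if (i < chunks.length) send({ payload: chunks[i++] });
    }
  };
}

// Consumer: processes a chunk, then requests the next one via tick().
function makeDownstream(process, tick) {
  return {
    onInput(msg) {
      process(msg.payload); // e.g. insert into the database
      tick();               // tell upstream we are ready for more
    }
  };
}

// Wire them together: each processed chunk triggers delivery of the next,
// so at most one chunk is in flight and memory use stays bounded.
const out = [];
const up = makeUpstream(['a', 'b', 'c']);
const down = makeDownstream(
  p => out.push(p),
  () => up.onTick(m => down.onInput(m))
);
up.onTick(m => down.onInput(m)); // initial kick
// out now holds ['a', 'b', 'c'], delivered one chunk at a time
```

The point of the convention is that the producer is passive: data only flows when the slow end explicitly asks for it, which is what prevents memory exhaustion on multi-gigabyte uploads.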


but they first require that you either have a file uploaded to the server somehow (also doubling the amount of disc space), and/or some kind of server access (e.g. SSH).

This is true if you use the LOCAL keyword; otherwise you can serve the file via the http-in/http-response nodes and let the MySQL database read it directly. You could then perform transforms on the CSV output to the http-response node, and it would still be faster than separate INSERT commands.
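The transform step suggested above could be a Function node sitting between http-in and http-response, reshaping the CSV text before the database fetches it. This is only a sketch; the filtering and the transformation applied here are placeholder examples:

```javascript
// Transform CSV text on the way out: drop comment/blank lines and apply
// an example per-line transformation (placeholder logic).
function transformCsv(csvText) {
  return csvText
    .split('\n')
    .filter(line => line && !line.startsWith('#'))
    .map(line => line.toUpperCase())
    .join('\n');
}

// In a Function node, one would do something like:
//   msg.payload = transformCsv(msg.payload);
//   return msg;
```

Because the transform runs per request, the original file on disk stays untouched and no second copy of the data is needed.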

Just my 2 cents.

@bakman2 I might have missed something, but as far as I can see, the HTTP In node can only cope with very small files of at most a few megabytes, right? (Same for the CSV node.) Furthermore, although not as fast as the native commands, the suggested flow (which inserts 4,000 lines at a time) reaches a decent (to me) speed of ~30,000 lines per second in my quick test (including network transfer to a modest remote server with a non-SSD disk and a somewhat ancient Intel i5 processor).