Node-RED freezes while sqlite query

Im using the latest node-red-node-sqlite for querying my database. Unfortunately just realised, that the query totally blocks whole Node-RED ... which in some cases can take tens of seconds, even minutes. As the flows also serves to add new realtime data to the db, this is a huge problem.

Is there any solution? Maybe I can try to optimize my queries to be faster, but still, blocking whole NR is unusable.

Currently running Node-RED v3.0.2; Node.js v18.7.0; node-red-node-sqlite v1.1.0

Thanks for reply

Have you outgrown sqlite? What size if the database?

Do you have suitable indexes on the fields? Which kind of queries take the longest?

This resource will help you understand the above: EXPLAIN QUERY PLAN

DB size is about 700MB ... with only two tables, one of which is very small, about 120 rows, second is quite large - about 1.4M rows.
The queries are always selects only from one table, no joins. Selecting by column timestamp, which is not indexed ... I have transferred some of the queries to select by id, which is indexed, but nome queries need to be selected by timestamp. This is probably the reason, why it takes so long.

But the main problem I see is fact that during query is whole Node-RED stucked ... so the data I receive over MQTT are lost.

Are you sure node-red is blocked? If you add a debug node on the output of one of the MQTT In nodes does the data stop updating in the debug pane?

First I tried simple flow with inject every second and debug ... I was getting msgs every second, then I triggered the query and messages stopped coming ... after 33s they started to show again every second, but the events in the meantime did not get triggered.
Then I tried to listen the mqtt node as you suggested ... while query, no msgs show, and when the query finishes all msgs, that should come in meantime, show together at the same time. So atleast the data should be stored eventually (not in the case when I have to restart the server while query is going), yet they will have shifted timestamps probably.
But the problem is also that Dashboard gets freezed and node-red editor also gets freezed ...
Thanks for your time

It sounds like javascript's main thread is occupied with processing the reply from your sqlite query .. and that delays everything else from executing.

How much data are you requesting from sqlite that it takes 33s ? I mean .. when the query executes how many rows are you getting back ? Can you refine your sql to request less data ?

Also try disconnecting the output of the sqlite node so that you do not do any processing on returned records, to rule out the possibility that it is your flow processing the results that is the problem.

1 Like

Yes you have almost certainly identified the problem.

Define another index on the timestamp column.
It might take a while to build the index for 1.4 million records.

1 Like

TBF, 1.4m records is not a HUGE amount of data for a database and an index will DEFINATELY help (and the OP should definitely add one) however, I suspect @UnborN is onto something - how many records is @mima232 receiving and processing?

It is not uncommon to see folk request all records > datetime then proceed to loop through them all in a tight JS loop looking for something (and effectively blocking the Node eventloop)

If this is the case, then an index will be immaterial since the after processing would still be blocking.

Agreed, indexing and appropriate WHERE clause (which might imply further indices) are both vital.

Maybe the OP will share their query and EXPLAIN QUERY PLAN results for it.

Thank you all for your responses

This is the flow:

I placed the debug nodes to check if the freeze is really in the sqlite node (atleast I thought that) and the second debug activated after unfreeze. So I was sure it must be the db node ...
But, as suggested by Colin, I tried to disconnect output to the function 16 ... and it ran without freeze!
Function 16 indeed includes a loop iteration but that was not the problem. It seems the freeze happens when the csv parser node converts the data ...
Function 16 only replaces timestamp by string.
I did not know that one node can freeze whole nodered ... I thought that theres some paralelization even inside the nodes, so when some node takes too long it gets interrupted ...

I will try to add the indexing for timestamp column, that could help a little, but the main problem now is probably the string conversion.
The query looks like this:
"SELECT * FROM table WHERE name like 'dev_name%' AND dt>1703342762 ORDER BY id DESC"
The flow serves for data export, so I cant refine the query more, than user wants.
The amount of rows can be anywhere from hundreds to tens of thousands.

This is can be a heavy process for NodeJS. Especially tens of thousands.

That said:

  • what version of Node-RED are you using?
  • How many columns are in the data?
  • can you share the DDL of the table?

There is a good reason for asking.

Recent versions of node-red (v3.1.x) have a significant speed increase when parsing large arrays to CSV data

1 Like

Im using Node-RED v3.0.2 as stated above,
the table was created with this (hope its what you asked for):
so there is 6 columns, data column can contain relatively long strings something like 700 chars, sometimes even 2400 chars, dont know if its concern or not

I believe you will see significant processing speed up if you update to latest V3.1.3

  • CSV: make CSV export way faster by not re-allocating and handling huge string (#4349)
1 Like

Looks like dt might be a timestamp? If so, you should declare it as such.

Also, using a pre-defined declared query would speed things up since repeat queries would not need to be reinterpreted.

Can you also share what kind of server device you are running Node-RED on? Specifically how much memory? You should always check for memory issues when getting unexpected blocks in process. There are a couple of things that can cause this that are memory related.

Paging (or swapping) is something that the OS does when running short on resources and this can stop other processes temporarily, especially on devices with poorly balanced hardware (such as a Raspberry Pi providing storage from an SD-Card).

Then Node-RED itself runs using Node.js, as it is JavaScript based, it has to do occasional "garbage collection". In some cases, this also can cause noticeable pauses. This is less likely in this case as the pauses are generally much shorted.

Im not really much familiar with DBs so I dont know all these tricks ... but as you mentioned predefined query, I think thats not solution, because users can widely modify the query parameters.

The machine is a NUC-like pc ... not exactly a server, but not a Raspi ... it has 16G RAM of which is used only 1.7G; CPU is some 4-core Pentium; 250G SSD. So I dont think swapping should be issue here.

I tried to replace the csv node with simple function node with forloop, concatenating data (comma separated, new line ended, strings quoted) and got result in about one second instead of ten second. So the csv node indeed is unefficient in older versions of Node-RED.

I have one more question ... until now I thought that all flows in Node-RED run simultaneously ... well ofc not exactly hw parallel, but that there is some task scheduling which ensures all flows get same cpu time so there cant happen that one flow will stop whole system ... which now seems to be happening.
So how exactly works the scheduling? Is the preemption only between nodes and not inside of them? I guess thats nature of Node.js, not Node-RED, right?

NodeJS is (mostly) single threaded.

There are tons of good articles floating around - this one is reasonable: Is Node.js Single-Threaded or Multi-Threaded? and Why? - DEV Community

1 Like

That isn't how Node.js (or JavaScript in general) works. It uses a single task "loop". However, JS is very clever at breaking things down into small tasks, each micro-task holds up the loop but the idea is that they are so small that the low-level scheduler lets lots of micro-tasks appear to work in parallel. Understanding the loop in detail is quite complex and I'd say that most JS programmers never really bother - or need - to know the inner workings.

However, there are certainly things that can catch out the unwary but for the most part, as a Node-RED user, you don't need to worry about it.

Thanks for that article, now I finally understand the working principle.

My friend found this blog post: Making flows asynchronous by default : Node-RED which explains a lot. But Im still not quite sure where are the borders of macrotasks in Node-RED. According to the post it seems that every node is a macrotask since Node-RED 1.0 and every branch should be processed "parallel" (I mean this order: 1st node from 1st branch; 1st node from 2nd branch; 2nd node from 1st branch; 2nd node from 2nd branch ...).

So why in this flow it gets stucked on csv node without printing debug 31 ? Expected order should be: db node, function 16 node, debug 31 node, csv node ... right? I have seen this behaviour many times before but according to the async behaviour described in the blog post above it should not do this ...

Another thing Im curious about is whats the order of branches? Is it decided by order of links in flow.json? Does it change when I delete the links and reattach debug node first and function 16 second?

Thanks for explanation