Setting SQLite as persistent context storage

Hi friends,

Just wanted to know if it is possible to set up SQLite as context storage instead of the built-in context storage (memory or localfilesystem)?

As is already recommended, setting the localfilesystem flush interval lower than 30 seconds will lead to SD card wear and reduced life expectancy. Hence, reducing the flush interval is not an option.

I also read some days ago (I think!) that the FlowFuse team has implemented SQLite-based context storage (to avoid loss of data), meaning it must be possible somehow.

My goal is to buffer incoming data and push it to a SQL server later. No data should be lost in case Node-RED crashes.
I have implemented SQLite-based buffering to avoid data loss, but it is done at the front end (in the Node-RED editor flows). I was looking for a backend solution.

Regards,
Shubham

It is possible. However, I don't believe anyone has written a suitable plugin.
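To give an idea of what that would involve: below is a rough, untested sketch against the documented context store plugin API, assuming the better-sqlite3 module (the config.path option is my own invention). It only handles single string keys and skips the clean-up a real store would need, so treat it as a starting point rather than a finished plugin.

```javascript
// Sketch of a SQLite-backed Node-RED context store (untested).
const Database = require('better-sqlite3');

class SqliteContext {
    constructor(config) {
        this.path = (config && config.path) || 'context.sqlite';
    }
    open() {
        this.db = new Database(this.path);
        this.db.exec(
            'CREATE TABLE IF NOT EXISTS context ' +
            '(scope TEXT, name TEXT, value TEXT, PRIMARY KEY (scope, name))'
        );
        return Promise.resolve();
    }
    close() {
        this.db.close();
        return Promise.resolve();
    }
    // The real API also accepts arrays of keys; single keys only here.
    get(scope, key, callback) {
        const row = this.db
            .prepare('SELECT value FROM context WHERE scope = ? AND name = ?')
            .get(scope, key);
        const value = row ? JSON.parse(row.value) : undefined;
        if (callback) { callback(null, value); } else { return value; }
    }
    set(scope, key, value, callback) {
        if (value === undefined) {
            this.db.prepare('DELETE FROM context WHERE scope = ? AND name = ?')
                .run(scope, key);
        } else {
            this.db.prepare(
                'INSERT OR REPLACE INTO context (scope, name, value) VALUES (?, ?, ?)'
            ).run(scope, key, JSON.stringify(value));
        }
        if (callback) { callback(null); }
    }
    keys(scope, callback) {
        const names = this.db
            .prepare('SELECT name FROM context WHERE scope = ?')
            .all(scope)
            .map(r => r.name);
        if (callback) { callback(null, names); } else { return names; }
    }
    delete(scope) {
        this.db.prepare('DELETE FROM context WHERE scope = ?').run(scope);
        return Promise.resolve();
    }
    clean(activeNodes) {
        // A real store should prune scopes that no longer match an active
        // flow/node id; omitted here for brevity.
        return Promise.resolve();
    }
}

// Node-RED calls the exported function with the store's config from settings.js.
module.exports = function (config) { return new SqliteContext(config); };
```

You would then register it as a store in the contextStorage section of settings.js, just like the built-in modules.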

That is not necessarily true. It depends very much on how often you are updating things and how you are structuring your context variables.

Localising variables will create more separate files, which helps minimise writes. And even a large global variable, if only written to rarely, will not cause any real wear.

The biggest factor though is the card itself. If you have a good card that has wear-levelling (such as the various Samsung EVO series) and has plenty of free space, card wear will not be an issue for many years.

When I was running on Pi's, I had EVO cards running for years and I never did anything special to reduce wear. I had many logs and other fast-changing files writing to those cards.

Of course, any SD-Card will eventually fail - that happens even if you don't use them in some cases. So you should always have external backups. But then SSD's and HDD's also fail.

ALL databases buffer in memory and write in batch periodically, that's how they are so efficient. A full DB engine such as MySQL may also maintain a changelog file that allows some additional recovery but no db running on an SD-Card is going to be free of potential corruption. In fact, they may well be worse since you may be able to recover useful data from a text file (e.g. a .json file) that would not be recoverable from a more complex db file.

So you would need to run your db server on a different platform.

If you have data that is absolutely critical not to lose, write it direct to a DB. Or write it indirectly using MQTT or some other MQ. Either way it needs to go over to a separate server using something more reliable than an SD-Card for storage. And properly backed up, of course.
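Outside of Node-RED (where the built-in mqtt-out node does this with no code at all), the "hand it to a message queue" idea looks roughly like this with the mqtt.js client. Broker address, topic and payload are placeholders:

```javascript
// Sketch only: forward each sample to a broker on a separate machine,
// using QoS 1 so delivery is acknowledged.
const mqtt = require('mqtt');
const client = mqtt.connect('mqtt://broker.local');    // placeholder broker

function forwardSample(sample) {
    client.publish(
        'factory/line1/samples',                        // placeholder topic
        JSON.stringify(sample),
        { qos: 1 },                                     // at-least-once delivery
        (err) => { if (err) console.error('publish failed', err); }
    );
}

client.on('connect', () => forwardSample({ ts: Date.now(), value: 42.5 }));
```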

OK. Does it mean I need to make a compromise between saving the card from wear and losing data if Node-RED crashes suddenly? My application requires batching factory-floor data every second for 1 minute before pushing it into a SQL server running on a separate server. If I reduce flushInterval to 10 seconds, will that mean I lose up to 10 seconds of data on a sudden crash, but card wear is also reduced by 10x (against a 1-second flush interval)? Would you suggest this approach, provided I'm using Samsung EVO cards?

How much data is that? Save it in file context then look at the context file to see how big it is.

Why don't you write it to SQL every second?

Yes. So if you can't send it real-time to a more suitable location, set up another file context and set it to a 1 second flush and only use it for that and nothing else. You can have as many different context configurations as you like, even with the same library.
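For example, the contextStorage block in settings.js could look something like this. The store names are my own; dir and flushInterval (in seconds) are standard localfilesystem options:

```javascript
// Fragment of settings.js (inside module.exports)
contextStorage: {
    default: { module: "memory" },
    file:    { module: "localfilesystem" },             // default 30-second flush
    fastfile: {
        module: "localfilesystem",
        config: {
            dir: "/home/pi/.node-red/fastcontext",      // placeholder path
            flushInterval: 1                            // flush every second
        }
    }
},
```

In a function node you then name the store explicitly, e.g. `flow.set("buffer", batch, "fastfile")`, and everything else carries on using the default store.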

No, I don't believe that is a valid assumption, and it certainly isn't if you have a decent card with wear levelling and plenty of spare space. I know this for sure since, when previously checking the performance on my Pi's, they ran pretty much constantly at a rate of at least one write per second, 24x7.

For a critical Pi-based server, you should probably factor in a card replacement every few years anyway. Mine all lasted at least 3-5 years or more.

Yes, personally, I'd be happy with that, especially if you take the above into account.

I would say 1-2KB, approx 100-200 variables (mix of int, string, real, DT, and boolean).

Well, I was doing it like this earlier: just connect the cron-plus inject node to run every minute and push data to the SQL server every 1 second (with an RBE node added to filter out the repeated data).

But I thought: is that the optimum way? Is SQL designed for inserts every second, or every millisecond? To get an answer to this, I turned to an LLM, which gave me batching as the optimum way to reduce load on SQL. But the thing is, you guys are practical and know the ups and downs of Node-RED and MySQL integration.

Prepared statements are the way to go for efficient writing to a SQL DB. Per-second? I couldn't say, because it really depends on the environment and infrastructure.
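By way of illustration only (assuming a MySQL/MariaDB server and the mysql2 module; host, table and column names are invented), a prepared insert from Node.js looks like this:

```javascript
// Sketch: per-second inserts using prepared statements via mysql2.
const mysql = require('mysql2/promise');

async function main() {
    const pool = mysql.createPool({
        host: 'sql-server.local',        // placeholder host
        user: 'nodered',
        password: 'secret',
        database: 'factory',
        connectionLimit: 5
    });

    // execute() prepares the statement once per connection and reuses it,
    // so repeated inserts only ship the parameter values.
    async function insertSample(ts, machine, value) {
        await pool.execute(
            'INSERT INTO samples (ts, machine, value) VALUES (?, ?, ?)',
            [ts, machine, value]
        );
    }

    await insertSample(new Date(), 'line1', 42.5);
    await pool.end();
}

main().catch(console.error);
```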

OK. Different question.

The persistent context storage is a good way to retain previous values (like good counts, bad counts, previous state (on/off)) in case Node-RED crashes. Based on your above answers, however, I can definitely set the persistent context storage flush interval to 1 second.

Earlier, I thought that SQLite could be set up as a context backup to retain data, but you told me otherwise: "it is not possible at the moment." Hence, I decided to update the values in SQLite every 1 second and read them back from the Node-RED editor.

But was my approach of using the SQLite node as a makeshift solution a good one? I mean, SQL is designed for relational data (rows and columns) with a focus on join functionality. Just because something works doesn't always mean it is the right tool for the application. I don't even know if SQL is the right database for our application.

For example, I was learning MongoDB yesterday. There I saw that the type of data MongoDB stores and the type of payload Node-RED processes are both JS objects (JSON format). Wouldn't it make more sense to prefer a NoSQL store (like MongoDB or Redis) as a temporary buffer rather than SQLite, as SQLite doesn't even support arrays or objects natively?

Adopt the KISS strategy. Keep It Simple Stupid. Do it the easy way, which is to save it to the db at every sample, so you don't need to worry about context storage or data loss at all. Only if you have a problem with load on the db server then worry about what to do about it.

It is not best if you still need access to the data in your Node-RED flows. Because then you'd either have to store it anyway or would have to retrieve it from the db again.

Really, until you get to much larger data stores (in which case, you would want a DB engine rather than SQLite anyway) or you want to make use of more complex relational SQL that might be harder and less efficient to do in JavaScript code, I don't see a lot of point in using a DB.

A JavaScript object is pretty powerful and fairly easy to deal with. If you need more complex data handling, you could always use a data analysis tool like nodejs-polars which gives you a dataframe handler similar to popular Python analysis tools.
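As a very rough sketch (the nodejs-polars API here is from memory, so check it against the docs), summarising a batch of samples without any database could look like:

```javascript
// Sketch: group-and-average a small batch in memory with nodejs-polars.
const pl = require('nodejs-polars');

const df = pl.DataFrame({
    machine: ['line1', 'line1', 'line2', 'line2'],
    value:   [41.2, 42.8, 7.1, 7.4]
});

// Mean value per machine, roughly SQL's GROUP BY machine / AVG(value)
const summary = df.groupBy('machine').agg(pl.col('value').mean());
console.log(summary);
```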

A SQL db will be more efficient at handling certain types of table relationships like joins and at certain data calculations across tables but at the cost of additional memory and processor overheads. Don't forget that to get performance out of a DB, at least the indexes must all be in memory. If you are working on a memory constrained device and running Node-RED on it, you may already be pushing your memory quite hard. Then also, when doing lots of fast inserts, having indexing to make queries efficient actually reduces the performance of inserts as the engine has to not only update the table but also the index(es) and, for full engines, the transaction log as well.

Bottom line is that you have to go quite far to beat the native, in-memory performance of node.js variable handling.

Well, probably worth a bit of investigation, I suspect that even ChatGPT could guide you on this.

SQL DB's are "relational". That is, they use row-oriented tables (where columns are pre-defined, rows are indexed). If your data looks like a CSV file, a SQL DB should be good at handling it. Even better if your data looks like several CSV files, each with a common column.

However, if your data is primarily a stream of timestamped data values, you will probably be better off with a column-oriented DB such as the timeseries DB's like InfluxDB. Not only are these more efficient at recording streams of timestamped data but they are also really good at doing time/date-based analysis. Postgres has a timeseries extension as well if you are wanting to stay closer to familiar SQL DB engines.
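For what it's worth, writing a point from code with the official @influxdata/influxdb-client looks roughly like this. URL, token, org and bucket are placeholders, and inside Node-RED you would normally just use the node-red-contrib-influxdb nodes instead:

```javascript
// Sketch: write one timestamped point to InfluxDB 2.x.
const { InfluxDB, Point } = require('@influxdata/influxdb-client');

const influx = new InfluxDB({ url: 'http://influx.local:8086', token: 'my-token' });
const writeApi = influx.getWriteApi('my-org', 'factory');

const point = new Point('temperature')     // measurement
    .tag('machine', 'line1')               // indexed tag
    .floatField('value', 42.5)             // the reading itself
    .timestamp(new Date());

writeApi.writePoint(point);
writeApi.close()                           // flushes any buffered points
    .then(() => console.log('written'))
    .catch(console.error);
```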

If your data is super-simple key/value pairs, you might, I suppose, even look at something like REDIS.

The question really is: Why are you retaining the data? What are you going to do with it in the future? (OK, yes, that's 2 questions!)

I have no doubt that time series databases can substantially outperform a relational database, at least for writing data.

However, if you are already familiar with SQL then go for an SQL database.
My preference is Mariadb (= MySQL), but I also use and like sqlite, all on Raspberry Pies.

I have some very complicated SQL queries. I don't know if they could even be replicated in InfluxDB, but I bet an RDBMS outperforms a time series db for complex joins etc.

As a matter of interest, what were you putting in a time series database that needed complex queries?

I'm afraid I don't understand the question?

I thought you were advising against using influxdb for a time series database as you had previously had to use complex queries with time series data, and that could be difficult with influx.

I was inquiring what sort of time series data you were saving that needed such complex queries.

No no, use whatever database you are comfortable with, because they all have their quirks.

Certainly if you want a time series database, Influx seems to be a good choice.
If you want an RDBMS, Mariadb seems to be a good choice.
If you want to tread lightly upon your server's CPU, SQLite seems to be a good choice.
There may even be arguments in favour of Mongodb, Oracle and (who knows?) Pick

Every record I insert has a timestamp. I don't know if that makes it time series data; I have not learned the jargon for alternative databases.

I would say that time series data is data that varies over time, and where the time is critical to the meaning of the data. So in the future one might want to draw a chart of a data value against time, or work out the mean of a data value over a time period or find the max and min of a value experienced during the production of a particular batch of product.

Probably. They are different beasts for different uses.

When doing a complex time or date-based summary calculation for example, a timeseries db will greatly outperform a relational db typically. But if you need to do a complex join on multiple relational tables then of course a relational db will work best.
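To make that concrete, here is the same hourly-mean summary sketched as a Flux query and as roughly equivalent SQL (bucket, measurement and table names are invented):

```javascript
// The kind of time-bucketed summary a timeseries DB is built for...
const flux = `
  from(bucket: "factory")
    |> range(start: -24h)
    |> filter(fn: (r) => r._measurement == "temperature")
    |> aggregateWindow(every: 1h, fn: mean)
`;

// ...and a roughly equivalent (MySQL-flavoured) SQL version.
const sql = `
  SELECT DATE_FORMAT(ts, '%Y-%m-%d %H:00:00') AS hour, AVG(value) AS mean_value
  FROM samples
  WHERE ts >= NOW() - INTERVAL 24 HOUR
  GROUP BY hour
  ORDER BY hour
`;
```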

If I were starting to learn DBs now, I would probably stick to Postgres and SQLite, since there is a timeseries extension for Postgres (and other extensions such as geo data).

SQLite is not a DB engine really of course, it is a single-library application, it does not have the safety and security features of a full engine and largely is useful for data that fits into memory.

I occasionally think I should maybe use SQLite for some data, then I realise that I can do everything in JavaScript (or sometimes JSONata) with a JSON file or 2, so I don't bother. 🙂
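The "JSON file or 2" approach really is as simple as it sounds. A sketch, with an invented path; inside Node-RED the same thing is usually done with persistent context rather than raw fs calls:

```javascript
// Sketch: load, update and re-save a small JSON data file.
const fs = require('fs');
const path = '/home/pi/.node-red/mydata.json';      // placeholder path

// Load, or start empty on first run
let data = {};
try { data = JSON.parse(fs.readFileSync(path, 'utf8')); } catch (e) { /* first run */ }

// Work on it as a normal JS object...
data.goodCount = (data.goodCount || 0) + 1;

// ...then write to a temp file and rename, so a crash mid-write
// can't leave a half-written file behind.
fs.writeFileSync(path + '.tmp', JSON.stringify(data));
fs.renameSync(path + '.tmp', path);
```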

I've not directly needed a relational db engine for anything in quite a few years. Though, of course, they are sometimes built into other tools - usually in Docker.

I only use InfluxDB because it was a bit of a pain to learn but once set up, I never need to directly touch it. Grafana has a point-and-click query interface for it which makes life easier and I have some basic instructions (that I've previously shared on the forum) in case I need to set up something new or change some of the automatic maintenance settings (auto-summarising for long-term data and auto-trimming details tables).

Anyway, we are now getting a long way off the original topic (as usual 😄).
