Durable and reliable contextStorage - DB/mysql/mariadb based

I have a store-and-forward flow that uses file-based contextStorage. The problem is that when the machine or Node-RED restarts suddenly, the underlying context file gets corrupted. This leads to data loss, and Node-RED is unable to restart on its own because the file is invalid (the process went down midway through a write).

Is there a database-backed persistent context store, similar to the file store? Has anyone used or built one for this kind of need?

It should not get corrupted. However, the default update period might be too long for you so it may miss updates. You can shorten the period in the settings.

Of course, it would be better if you fixed whatever is causing Node-RED and/or your server device to crash!

I've long wanted to write a context storage plugin that uses a database, just never had the time. But there is a REDIS store I believe which may be more suitable for you.

What platform do you run Node-red on?

But isn't file storage effectively a NoSQL database managed entirely within Node-RED?

Surely safer than an external database engine?

Since it is an RPi device in an industrial environment, it is shut down, killed, or restarted for various reasons outside of my control.

Are you suggesting Redis rather than a DB because of latency needs?

I am not sure how lightweight Redis would be in comparison with MariaDB for this purpose. My need is limited to just a queue implementation.

I saw @jbudd's recommendation regarding NoSQL. I wanted transactionality (durable and reliable) so that when I write frequently and the device is stopped or restarted, we are able to restart without any issues.

No. Not really. It doesn't have all the consistency and safety features of a DB.

Not at all. Especially as your data sizes grow.

Not really. REDIS is an industrial-scale data caching service and should be more robust. But it does require an additional service to be run and might also be susceptible to sudden shutdowns.

This is a REALLY BAD IDEA for a Linux server. If you want to do that, you should be using real-time microprocessor devices. This is certainly not unique to Node-RED and is likely to cause issues for any service trying to keep a record of data.

You would be much better off sending data off the device.

They would be about comparable probably. But neither is suitable for the circumstances you've outlined.

No service will cope well with this if you are allowing the Pi to be killed. If you have a reliable network, send the data off-device.

Clearly a dbms has mechanisms to ensure data integrity which a flat file does not have. I meant safer in that it would have fewer points of potential failure.

Do you envisage a dbms backed context store providing a mechanism to start a transaction, write the data and commit, all at different points in the flow?

If the environment is a non-realtime OS and machines or Node-RED get restarted without notice, it's going to be very difficult to guarantee integrity.

ps I did not recommend noSQL.
I easily run MySQL on a Pi 4 and SQLite on pi zero 2s. I know SQL too and see no reason to learn another not quite SQLanguage. Looking at you MS, Influx :grin:

  • Despite backups, UPS, etc., restarts do happen, and the cost we pay is huge: we lose data until someone notices that Node-RED could not come back up automatically.
  • Store and forward is my use case (long story short). Whether or not the network is reliable, I wanted this as a feature.
  • I am not sure Redis can offer more robustness than, say, MySQL when it comes to being durable and reliable across restarts. I would be keen to know why you say that.
  • Sending data off the device is my priority when it is healthy. 99% of the time my context queue size will be zero, but the remaining <1% is damaging (data loss, and being unable to restart when the context file is corrupt).
  • I thought persistent storage would ensure the context is always consistent, so that restarts are dependable. I don't want to write a DB-based queue outside of contextStorage because that would mean maintaining the lifecycle of the DB/table(s).

Can you identify exactly what caused an example of data loss?

Is the huge cost you pay more than employing a resilient systems expert?

@jbudd
Sure. Let's say the RPi restarts, which takes 2 minutes. Ideally there should be no data only for that period, which is manageable in the way we handle things. But when it restarts, Node-RED doesn't start because of the incomplete file, resulting in minutes, hours, or sometimes days of data not being captured and sent to the server. (Even after we realize that data is not being sent, getting remote access to the device on site and fixing the context file so that it can restart could take days.)

Hence I am looking for a DB based context storage which could help.

Ok but what caused the Pi to restart?
A power outage, incorrect use of fingers, its own hardware watchdog?

I'm more keen on this: why is Node-RED not restarting? Is it something specific to your flows?

Multiple reasons: a power outage, the UPS giving up, maintenance, an internal audit where they shut down the power, power saving over a long weekend... all kinds of issues that are outside of our control. :slight_smile:
The point is that when the device is switched off, it is fine to have no data flowing. The expectation is that when it is on, the data should start flowing.

Sounds like you need a method of detecting this. For example subscribing from your location to MQTT LWT messages brokered on the Pi

(Did we not once upon a time discuss having the context sent to an MQTT broker with the retain flag set? Then at reboot, the last known context would be retrieved, provided the MQTT broker did not go down in the meantime.)

Sorry this was for a state discussion, how to recover last known states

I have to humbly state that we are digressing from the root of the issue. I have already explained above that we already know that the data is not coming and why it still takes days to reach out to the device to restart.

Fair enough.

I just feel that if your operating environment, including UPS, does not protect the Pi from a catastrophic shutdown, nor does it start up again without intervention, adding a database to the mix is unlikely to resolve the issue.

However, without much, much more information on your systems, flow design, the source, content and destination of data I'm only guessing. Which I will stop doing.
Best of luck :grinning:

If you have an unreliable platform, you need something with the lowest possible latency. REDIS is designed for that kind of thing, though not usually on the same server; more commonly, the REDIS server would be on a low-latency network connection.

REDIS is a caching store rather than a db and so is generally much simpler and lower latency in use.

Personally, in the situation you are in, I would output things as quickly as possible to MQTT with replication to an external MQTT service. That would likely give you pretty reliable, low-latency, replicated data.

But, of course, much depends on the data, the flows and other services performance.

I would still want to know why that is corrupting? Is it that the data is too large and therefore takes too long to write the in-memory JSON to file (which is how persistent file storage works)?

Also, if you are getting corrupted data writes and you are using a Pi with SD-Card storage, there is a very good chance that the card itself is damaged due to unclean shutdowns. Even when using SSD or HDD, you are quite likely to get filing system corruption if not enforcing clean shutdowns. A regular disk check will be needed.

I do think it does what it can. It uses the standard Node.js filing system API (actually I think it uses the fs-extra library, but it amounts to the same thing), which already tries to be as reliable as possible. But filing systems are filing systems: if the power goes off halfway through a large update, you are probably going to get a corrupted file.
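One general way to narrow that window is to write to a temporary file, flush it, and then rename it over the real file. The sketch below only illustrates that technique; it is not a description of what the context store does internally, and the function name `writeJsonAtomic` and the demo paths are made up for the example.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Write JSON via a temp file plus rename, so that a crash is far more likely
// to leave the previous complete file than a half-written one.
function writeJsonAtomic(file, data) {
    const tmp = file + ".tmp";        // same directory, so same filesystem
    const fd = fs.openSync(tmp, "w");
    try {
        fs.writeSync(fd, JSON.stringify(data));
        fs.fsyncSync(fd);             // ask the OS to push the bytes to the device
    } finally {
        fs.closeSync(fd);
    }
    fs.renameSync(tmp, file);         // replaces the old file in a single step
}

// Usage with a hypothetical file in a fresh temp directory
const atomicDir = fs.mkdtempSync(path.join(os.tmpdir(), "ctx-atomic-"));
const atomicFile = path.join(atomicDir, "global.json");
writeJsonAtomic(atomicFile, { queue: [1, 2, 3] });
writeJsonAtomic(atomicFile, { queue: [4] });
const readBack = JSON.parse(fs.readFileSync(atomicFile, "utf8"));
```

This reduces the risk but does not remove it: a damaged SD card or a filing system that itself needs repair can still lose the file, and a stricter version would also sync the containing directory.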

So also check the size of the data you have in a specific context variable and look to split it into smaller chunks if possible. Or break it down and send it as chunks to a Mosquitto broker with retained messages. It is possible to model a JSON structure as multiple topics fairly easily.
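As a rough sketch of modelling a JSON structure as multiple topics, the function below flattens a nested object into one message per leaf value. The name `toTopics`, the `site/pi01` prefix, and the sample data are all invented for illustration; `retain: true` assumes the messages go to an MQTT out node that honours `msg.retain`.

```javascript
// Flatten a nested value into one { topic, payload, retain } message per leaf.
// Array elements use their index as the topic segment.
function toTopics(value, prefix) {
    if (value === null || typeof value !== "object") {
        return [{ topic: prefix, payload: value, retain: true }];
    }
    const out = [];
    for (const [key, child] of Object.entries(value)) {
        out.push(...toTopics(child, prefix + "/" + key));
    }
    return out;
}

// Usage with made-up data and a made-up topic prefix
const messages = toTopics(
    { line1: { temp: 21.5, state: "run" }, queue: [7, 8] },
    "site/pi01"
);
```

Each message is small, so a single interrupted write affects one value rather than the whole structure. Note that empty objects and arrays produce no messages in this sketch.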

If you need to guarantee data capture, Node-RED is almost certainly not the correct tool to be using. It is too generic.

To reiterate: there are two issues for you here.

  1. If general-purpose compute devices have unclean shut downs, you HAVE to check the filing system on restart. This is especially critical if using something as unreliable as an SD-Card.
  2. If you absolutely must have data being captured immediately after startup of the server, it is unlikely that Node-RED is the correct architecture.

No, but every DBMS keeps data in memory until it commits the cache. Wrapping a DB write in a transaction takes explicit control of that. But even a DBMS, if for example power is cut to the server, could easily end up with a corrupted record. That is one reason that many DBMSs also have a simple change log: the simplicity of the log record reduces the chance that it too will be corrupted.
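To illustrate the change-log idea in its simplest form, here is a sketch of an append-only log with one JSON record per line, where replay stops at the first record that does not parse (for example, one cut short by a power loss). It is not how any particular DBMS implements its log; `appendRecord`, `replay`, and the file names are hypothetical.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Append one record as a single line and flush it to the device.
function appendRecord(logFile, record) {
    const fd = fs.openSync(logFile, "a");
    try {
        fs.writeSync(fd, JSON.stringify(record) + "\n");
        fs.fsyncSync(fd);
    } finally {
        fs.closeSync(fd);
    }
}

// Read records back, stopping at the first line that is not valid JSON.
function replay(logFile) {
    const records = [];
    for (const line of fs.readFileSync(logFile, "utf8").split("\n")) {
        if (line === "") continue;
        try {
            records.push(JSON.parse(line));
        } catch (err) {
            break;                    // torn record: ignore it and anything after
        }
    }
    return records;
}

// Usage: two complete records, then a simulated half-written third one
const logDir = fs.mkdtempSync(path.join(os.tmpdir(), "journal-"));
const logFile = path.join(logDir, "queue.log");
appendRecord(logFile, { seq: 1, value: "a" });
appendRecord(logFile, { seq: 2, value: "b" });
fs.appendFileSync(logFile, '{"seq":3,"val');
const recovered = replay(logFile);
```

Because each record is small and self-contained, an interrupted write costs at most the last record instead of invalidating everything written before it.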

Which is exactly the point I'm trying to get across. You can REDUCE the chance of corruption by having a lower-latency service. But you cannot ever eliminate it. Especially if (and we don't know if this is the case here) you are using an SD-Card on a Raspberry Pi.

This is NOT a Node-RED issue. This is an architectural design issue.

The issues could be reduced or eliminated by attaching a small battery backup to the Pi and putting the Pi somewhere where ignorant people can't physically turn it off. The UPS would signal the power loss though and tell the system to shut down cleanly. Corruption problems dealt with immediately. No need for anything more complex.

If you want to have data recorded when the device comes up - don't use Node-RED which will inevitably take time to start up. But if you do use Node-RED, your process has to take into account that if the service doesn't start then you can't have data flowing.

So if data flowing is critical - you need an additional monitor to tell you when it isn't flowing.

These are architectural design issues for what is clearly a commercial system.

Databases are not going to help this problem and might even make it worse, because you will have even more writes happening when the system is killed.

It is an edge device that collects data from an industrial shop floor, hence the nature of the device and the ambient conditions.

As far as I understand so far, Node-RED is a reasonable choice for this use case. In any case, it's too late for me to question that.

I see the problem with file-based context storage: it is unable to recover from an incomplete context file, and I am not sure this can be overcome without a database. There is no such thing as a partially written file in a DB; even if a write is interrupted, WALs and undo logs help recover from it. I wish someone had implemented a DB-based context store. Even a Redis-based context store would be worth trying, but I couldn't find one.
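Short of a DB-backed store, one possible workaround for the "cannot restart" part is a small check that runs before Node-RED starts and moves an unparseable context file out of the way. This is only a sketch under assumptions: Node-RED does not call anything like this itself, the name `loadOrQuarantine` and the paths are hypothetical, and whatever was in the damaged file is still lost (it is kept on disk only for later inspection).

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Return the parsed contents of a JSON file. If the file is missing, return
// an empty object. If it is not valid JSON, rename it aside and return an
// empty object so that startup can continue.
function loadOrQuarantine(file) {
    let text;
    try {
        text = fs.readFileSync(file, "utf8");
    } catch (err) {
        if (err.code === "ENOENT") return {};
        throw err;
    }
    try {
        return JSON.parse(text);
    } catch (err) {
        fs.renameSync(file, file + ".corrupt-" + Date.now());
        return {};
    }
}

// Usage: simulate a context file that was cut off midway through a write
const ctxDir = fs.mkdtempSync(path.join(os.tmpdir(), "ctx-check-"));
const brokenFile = path.join(ctxDir, "global.json");
fs.writeFileSync(brokenFile, '{"queue":[1,2');
const recoveredCtx = loadOrQuarantine(brokenFile);
const leftovers = fs.readdirSync(ctxDir);
```

Run as a pre-start step, something along these lines would trade a stuck service for an empty queue, which may or may not be acceptable depending on how valuable the queued data is.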