Durable and reliable contextStorage - DB/mysql/mariadb based

Well, it isn't a design choice I would have made, except perhaps as a prototype. But, as you say, perhaps too late to change now.

It is, of course, possible that a DB backed context store might help. But I wouldn't want to bet heavily on it.

Anyway, context stores aren't that hard to write; someone could take the Redis version as an example and rework it for a different DB. I'm sure GitHub Copilot would be delighted to help! :slight_smile:

But if you are stuck with your current design and don't have node.js development skills to hand, I would say that you should try decomposing what you are pushing to context and outputting to MQTT with retained messages. The chances of those being corrupted when working with Mosquitto are tiny. You will need a flow to decompose the data (I'm assuming that data is JSON - you haven't told us anything about the real nature or size of the data) and another to recompose it on startup by subscribing to the retained message structure.
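The decompose/recompose idea above can be sketched in plain JavaScript. This is only an illustration: the `ctx/` topic prefix and the helper names are my own, not an existing Node-RED API, and it assumes the context data is a JSON-serialisable object.

```javascript
// Sketch: flatten a context object into per-topic retained messages,
// and rebuild it on startup. Topic layout ("ctx/<path>") is an assumption.

function flatten(obj, prefix, out = {}) {
    for (const [key, value] of Object.entries(obj)) {
        const topic = `${prefix}/${key}`;
        if (value !== null && typeof value === "object" && !Array.isArray(value)) {
            flatten(value, topic, out);
        } else {
            // Leaves (including arrays) are serialised individually, so a
            // torn write only ever affects one small retained message.
            out[topic] = JSON.stringify(value);
        }
    }
    return out;
}

function unflatten(messages, prefix) {
    const result = {};
    for (const [topic, payload] of Object.entries(messages)) {
        const parts = topic.slice(prefix.length + 1).split("/");
        let node = result;
        while (parts.length > 1) {
            const p = parts.shift();
            node = node[p] ??= {};
        }
        node[parts[0]] = JSON.parse(payload);
    }
    return result;
}

// Publish each entry of flatten(ctx, "ctx") with retain = true;
// on startup, subscribe to "ctx/#" and feed the messages to unflatten().
```

In a flow this would be two small function nodes: one feeding an MQTT-out node with retained messages, one accumulating messages from an MQTT-in node subscribed to the wildcard topic.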

1 Like

I do not understand why you would be using an RPi for something that sounds pretty important? If it is taking too long to start up then move to an industrial PC with an NVMe SSD.

Change the underlying filesystem to something like BTRFS, or something else with more built-in resilience, and get a better/bigger UPS dedicated to the machine so it never runs out of power.

You are trying to fix the symptoms rather than the underlying problem.

Craig

3 Likes

I'd go for SQLite rather than Redis as the backend for context storage. The same objection applies to any other server-based DBMS, like MariaDB:

you just add another point of failure to the mix. You'd have to configure it properly so that persistence works correctly, and the service has to be available when Node-RED starts so the context data can be read.

So here are some advantages I see for SQLite in this case (despite it being just a file):

  • supports transactions (ACID)
  • pretty resilient against corruption
  • fast and in-process, so no added latency
  • no additional service to maintain

I use it a lot as local storage for config and runtime data in micro-services to keep the data close to the process.
I think that would make sense for Node-RED's context storage, too.
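Node-RED's context storage is pluggable via settings.js, so a SQLite store could in principle sit alongside the built-in ones. A sketch of what wiring it up might look like; note that the `sqlite-context-store` module name and its `config` options are hypothetical (only `memory` and `localfilesystem` ship with Node-RED), so such a module would first have to be written:

```javascript
// settings.js excerpt (hypothetical): register a SQLite-backed context
// store as the default, keeping an explicit in-memory store available.
contextStorage: {
    default: "sqlitestore",
    sqlitestore: {
        module: require("sqlite-context-store"),   // hypothetical module
        config: { path: "/var/lib/node-red/context.db" }
    },
    memoryOnly: { module: "memory" }
}
```

Flows would then use `context.get("key")` for the default store or `context.get("key", "memoryOnly")` to target a specific one.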

Just my 2ct... :nerd_face:

Is the context file storage on an SD card? If so then that may well be a factor in the problem.

MQTT only "retains" the most recent message per topic, so this does not prevent messages being lost while Node-RED is unavailable.
I think the OP has said this lost data does not matter though, as long as Node-RED reliably starts up when power is reapplied.

He also says that if the Pi goes down while context data is being written, Node-RED will not restart due to a corrupt context file.
It does seem to be a significant problem for Node-RED if it can't recover from a corrupt context store.

What should the recovery mode be?
It may not be appropriate just to restart with default data in case that would mess up something.

1 Like

Good question.

One can imagine different strategies, maybe configurable - discard that context data, disable that flow, disable all flows.

It would be nice to find a way to emulate the situation and explore.

I am waiting for a reply to my post asking if the data is going to an SD card. It is a known problem with SD cards that when the software finishes writing to the card, the data may not yet have been physically written out (hence the need to eject USB sticks: ejecting waits for pending writes to complete before unmounting). If the power fails after Node-RED has written the new data, but the card is still in the middle of writing it, then it may be corrupted.

I suppose one solution would be to keep a backup of the old data each time new data are written; then on failure Node-RED could load the previous set of data.

1 Like

Indeed, and not just SD cards but SSDs and even hard disks.
I believe that with new-fangled solid-state storage, not even the sync command guarantees that a write has reached the medium.

That's what I said already.

Assuming that the hard disk is formatted with ext4 or similar (or the Windows equivalent), which journal writes so that the filesystem can recover on startup, then it should not be an issue there. If you look in syslog you will see it checking the disk.

I assumed that SSDs have additional software on board that accomplishes something similar to cope with the delayed write. Is that not correct? I will have to do some searching about that. Certainly I have never had corrupted data on an SSD due to a sudden power failure.

My understanding is that no filing system can be totally free from corruption if the power is just cut. Some are better than others about minimising the risk.

Generally, writes are so quick it is rarely an issue and filing systems have improved greatly over the years such that, for most, you won't ever see it. But it doesn't mean it doesn't happen. I've seen this on both Windows and Linux over the years where the corruption was hidden until you ran a full filing system check.

It isn't just power loss that can cause this of course, but also an OS crash. Rather more common on Windows than Linux though. :slight_smile:

I suspect, but don't know for sure, that this is as much to do with write speeds as anything else.

Again though, the point is that it is always possible to get corruption - even cosmic rays cause eventual storage degradation.

Using a journaling filing system such as Windows NTFS or Linux ext4 reduces risk in the same way that journaling DBMSs do: small, fast journal writes act as a backup.

Keeping Node-RED's context variables small would also help. But remember that you are converting what may be a large JavaScript object into a flat text file every time Node-RED writes context to disk. Even though the file store only flushes every 30 seconds by default, fast-updating and potentially large data can mean a lot of bytes written each time. Coupled with a slow write speed, you are really starting to push the boundaries.

Deconstructing the data and outputting to MQTT results in a more robust set of data since any sudden system loss is unlikely to cause physical corruption - though, of course, it might result in somewhat inconsistent data. Even so, Node-RED would be able to come back up.

The issue here is, if you can't make the platform more reliable, and you don't want to change your tooling, you have to change your flows to reduce the risk of data corruption. The exact "best" way is going to depend on details we don't know. MQTT would be one possibility. Use of a DB output might be another - you don't have to use context storage after all. Reworking the data would certainly be something I'd want to look at as well.

Context storage "to MQTT" using persistence might be a useful extension to Node-RED?
It would be easy to use to pass context to an external database too.

```
context.set(topic, mydata, broker)

mydata = context.get(topic, broker) ?? "Empty context store"
```

I've thought about that in the past. The only downside is making sure that you decompose JavaScript objects into meaningful MQTT topic structures and working out how to reliably rebuild the object from the topics. It is fairly taxing. :slight_smile:

All.
My use case is IoT data capture at the edge: small data, captured frequently (every 1 to 3 seconds) and sent via MQTT, with a store-and-forward mechanism to handle network failures.

This problem happens with Windows (with or without an SSD) and Linux/Ubuntu PCs too, where the same flow is deployed. Hence it is not limited to RPi + SD card.

As long as the filesystem contextStorage is stored as JSON, this can happen: a shutdown can result in incomplete JSON being written. The file storage could solve this, maybe using a dual-file write method where at least one file is complete and unbroken at any point in time.

Databases typically solve this using a WAL (write-ahead log: a pure append-only log) for recovery. Hence I am looking for a database-based context storage.

Maybe I can share a simple reproducer that you can deploy anywhere to reproduce this problem, if you are not fully convinced. :slight_smile:

I am putting together a DB-based context storage based on the suggestions above - hope that addresses this issue. I am surprised that others haven't faced it.

If you publish msg.payload, isn't that effectively just a string?

I suppose when you subscribe via context.get() you could get back the payload you saved there, but there's nothing to stop some other published message from overwriting it, which might not be friendly to a JSON parser node.

If you don't care about the context values then use in-memory context... at least then you know it will be gone, but you start clean.

1 Like

No, it could be anything. But in any case, if you were to use MQTT as a context output, you would need to be able to deal with anything. Simply serialising JSON would leave you with the same issue that we already see, so deconstruction would be needed to avoid that and to make the data actually useful in MQTT itself as well.

Sadly, it doesn't exist. So if you can't find someone to write it, you will need to look at using the standard DB nodes instead, I'm afraid. Use stored procedures and, if needed, transactions to enforce consistency.