Working with local storage for Context, Flow and Global Variables

I am moving a discussion from this thread:

To this new one. With some help it has become apparent that the topic has moved on from Node-RED not starting to storing data locally, and I hope it may help people who search for the same topic later.

Summary of the topic so far:
A) I was reading a flow variable into a function node and then making changes to the variable, as per the image below, but I found that after a restart of Node-RED all my data changes were lost:

node

B) @Colin pointed out the following:
"Because the thing you are saving in context is an object, when you do flow.get it gives you a reference (or pointer) to the cache of the object from the file in memory. When you change a property of that object it changes the version in memory so appears to work fine. It is, however, the flow.set() that tells the s/w that you have changed something so that it knows to write the cache out to disc."

C) Key Finding (for me at least)
The short of it is that if you only update variables in memory, the changes will not be written to disc. To have them written to disc you need to call flow.set, context.set or global.set.
This is a really important finding for those who have not figured it out yet.

You can read the thread above to see what I did as a quick fix to get my flows working, the schoolboy error I made in trying to write the flow variable back using flow.set, and the code that @Colin suggested I use in future.
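To illustrate the key finding, here is a minimal sketch. It is not the code from the other thread, and `makeFakeFlowContext` is a hypothetical stand-in I made up so the example runs outside Node-RED; it only tracks whether set() was ever called. In a real function node you would use the built-in `flow` object instead.

```javascript
// Hypothetical stand-in for a file-backed flow context, for illustration only.
// It is NOT Node-RED's implementation; it just records whether set() was called.
function makeFakeFlowContext(initial) {
    const cache = initial;   // in-memory cache of what was loaded from disc
    let dirty = false;       // true once set() has told the store something changed
    return {
        get: (key) => cache[key],   // for objects this hands back a reference
        set: (key, value) => { cache[key] = value; dirty = true; },
        isDirty: () => dirty
    };
}

const flow = makeFakeFlowContext({ settings: { count: 0 } });

// Pattern that loses changes on restart: modify the object but never call set().
const settings = flow.get("settings");
settings.count += 1;
const dirtyWithoutSet = flow.isDirty();   // false: the store was never told

// Pattern that persists: get, modify, then write back with set().
settings.count += 1;
flow.set("settings", settings);
const dirtyAfterSet = flow.isDirty();     // true: a write to disc can now be scheduled
```

The value in memory is updated in both cases, which is why the first pattern appears to work until Node-RED restarts.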


D) Now it gets even more interesting, with a discussion of alternative ways to use the .set command:

D1) @TotallyInformation writes this:

"But you don't have to get/set the whole of an object when using Node-RED's variables.

Here is a quick example:

var env = {
    "args":"--userDir,./data",
    "automation":"true",
    "autorestart":"true",
    "windir":"C:\\WINDOWS",
    "windowsHide":"true",
    "_prog":"node"
}

context.set('env', env)

context.set('env.automation', 'false')

const env2 = context.get('env.automation')

msg.payload = env2

return msg

You can use this form as well: 'env["automation"]'

Generally, I do the same as Colin though because it makes the code easier to read. But if you want a bit more efficiency, this is the way to go."

D2) @Colin's reply:
"Are you sure it is more efficient? Does context.set do a clone or just set a reference?"

D3) @TotallyInformation's reply:

"In theory, once a reference is finished with, the garbage collection routines of node.js can recover the space. So even if something like flow.get('xxx.property') gets the full object as it only returns the individual property, the rest can be released once the function finishes.

Least, that's what I think might happen. But, as I say, I am no expert so I might be completely wrong."

D4) @Colin's reply:
"We know that flow.get for an object does not do a clone, as it is not actually necessary to do the flow.set at all after modifying properties, so I don't know what there is to be garbage collected whichever way it is done."


E) So this all left me thinking last night: how do flow.set, context.set and global.set really work?

I was left with more questions than answers. Some are answered in part above, but I would like to know more.

Statement: When I read a variable into memory, e.g. with flow.get in a function node, presumably the variable does not persist beyond the function running.

Question 1A - how does a .set command trigger a write to disc?
OK, so this may become too technical, but if the answer is straightforward, I'd like to understand how it works. Does the trigger actually happen when the function node finishes running, which would imply a much shorter interval than 30 seconds for writing to disc? Is there some sort of queue?

Question 1B - how long after using a .set command is data written to disc?
This is linked to Q1A.
When I use the commands flow.set, global.set and context.set, when does this trigger a write to disc?
The documentation I have read says writes to disc happen at a minimum of 30 second intervals; however, when I was testing yesterday I saw one happen a few seconds after using flow.set.

Question 2A - is the whole variable written to disc, or is it possible to write only the data that has changed?
This is all about being efficient in writing your code. If the two different methods that @Colin and @TotallyInformation have discussed have different outcomes, I'd like to know.

Question 2B - what is the most efficient way to update variables?
This is linked to Q2A.
This is the conclusion of the discussion between @Colin and @TotallyInformation.
I'd like to know which method to use and, if it is not straightforward, which method to use in which case.

Question 3 - are there any other considerations that would impact performance / memory usage?
This is a catch-all. Currently my variables are reasonably small, but I am planning to collect bigger sets of data for visualisations. Some of this data may be stored in Node-RED variables, in which case I want to make it run efficiently, some may go to a local DB, in which case I'll probably have other questions/research to do at a later point to make that work efficiently.

It's very easy to ask questions, somewhat harder at times to answer them. If I can help in any way through testing or something else, please suggest how; otherwise I can wait. Rome wasn't built in a day.

Actually that may or may not be true depending on what you are getting.

If you get a simple variable such as a number then it is read into a variable in the function and yes, when the variable goes out of scope, or at the end of the function node, it will be discarded.

If, however, you are getting an object then, as I understand it, it does not read the object into that variable; it just sets the variable to be a reference or pointer to the object that already exists in memory (in the cache, in the case of file storage). So when the variable goes out of scope, all that is discarded is that local pointer; the actual object is unaffected.
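The difference between getting a simple value and getting an object can be shown in plain JavaScript. In this small sketch the `cache` object is only a hypothetical stand-in for the in-memory context store, and the property names are made up for illustration.

```javascript
// A plain object standing in for the in-memory context cache (hypothetical).
const cache = { counter: 5, device: { name: "pump", on: false } };

// Getting a number copies the value: changing the local variable
// does not touch the cache.
let counter = cache.counter;
counter = counter + 1;

// Getting an object gives a reference: changing a property through the
// local variable changes the one object held in the cache.
const device = cache.device;
device.on = true;

console.log(cache.counter);    // 5
console.log(cache.device.on);  // true
```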

@Colin
Thanks for the prompt response. That is interesting to know.

How do we get to the bottom of the other questions about timings of writing to disc and efficiency of writing/updating variable values?

When you call set on the file system store, if it's in synchronous mode, it updates the in-memory cache; then, if there is a pending write already queued up, it just returns. Otherwise, it schedules a write of the cache to disk in 30 seconds' time (or whatever you've configured its flush interval to).

That means:

  • if this is the first write to context you've done since the last flush to disk, then the write to disk will happen in 30 seconds time.
  • if you'd done a write to context 28 seconds ago (that was itself the first write since the last flush), then the value you've just written will be saved to disk in about 2 seconds.

So the flush interval does not mean each write to context you do will be saved 30 seconds later. It means each write you do will be written to disk at most 30 seconds later, but it won't write to disk more often than once every 30 seconds.
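The scheduling described above can be modelled roughly as follows. This is only a sketch based on the description, not Node-RED's real code: `makeStore`, `tick` and the simulated clock (in seconds) are all made up so the example runs instantly.

```javascript
// Hypothetical model of the scheduling described above; not Node-RED's real code.
// Time is simulated in seconds so the example runs instantly.
function makeStore(flushIntervalSecs) {
    const cache = {};
    let pendingFlushAt = null;   // time of the queued write, or null if none is queued
    const flushes = [];          // times at which the cache was "written to disk"
    return {
        set(key, value, now) {
            cache[key] = value;                        // always update the in-memory cache
            if (pendingFlushAt !== null) return;       // a write is already queued: just return
            pendingFlushAt = now + flushIntervalSecs;  // otherwise queue one
        },
        // Advance the simulated clock; perform the queued write if it is due.
        tick(now) {
            if (pendingFlushAt !== null && now >= pendingFlushAt) {
                flushes.push(pendingFlushAt);
                pendingFlushAt = null;
            }
        },
        flushes
    };
}

const store = makeStore(30);
store.set("a", 1, 0);    // first write since the last flush: write queued for t=30
store.set("b", 2, 28);   // 28 s later: a write is already queued, so "b" waits about 2 s
store.tick(30);          // both values go to disk together at t=30
store.set("c", 3, 31);   // the next write queues a new flush for t=61
store.tick(61);
console.log(store.flushes);  // [ 30, 61 ]
```

This would also explain seeing a write only a few seconds after a set: an earlier set had already started the 30 second timer.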

Currently it will write the whole context for that scope to disk each time (i.e. the entire context for a flow or node). There is an item on the backlog to improve this by moving to a log-based system where only updates are written to a rolling log, or some other similar method.

It's largely up to which style you prefer. I don't think there is going to be much difference in raw performance unless you're trying to handle 10000s of operations a second.

If I had to guess, I'd say getting the whole object, then addressing the sub property directly then writing the whole object back would be mildly more efficient as you can let native JavaScript do the property lookup in the object, rather than have our code parse your property expression and dig into the object.

Having said that, in a future where we only write updates to disk rather than the whole object, it would be more efficient to call set with just the property you are updating so we only write that, rather than the whole object.
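For comparison, here are the two styles side by side. This is a sketch only: `makeFakeFlow` is a hypothetical stand-in that handles simple dotted keys so the example runs outside Node-RED, and the real property parsing in Node-RED is more capable. In a function node you would use the built-in `flow` object.

```javascript
// Hypothetical stand-in for flow context that accepts simple dotted keys
// such as "env.automation". It is not Node-RED's implementation.
function makeFakeFlow() {
    const data = {};
    // Split "a.b.c" into the parent object and the final property name.
    const walk = (path) => {
        const parts = path.split(".");
        const last = parts.pop();
        let obj = data;
        for (const p of parts) obj = obj[p];
        return { obj, last };
    };
    return {
        get(path) { const { obj, last } = walk(path); return obj[last]; },
        set(path, value) { const { obj, last } = walk(path); obj[last] = value; }
    };
}

const flow = makeFakeFlow();
flow.set("env", { automation: "true", autorestart: "true" });

// Style 1: get the whole object, change a property with plain JavaScript,
// then set the whole object back.
const env = flow.get("env");
env.autorestart = "false";
flow.set("env", env);

// Style 2: set just the sub-property using a property expression.
flow.set("env.automation", "false");
```

Both styles end with a call to set, so in either case the store knows something has changed.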

None that I can think of immediately.


@knolleary, thanks for the response. It is all much clearer now.
I'll keep an eye out for when the log-based approach is rolled out.

Copying from the other chat, to keep it in one place...

@Colin
Just to answer that last question here, yes it is true of context with a file system backing. The data is still accessed from RAM just like the normal context, but it is flushed to disc occasionally, and picked up from disc on startup. However, if one does not call flow.set() then it does not know that the data in the cache has changed so it will not be flushed, so a node-red restart, for example, would lose the recent changes.
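For anyone finding this later, file system backing and the flush interval are configured in settings.js. The fragment below is a minimal sketch assuming the standard `localfilesystem` store with its `flushInterval` option (in seconds); the variable name `settings` is only there so the fragment stands alone, and in a real settings.js the `contextStorage` key sits inside the exported settings object.

```javascript
// Sketch of the contextStorage section of settings.js (values are examples).
// In a real settings.js this key sits inside the exported settings object.
const settings = {
    contextStorage: {
        default: {
            module: "localfilesystem",   // file-backed context store
            config: {
                flushInterval: 30        // seconds between writes of the cache to disc
            }
        }
    }
};
```

Node-RED needs a restart after changing this, and a shorter interval means more frequent writes to disc.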