For my project I need to be able to go back in time and access a small set of previously collected sensor data values.
The set is quite small, about 10-25 values, collected over the last 10 days.
The important value for my event/flow is the minimum in the set, which also needs to be persisted and survive a restart of the flow/Node-RED.
What I have in mind is to collect the values in a rolling buffer and, when needed, access the buffered values, calculate the min-value and then persist it as well.
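That rolling-buffer idea can be sketched in plain JavaScript (e.g. inside a Function node); the class name and the capacity of 25 are illustrative here, not from any existing node:

```javascript
// Sketch of a fixed-capacity rolling buffer with a minimum lookup.
// The name RollingBuffer and the capacity are illustrative assumptions.
class RollingBuffer {
    constructor(capacity) {
        this.capacity = capacity;
        this.values = [];
    }
    push(value) {
        this.values.push(value);
        if (this.values.length > this.capacity) {
            this.values.shift(); // drop the oldest entry
        }
    }
    min() {
        // undefined when nothing has been collected yet
        return this.values.length ? Math.min(...this.values) : undefined;
    }
}

const buf = new RollingBuffer(25);
[21.5, 19.2, 20.1, 18.7, 19.9].forEach(v => buf.push(v));
console.log(buf.min()); // 18.7
```

In a Function node the buffer's array would be kept in flow or global context between messages rather than in a local variable.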
Would you recommend deploying a database-based node, storing this as an array or key-value pairs in a local file, or something else?
...that sounds perfect, thank you both for the fast response. @ukmoose, you are right, I thought about that too, but I see the need to access the historic data as part of my use case. The min-value alone might not be sufficient and some re-calculations need to be done from the historical data. @dceejay, thank you very much for providing that link.
How often are you getting updated values? That makes a massive difference.
25 x 10 x 17280 = 4,320,000 entries! (25 sensors, 10 days, updates every 5 seconds)
25 x 10 x 1440 = 360,000 entries (updates every minute)
That drops to 72,000 entries if the sensors update every 5 minutes.
If each sensor reading is a simple long integer then you might be able to get away with storing that in memory with push to disk every few minutes.
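The push-to-disk part can be left to Node-RED's persistent context rather than done by hand; a hedged sketch of the relevant fragment of settings.js, assuming the built-in localfilesystem store (the store name "file" and the interval are arbitrary choices):

```javascript
// Sketch of the contextStorage section of Node-RED's settings.js:
// a persistent "file" store alongside the default in-memory store.
// flushInterval is in seconds and controls how often pending context
// changes are written to disk.
module.exports = {
    contextStorage: {
        default: { module: "memory" },
        file: {
            module: "localfilesystem",
            config: { flushInterval: 120 } // push to disk every 2 minutes
        }
    }
};
```

A Function node can then keep the buffer in the persistent store with `flow.set("readings", values, "file")` and read it back after a restart with `flow.get("readings", "file")`.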
But really, InfluxDB would greatly simplify things in the long run. Bit of a pain to set up initially but it will automatically trim the data for you and you can even summarise the data for long-term storage and history.
For myself, my detailed data is kept for 7 days at 1-minute intervals (or thereabouts); it is then summarised for long-term storage as hourly max/min/avg, which is kept for several years. InfluxDB does all of this itself via a simple configuration of the databases.
This has the further advantage of giving access to really good dashboards via Grafana.
Just using a File node is super simple and easy to set up.
I run many, using them as backups (to Influx) of the raw data that comes from devices. Using a Function node you can customise the output for easy import into Excel or whatever. All the data is collected on an SSD which hangs directly off the NR RPi. Never missed a beat.
Thanks for the info, @TotallyInformation ...a very good calculation of the amount of data which sensors could produce.
However my use case is a bit different.
There will be only one, maybe two (temperature-) sensors, pushing data every minute.
What I need to actually collect and access is just the minimum values, reached over the last 5 days, which are approx. 10-25 values in total (per sensor).
A minimum value is defined as a previously collected temperature that is followed by a newer reading exceeding it by more than a safety margin/hysteresis.
As @ukmoose already pointed out, the absolute minimum in that array can be stored as a single value. However, I am not sure whether, for my use case, a line of best fit through the values would give better results instead.
Hence the need to store and access the "complete" array.
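That definition of a minimum can be sketched as a small detector function; the names (detectMinima, the hysteresis value, the sample readings) are illustrative assumptions, not part of my actual flow:

```javascript
// Sketch of the local-minimum definition above: a reading counts as a
// confirmed minimum once a later reading exceeds it by more than the
// hysteresis. Names and sample values are illustrative.
function detectMinima(readings, hysteresis) {
    const minima = [];
    let candidate = null;
    for (const t of readings) {
        if (candidate === null || t < candidate) {
            candidate = t; // lowest point seen since the last confirmed minimum
        } else if (t > candidate + hysteresis) {
            minima.push(candidate); // temperature rose enough: confirm it
            candidate = null;       // start looking for the next dip
        }
    }
    return minima;
}

// With a 0.5-degree hysteresis, the dips to 18.2 and 17.9 are confirmed:
console.log(detectMinima([20.0, 18.2, 19.0, 18.5, 17.9, 19.1], 0.5));
// [ 18.2, 17.9 ]
```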
My initial question was fuelled by the assumption that employing a database node is something of an overkill for storing this low amount of data.
Your sample calculation actually confirmed what I was thinking. Should I need to store a substantial amount of data values, a database node is definitely in order. For my 10-25 values, I think I'll try a file/context based solution.
Again, thank you for your insights on this topic.
Regards,
Hominidae
If you have the time, you should still try using InfluxDB and Grafana since InfluxDB will do the calculations for you I think. It has excellent ability to work out things like time-window minimums.
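For the time-window minimums mentioned, the query could be built in a Function node ahead of an InfluxDB query node; a hedged sketch assuming InfluxQL, with made-up measurement and field names ("temperature", "value"):

```javascript
// Hypothetical helper for a Node-RED Function node: builds an InfluxQL
// query returning the daily minimum temperature over the last N days.
// The measurement ("temperature") and field ("value") are placeholders.
function buildMinQuery(days) {
    return 'SELECT MIN("value") FROM "temperature" ' +
           'WHERE time > now() - ' + days + 'd GROUP BY time(1d)';
}

// In the Function node: msg.query = buildMinQuery(10); return msg;
console.log(buildMinQuery(10));
```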
Thanks again, @TotallyInformation ...what I do not want is to spread the logic of the algorithm across different components.
What I currently have in mind is a set of different flows, where only one flow will be used/responsible for calculating/managing the main trigger/control event. Doing other calculations with respect to this task in another flow, or even another component, is not something I would do light-heartedly.
@Paul-Reed: thanks for sharing your opinion on this. Actually I know about the nice graphing features of influx in conjunction with grafana. I might consider that as an extra/bonus feature but not as a requirement ATM. My setup currently has all that is needed by employing Blynk and its SuperChart Widget in the frontend.