SQLite Data to Charts - Various Issues

Hello

This is my first post, sorry it's a bit long-winded ...

Overview

I have an outstation sending data to Node-Red via MQTT. The data is sent every 30s and is (temperature 1, temperature 2, humidity, pressure, voltage 1, voltage 2, voltage 3, voltage 4).

I want to chart this data in multiple charts over periods from 8 days down to 30 minutes, and I want the data to reload after server restarts, so the incoming data needs to be stored on the server.

I have Node-Red, Mosquitto and SQLite running on a Pi.

Initial Development

My initial approach was to send the incoming data to a chart and to a file.
On start-up, the data from the file is loaded into the chart and then incoming data further populates the chart.

This approach has the merit of simplicity and works well for a few days.
I have a file per chart data stream, but these files obviously just grow and grow. As the files get bigger, Node-Red slows down and further development with this approach seems inelegant.

Database Development

I decided to try sending the data to a SQLite database and charting it from there. This is a work in progress, and I've got it partially working by using an inject node to manually trigger creation of a chart.

I am writing the incoming data to a database table together with the Unix time and also a human-readable date and time.

Problems

(a) Manually injecting the query to select pressure data results in a graph with missing points, as can be seen from the straight line on the chart (following the highlighted data point). This is despite the underlying data being present in the database, and the result is reproducible.

My select node is simply

SELECT time AS "x", PRESSURE AS "y" FROM sensors WHERE currenttime >= date('now','-60 minutes')

and the change node is
[
  {
    "series": ["pressure"],
    "data": [[payload[[0..64000]]]],
    "labels": [[""]]
  }
]

Now, I know I should not have 64000 in there, but I wanted to ensure I get all the records from the query; if I make that a low number, some could be missed. I would really like to set all my queries to 8 days and then, where I want less time displayed, define this on the chart node. [I envisage later on perhaps having buttons to let me choose the duration displayed.]

What's the elegant approach to this and how do I ensure that all data points are written to the chart?

(b) The chart doesn't persist once created. I think what's happening is that the insert function generates an output from the database node which clears the chart. I want multiple persistent charts with different timescales and am not sure what I'm missing, but it must be obvious!

(c) I want to generate charts that update every time there is new data.
I suspect that it's really inefficient to run a query every 30s and fully regenerate all the graphs, and visually this is not attractive to watch.

I thought that an approach could be to populate the chart historic content at start-up and then simply feed real-time data to the chart but this doesn't seem to work the way I've tried in the flow.

Again, I'm looking for how best to do this; I don't really like the idea of constantly re-generating charts and would prefer just to send new data to the chart in the same way that this works using local files.

Finally, I remain uncertain about the merits of JSONata vs a function node approach for setting up the chart data.

Any pointers would be much appreciated!

My personal take on this is - if you are processing anything more than a few hundred rows of data, go with a function node. It is MUCH faster.

Look at the screenshots in this demo flow, where processing 1000 elements in an array took JS 4ms while JSONata took 1000+ ms.


So just don't set a number there at all, and it will use the entire payload array.
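With the slice dropped, the change-node expression from the original post simplifies to just passing the whole payload (same JSONata shape as before):

```jsonata
[
  {
    "series": ["pressure"],
    "data": [[payload]],
    "labels": [[""]]
  }
]
```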

True -- you should not do that.

This approach does work, but you need to feed the realtime data points to the chart node using a different format. I don't remember what that format should be, but if you send an array of 1 point, it replaces the historic data array. I believe the incoming point needs to just be an object.
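A minimal sketch of that idea, assuming the usual node-red-dashboard behaviour where a msg whose payload is a single number, with msg.topic naming the series, appends one live point; `toLivePoint` is a hypothetical helper, wrapped so it runs standalone:

```javascript
// Sketch of a Node-RED function-node body, wrapped in a hypothetical helper
// so it can run outside Node-RED. The dashboard chart node (as I understand
// it) appends one live point when msg.payload is a single number and
// msg.topic names the series -- the topic must match a series name used in
// the seeded historical data.
function toLivePoint(seriesName, value) {
    return { topic: seriesName, payload: Number(value) };
}

const msg = toLivePoint("pressure", "996.72");
// In the real function node you would end with: return msg;
```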

As Steve mentioned, a JS function will run faster than a JSONata expression -- but that should not be a big factor for a couple hundred points, so I generally prefer JSONata. Keep in mind that your chart resolution is probably going to be no more than 1000px wide, so feeding it more data points than that is just a waste of resources.

I've seen that before, and it's usually because your x-axis timestamps are not sequential. Try adding an order by currenttime to your query. Alternately, you can do the sorting in your function code or JSONata expression:

[
  {
    "series": ["pressure"],
    "data": [[payload^(x)]],
    "labels": [[""]]
  }
]

although the database sorting will always be more efficient.


Thank you, Steve and shrickus.

I have done some more work on this and got a bit further forward.


I have now put all the data in one table for simplicity and to save duplicate timestamps across tables.

(a) Resolved. This one was down to date/time discrepancies arising from the way I was storing and querying data. I now store data using a timestamp from SQLite's datetime() rather than passing a value from Node-Red to store.

I'm not that worried about the data timestamps being wholly accurate although if I find that the data is taking a long time from the outstation to the database, I may need to add a source data timestamp. But I'm assuming that this will be within 10s or so and won't really matter.

(b) and (c) I have worked around chart persistence by restricting what can pass from the database to the chart with a switch node. This seems really inelegant, but I'm still not sure how to stop other queries from the database disrupting the chart nodes. With this approach, each chart will need a custom blocker, so there must be a better way to do this!

The database query populates the chart on Node-Red startup after which the real-time data builds on this, so that the queries only need to run once per chart to get the seed data.

Although I was previously feeding the real-time data to the chart node, it was clearly being overwritten. The blue data points were from the database and the white points real-time, which also makes it obvious when there was a restart. I am simply stopping and starting the Node-Red service to test this.

JSONata vs JS function.

The database will have up to c.23k rows (limited by a delete job running every 8 hours). With a query to seed each chart, JSONata could consume quite a lot of processing compared with a JS function.

I'm thinking of approaching this as follows:
At start-up ...

  • get the seed data for all the charts in a single query
  • process this in a JS function to format for charts
  • send the data to the chart nodes
  • block chart nodes from getting any further data from queries

Does this sound sensible? Any pointers to how best to approach a JS function for this?

thanks!


I don't see why you would need to do this -- you should have 1 flow and sql node for inserting into the database, and another flow and sql node that only queries for historical data and populates the charts when first displayed. Since both sql nodes will reference the same configuration node, node-red should not be creating multiple connections, but reusing the existing connection. This way, you don't need any custom "blocker" flows/code.

As for processing the data: your screenshot is only 741px wide, so it seems like you can display at most ~700 different data points -- nothing you can write in JS or JSONata will be nearly as performant as aggregating your data right in the query. I wouldn't worry about the post-processing nearly as much as getting the query to produce < 1000 data points. Any more than that, and the dashboard will start slowing to a crawl.

How to aggregate the data is another question, with too many answers for me to know which works best for your data. But the general idea is to group by time slots (e.g. 1 measurement per hour, 15 min, 5 min, day, whatever). Then for each time slot, return a single value that is the average of all values during that slot.

Another hybrid approach would be to return all rows from the database, and use a node like node-red-contrib-aggregator to "compress" the results into a more manageable set.
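That slot-and-average idea can also be done in a function node after the query. A sketch, assuming rows shaped like `{ unixtime: <ms>, pressure: <number> }` (the helper name and row shape are my own, not from any library):

```javascript
// Hypothetical post-query thinning for a function node: group rows into
// fixed time slots and average each slot, so the chart receives well under
// 1000 points regardless of how many rows the query returned.
function downsample(rows, slotMs) {
    const slots = new Map();
    for (const row of rows) {
        const slot = Math.floor(row.unixtime / slotMs) * slotMs; // slot start
        const s = slots.get(slot) || { sum: 0, n: 0 };
        s.sum += row.pressure;
        s.n += 1;
        slots.set(slot, s);
    }
    // One averaged {x, y} point per slot, in time order.
    return [...slots.entries()]
        .sort((a, b) => a[0] - b[0])
        .map(([slot, s]) => ({ x: slot, y: s.sum / s.n }));
}

const points = downsample([
    { unixtime: 0,      pressure: 996 },
    { unixtime: 60000,  pressure: 998 },
    { unixtime: 900000, pressure: 1000 },
], 900000); // 15-minute slots
```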

But in your case, the most performant option would probably be to use a time-series database (i.e. influxdb) which does all of that aggregation internally. It uses Continuous Queries to "downsample" the data, just like you are trying to do, but without you having to manage it in code or your query. I just bring it up as another possible solution, although it would require setting up and learning another database, so I'm sure you don't want to make that switch right now.

I would start with avg value queries grouped by the hour, just to get familiar, and then you can adjust your query window/timeslots as necessary. If you need help with writing that sql, post some actual data and we can have that discussion next.
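Such an hourly-average query might look something like this; a sketch only, with table and column names assumed from the queries elsewhere in this thread, and unixtime assumed to be in milliseconds:

```sql
-- Average pressure per hour over the last 8 days (assumed schema).
SELECT (unixtime / 3600000) * 3600000 AS x,   -- start of the hour, in ms
       avg(PRESSURE)                  AS y
FROM   sensors
WHERE  currenttime >= datetime('now', '-8 days')
GROUP  BY x
ORDER  BY x;
```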
Steve Rickus

BTW, I love this approach, since it clearly shows the historical data vs. the "live" data... I will be stealing that for future projects!

Could you post an example of that "live" data msg.payload as a reference?

There's no need for blocking nodes ...

Of course I should! That's so obvious that I missed the concept while head-down in the detail. What I had was one sqlite node with the insert and queries all connected to it. I have now split this out as below. This represents a great simplification, with the output from the mqtt in nodes being combined in the join node while the function node constructs the sql insert statement.

image

Too much data for the charts

I've not looked at influxdb, but will add that to my reading list. When I started this, I thought it would work well with an RRD (round-robin database), and it does appear that influxdb could be used in this manner.

It's true that I need to do something about reducing the data volume. At the moment that's a 'backlog' item and I'm focussing on trying to populate the charts at start-up. For the development I can restrict the data quantity by limiting the time range selected from the database.

I have now got multiple queries running at start-up, each with this flow structure, thereby avoiding the need for switch-node blocking.

The 'system flow' input at the top is simply the real-time data feed.

A single startup query and function

I've defined the query to get sample data from the database to seed the charts, but am not sure how to process the results from this.

SELECT unixtime, temperature, humidity, pressure, volts0, volts1, volts2, volts3 FROM rtu1 WHERE myDate > datetime('now','-3 hours');

Example query result (directly from sqlite) is
1641745177724|16.687|56.8|996.72|3.259|4.275|16.076|10.373
1641745207282|16.687|56.9|996.75|3.259|9.188|16.999|2.449
1641745236843|16.687|56.9|996.78|3.258|17.903|3.874|23.816

At the output of the sqlite node the object looks like this...

Presumably I need a function node with multiple outputs each feeding to a chart node?

I've also been struggling to work out how to get a chart with two lines - I want a chart with two 'y' values, volts1 and volts2, vs time.
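One way to do a two-line chart (a sketch, assuming rows shaped like the seed query output above, and wrapped in a hypothetical helper so it runs standalone) is to build one chart message carrying both series:

```javascript
// Sketch of a function-node body: build one chart message with two series,
// volts1 and volts2 vs time. The row shape is assumed from the seed query.
function toTwoSeriesChart(rows) {
    return [{
        series: ["volts1", "volts2"],
        data: [
            rows.map(r => ({ x: r.unixtime, y: r.volts1 })),
            rows.map(r => ({ x: r.unixtime, y: r.volts2 })),
        ],
        labels: [""],
    }];
}

const payload = toTwoSeriesChart([
    { unixtime: 1641745177724, volts1: 4.275, volts2: 16.076 },
    { unixtime: 1641745207282, volts1: 9.188, volts2: 16.999 },
]);
// In the function node: msg.payload = toTwoSeriesChart(msg.payload); return msg;
```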

For completeness, here's the output from the chart seed query ..

image

Here's one of the real-time data inputs to the chart ...

image

It's the change of topic that causes the colour change to the charted data, depending on the Series Colour settings defined in the chart node.

shrickus, was that what you were asking?

Sending Database to Charts

I found a reply by Steve-Mcl in the discussion Dynamic series identification from message, send data to Chart that provides a great solution in the form of a "DB data to chart" function.
I slightly modified this function and, still using one function node per data series, it works really well. Of course there is a certain clunkiness to running eight queries at startup ...

It would be interesting to know whether this function could easily be modified to provide each data series on a separate function node output, or whether there's some other approach to separating the data for sending to different charts.

Reducing the amount of data sent to the charts ...

One advantage of having the eight queries is that it makes it simple to implement some rudimentary data thinning in each query. My initial approach is to use the condition WHERE ROWID % 20 = 0 in the sql statements, which has the effect of selecting every 20th row for the seed data. I just need to tweak this to see how many rows I can get away with skipping.

Of course it could. You can configure a function node with multiple outputs.
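A sketch of what that split could look like; the helper name and column names are assumptions drawn from the seed query earlier, and in a Node-RED function node with N outputs, returning an array sends element i to output i:

```javascript
// Hypothetical split for a function node configured with several outputs:
// build one chart message per sensor and return them as an array, so each
// element goes to its own output wire (and on to its own chart node).
function splitPerSensor(rows, sensors) {
    return sensors.map(name => ({
        topic: name,
        payload: [{
            series: [name],
            data: [rows.map(r => ({ x: r.unixtime, y: r[name] }))],
            labels: [""],
        }],
    }));
}

const msgs = splitPerSensor(
    [{ unixtime: 1641745177724, temperature: 16.687, pressure: 996.72 }],
    ["temperature", "pressure"]
);
// In the function node: return splitPerSensor(msg.payload, ["temperature", "pressure"]);
```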

However, instead of modifying it or copying the function, you could use (reuse) it with a link call (kinda like a subroutine)

Do you know how to use link call?


Steve-Mcl
Thanks for this. I'd not looked at using the link call node, but have now tried it.
I did need to watch some of the development video to understand this one!

Is this the correct approach to using a link call node?
It seems quite similar to the use of a sub-flow but with less effort...

image

So what happened to this "single startup query"?

I would use one inject node to get all of this data (but add an order by unixtime clause), then use different change nodes to restructure just the info necessary for each chart.

For instance, to show all the voltages in one chart, use something like this expression (not very tested):

(
    $sensors := ["volts0", "volts1", "volts2", "volts3"];

    payload@$data {
        "series": [$sensors],
        "data": [
            $sensors@$series.[
                $data.[{
                    "x": unixtime,
                    "y":$lookup($, $series)
                }][]
            ]
        ]
    }
)

This uses an array of sensor names to build an array of arrays of x,y points. You could even parameterize a link call flow to pass in the list of sensor field names, and get away with only 1 change node -- good luck!

I've now got all the charts functioning using multiple queries and link call nodes.
This brings me to where I was when I added sqlite to this, but now the data is much easier to manipulate and I can keep the database size manageable.
With this essentially working, I'm now looking again at the single-query solution.

Yep, I missed off the "order by unixtime" from that query.

Thanks for the suggestion about using change nodes to arrange the data. I've been looking into how best to accomplish this - whether to try the split in a function node, a switch node or change node.

Many thanks for the expression - I'll have a look and that and see if I can get anywhere with that in a change node.