There have been a number of posts recently about Node-RED users using InfluxDB & Grafana to display data, which have whetted my appetite, and I am in the process of changing from MySQL to InfluxDB (+ Grafana).
There is a learning curve with InfluxDB - tags, fields, measurements, etc. - and despite good documentation, it's not easy.
To try and get things right from the start: what database schemas are other users implementing for IoT & weather sensor data?
I would hate to construct flows around a schema that restricts future development, and have to redo it all at a later time.
For me, it falls into at least two different areas, weather data & energy monitoring, with future options to average and combine data, such as average temp, etc.
I've not used Influxdb as of late, but I believe that is the recommended way to structure your measurements. All three readings are grouped into a single record, so they all get the same timestamp and are tagged with which sensor the data came from.
If you have a nicely structured topic hierarchy (as Julian always recommends), it becomes almost trivial to use a single JSONata expression to split the topic into pieces, assign each part to a named tag, and build this payload for sending to the influxdb out node. Assuming an incoming msg like this:
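For illustration, here is a minimal sketch of that split in plain JavaScript rather than JSONata (the topic layout `sensors/<location>/<source>` and all names here are assumptions, not from the post above); node-red-contrib-influxdb accepts an array of `[fields, tags]` as `msg.payload`:

```javascript
// Sketch only: split an MQTT-style topic into influx tags and build
// the [fields, tags] payload shape used by node-red-contrib-influxdb.
// The topic layout "sensors/<location>/<source>" is an assumption.
function buildInfluxPayload(msg) {
    const parts = msg.topic.split("/");     // e.g. ["sensors", "garden", "sensor1"]
    const tags = { location: parts[1], source: parts[2] };
    const fields = msg.payload;             // e.g. { temp: 21.5, humidity: 60 }
    return [fields, tags];                  // influxdb out node writes fields + tags
}

const msg = {
    topic: "sensors/garden/sensor1",
    payload: { temp: 21.5, humidity: 60, battery: 3.9 }
};
console.log(JSON.stringify(buildInfluxPayload(msg)));
// → [{"temp":21.5,"humidity":60,"battery":3.9},{"location":"garden","source":"sensor1"}]
```

In a real flow this function body would sit in a function node (or the equivalent JSONata in a change node), with `return [fields, tags]` becoming `msg.payload = [fields, tags]; return msg;`.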
Here are a few of my queries that may give a hint as to how I use things:
These 2 are creating a mega-humidity chart in Grafana:
SELECT mean("humidity") FROM "autogen"."humidities" WHERE ("location" <> 'Outside' AND "source" <> 'D1M04b') AND $timeFilter GROUP BY time($__interval), "location", "source" fill(null)
SELECT mean("humidity") FROM "autogen"."humidities" WHERE ("location" = 'Outside') AND $timeFilter GROUP BY time($__interval), "source" fill(null)
SELECT last("temperature") FROM "temperatures" WHERE $timeFilter GROUP BY "location", "source"
Note that I use both location (room name) and device source as tags. Tags are indexed, I think, which makes queries more efficient. Temperature, humidity, light and a few other measurements are all in the same database, so I'm using a combination of measurements and tags.
This example is from my heating system and uses 2 queries, 2 y-axes. In this case, I don't need any tags but I still have multiple measurements on the database:
SELECT mean("Percentage_Demand") FROM "autogen"."heating" WHERE $timeFilter GROUP BY time($__interval) fill(previous)
SELECT mean("Demand") FROM "autogen"."heating" WHERE $timeFilter GROUP BY time($__interval) fill(previous)
You need to use measurements when you might want to process the data; tags are static and used for grouping and/or filtering. With the measurements, you can do things like mean() to calculate the average over the selected time period, or something like max().
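As a sketch of that difference (the measurement and tag names here are assumptions): aggregation functions like mean() and max() operate on field values, while a tag can only be used to group or filter:

```sql
-- Field values (here "temperature") can be aggregated...
SELECT mean("temperature"), max("temperature") FROM "temperatures"
WHERE time > now() - 24h
-- ...while tags (here "location") can only group or filter
GROUP BY "location"
```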
So are you saying that something like power data, which may need flexible processing, may need splitting from the sensor data array and should be stored under a separate 'measurement', i.e.:
Maybe I used the wrong terms - I still struggle! The format above is the one that I use for just about everything. Your tags in that example are "sensor1" and "garden". The measurements are "temp", "humidity" and "battery" so you can process the measurement values.
Comparing influx to other database structures, a measurement is sort of like a table of its own, with keys acting as column names and values acting as entries. An important distinction between tags and fields is that tags are indexed while fields are not. Additionally, tag keys and tag values are always stored as strings, which makes them excellent for Boolean operations but pretty much useless for math operations in InfluxQL.
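In line-protocol terms that distinction looks like this (a hypothetical write; measurement, tag and field names are assumptions): everything in the comma-separated group after the measurement name is an indexed tag stored as a string, and everything after the space is an unindexed field:

```
# measurement,tag=value,tag=value field=value,field=value timestamp
temperatures,location=garden,source=sensor1 temperature=21.5,humidity=60 1609459200000000000
```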
However you format it (there's wiggle room) I recommend identifying individual devices as tags and using the same measurement between devices when coming up with your schema. This makes it so you can use variables and wildcards in stuff like grafana to visualize your data without having to build out each possible instance.
Example: in grafana I wanted to visualize CPU utilization on multiple systems that were populating influxDB via telegraf. The query I used was SELECT mean("Percent_Processor_Time") FROM "win_cpu" WHERE ("host" =~ /^$host$/ AND "objectname" = 'Processor' AND "instance" != '_Total') AND $timeFilter GROUP BY time(15s), "instance" fill(null). The measurement in question is "win_cpu." I have multiple machines all writing to "win_cpu" with the "host" tag populated by their hostname, which can then become a filter. This allows all relevant bits of data (in this case percentage busy cycles per logical processor) to be displayed together with a single query rather than needing to define each and every possible logical processor (which suuuuuuuuucks to do on ESX hosts.)
Yes. I love discrete for showing on/off information like lights, heating, etc. I also have a digital sound input on my sensor platform in the hallway and that works nicely too.
I occasionally use annotations as well. They are now a LOT easier to use with the latest versions of Grafana.
Another feature I've used a little is variables and being able to generate multiple charts from a single definition. Great for things like gauges for a series of sensors (like temperature or humidity per room).
On one sensor, I get data every 5 seconds, so that's over 17,000 entries per day, which takes a lot of loading!
I've read the section on downsampling, but it generally means having 2 charts - one with all of the datapoints (for most recent 1 hour), and another which contains the downsampled data (but not the most recent hour).
It would be great if both could be displayed in the same chart.
We're getting further out of scope for Node-RED by the minute, but why would you need separate charts? Using a continuous query that re-writes the returned entries into a longer retention policy should allow the same queries, and therefore the same charts, to use the culled data just like the un-culled data, shouldn't it? I'm going to have to play with this some.
For downsampling you usually use another db with a longer retention policy - so you can't use the same query.
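For anyone wanting to try this, a sketch of that setup in InfluxQL (the database name "sensors", retention policy name "one_year", and measurement names are all assumptions): a longer retention policy plus a continuous query that writes 5-minute means into it.

```sql
-- Assumed names throughout: database "sensors", long-term RP "one_year"
CREATE RETENTION POLICY "one_year" ON "sensors" DURATION 52w REPLICATION 1

-- Downsample the 5-second readings into 5-minute means,
-- written into the longer retention policy
CREATE CONTINUOUS QUERY "cq_temps_5m" ON "sensors"
BEGIN
  SELECT mean("temperature") AS "temperature"
  INTO "sensors"."one_year"."temperatures_5m"
  FROM "temperatures"
  GROUP BY time(5m), "location", "source"
END
```

Note the downsampled data lands in a different measurement ("temperatures_5m") under a different retention policy, which is why the chart query has to change - the point made above.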
@Paul-Reed: I would reduce the sampling rate of such 5-second sensors (rbe node) and extend the retention policy for the detailed db to a few days (depending on your needs). To lower HDD load, you can run the detailed db in a RAM disk.