Influxdb suggested schema

There have been a number of posts recently about node-RED users using influxdb & grafana to display data, which have whetted my appetite, and am in the process of changing from mySQL to influx (+ grafana).
There is a learning curve with influx, - tags, fields, measurement, etc despite good documentation, it's not easy.

To try and get things right, from the start, what database schemas are being implemented by users for IoT & weather sensor data?
I would hate to construct the flows to provide a schema, which would restrict future development, and have to redo it at a later time.

For me, it falls into at least two different areas, weather data & energy monitoring, with future options to average and combine data, such as average temp, etc.

Any advice appreciated.

Paul,
I am feeding a InfluxDB out node using a change node to change the payload to a JsoNATA expression:

{
"outsideTemp":outsideTemp,
"insideTemp":insideTemp,
"seaPressure":sea,
"backLux":lux,
"outsideTempFront":$globalContext("PortTemp")
}

All the variables are calculated/read from sensor nodes in the flow prior to the change node.

1 Like

Here is the complete flow. I haven't yet incorporated the MQTT Humidity data into the database.

[{"id":"f25ef5c8.43e148","type":"change","z":"9dcf23c2.6c9ca","name":"Set sea","rules":[{"t":"set","p":"sea","pt":"msg","to":"$number($trim(payload)) * 10 * $power((1.0 - (0.0065 * 175.0) / (16.0 + 0.0065 * 175.0 + 273.15)), -5.257)\t","tot":"jsonata"}],"action":"","property":"","from":"","to":"","reg":false,"x":480,"y":140,"wires":[["87415a4a.7c5f18"]]}]

Thanks Gary, as I have a number of sensors, I was using something like this;

msg.payload = [{
    temp: 12,
    humidity: 56,
    battery: 2.97
},
{
    device:"sensor1"
}];
return msg;

...and using the 'device' tag to query 'sensor1' data -
select humidity from "test" where "device" = 'sensor1'

@shrickus, @TotallyInformation how are your schemas structured?

I've not used Influxdb as of late, but I believe that is the recommended way to structure your measurements. All three readings are grouped into a single record, so they all get the same timestamp and are tagged with which sensor the data came from.

If you have a nicely structured topic hierarchy (as Julian always recommends), it becomes almost trivial to use a single jsonata expression to split the topic into pieces, assign each part to a named tag, and build this payoad for sending to influxdb out. Assuming an incoming msg like this:

{
  "topic": "house_main/floor_2/room_master_bath/device_sensor1",
  "payload": {
    "temp": 12,
    "humidity": 56,
    "battery": 2.97
  }
}

you can use this expression:

[
  payload,
  topic.$split("/") {
    $substringBefore("_"): $substringAfter("_")
  }
]

to generate this input array:

[
  {
    "temp": 12,
    "humidity": 56,
    "battery": 2.97
  },
  {
    "house": "main",
    "floor": "2",
    "room": "master_bath",
    "device": "sensor1"
  }
]

et voila!

4 Likes

Here are a few of my queries that may give a hint as to how I use things:

These 2 are creating a mega-humidity chart in Grafana:

SELECT mean("humidity") FROM "autogen"."humidities" WHERE ("location" <> 'Outside' AND "source" <> 'D1M04b') AND $timeFilter GROUP BY time($__interval), "location", "source" fill(null)
SELECT mean("humidity") FROM "autogen"."humidities" WHERE ("location" = 'Outside') AND $timeFilter GROUP BY time($__interval), "source" fill(null)

This creates a temperatures table:

SELECT last("temperature") FROM "temperatures" WHERE $timeFilter GROUP BY "location", "source"

Note that I use both location (room name) and device source as tags. Tags are indexed I think which makes things more efficient. Temperature, Humidity, Light and a few other measurements are all included in the same database so I'm using a combination of measurements and tags.

This example is from my heating system and uses 2 queries, 2 y-axes. In this case, I don't need any tags but I still have multiple measurements on the database:

SELECT mean("Percentage_Demand") FROM "autogen"."heating" WHERE $timeFilter GROUP BY time($__interval) fill(previous)
SELECT mean("Demand") FROM "autogen"."heating" WHERE $timeFilter GROUP BY time($__interval) fill(previous)

You need to use measurements when you might want to process the data, tags are static and used for grouping and/or filtering. With the measurements, you can do things like mean() to calculate the average over the selected time period or something like max()

1 Like

So are you saying that processing something like power data which may need flexible processing, may need splitting from the sensor data array, and should be stored under separate 'measurement' ie;

msg.payload = 999;
msg.measurement = "solar_gen";
return msg;
msg.payload = 3241;
msg.measurement = "home_usage";
return msg;
msg.payload = 500;
msg.measurement = "diverted_power";
return msg;

Whilst things like weather data, would be stored using tags;

var Temp = Math.random()*10;
var Hum = Math.random()*20;
var Batt = Math.random()*10;

msg.payload = [{
    temp: Temp,
    humidity: Hum,
    battery: Batt
},
{
    device:"sensor1",
    location: "garden"
}];
return msg;

Maybe I used the wrong terms - I still struggle! The format above is the one that I use for just about everything. Your tags in that example are "sensor1" and "garden". The measurements are "temp", "humidity" and "battery" so you can process the measurement values.

I think temp, humidity and batt are fields not measurements. Though not at my computer so can't check.

Comparing influx to other database structures, a measurement is sort of like a table of its own, with keys acting as column names and values acting as entries. An important distinction between tags and fields is that tags are indexed while fields are not. Additionally, tag keys and tag values are always stored as strings, which makes them excellent to use for boolean operations but pretty much useless for math operations using influxQL.

1 Like

You may well be right. I'm still at work so I can't check either.

Exactly what he said! :wink:

That's the problem with InfluxDB. The terminology is close to SQL but not quite the same.

1 Like

However you format it (there's wiggle room) I recommend identifying individual devices as tags and using the same measurement between devices when coming up with your schema. This makes it so you can use variables and wildcards in stuff like grafana to visualize your data without having to build out each possible instance.

Example: in grafana I wanted to visualize CPU utilization on multiple systems that were populating influxDB via telegraf. The query I used was SELECT mean("Percent_Processor_Time") FROM "win_cpu" WHERE ("host" =~ /^$host$/ AND "objectname" = 'Processor' AND "instance" != '_Total') AND $timeFilter GROUP BY time(15s), "instance" fill(null). The measurement in question is "win_cpu." I have multiple machines all writing to "win_cpu" with the "host" tag populated by their hostname, which can then become a filter. This allows all relevant bits of data (in this case percentage busy cycles per logical processor) to be displayed together with a single query rather than needing to define each and every possible logical processor (which suuuuuuuuucks to do on ESX hosts.)

1 Like

Not totally relevant to the topic at hand, but have you played with the discrete panel plugin or annotations for your on/off measurement there, TI?

Caused me a lot of hair pulling (I have very little anyway) when I started using InfluxDB / Grafana, after years of writing SQL queries.

1 Like

Thanks for the comments and advice guys!

Yes. I love discrete for showing on/off information like lights, heating, etc. I also have a digital sound input on my sensor platform in the hallway and that works nicely too.

I occasionally use annotations as well. They are now a LOT easier to use with the latest versions of Grafana.

Another feature I've used a little is variables and being able to generate multiple charts from a single definition. Great for things like gauges for a series of sensors (like temperature or humidity per room).

Do you use downsampling?

On one sensor, I get data every 5 seconds, so that's over 17,000 entries per day, which takes a lot of loading!
I've read the section on downsampling, but it generally means having 2 charts - one with all of the datapoints (for most recent 1 hour), and another which contains the downsampled data (but not the most recent hour).
It would be great if both could be displayed in the same chart.

We're getting more out of scope for node-RED by the moment, but why would you need separate charts? Using a continuous query that re-writes the returned entries with a longer retention policy should allow the same queries and therefore the same charts to use the culled data as the un-culled data shouldn't it? I'm going to have to play with this some.

1 Like

for downsampling you usually use another db with a longer retention policy - so you cant use the same query.

@Paul-Reed: i would reduce the sampling rate of such 5 sec sensors (rbe node) and extend the retention policy for the detailed db to a few days (depending on your needs) - to lower hdd load, you can run the detailed db in a ramdisk.