Hello experts
I'm using node-red to fetch sensor data from an MQTT broker. I modify the data and send it again to the broker, to the global context and some of it to influxdb.
I use this data model on the broker:
msg.topic: data/'place'/'value-type'
msg.payload: 'value'
The sensor data comes from various sensors/data-sources and is then reworked with node-red according the above data model.
To send data to influxdb, I take the above message and rearrange it according to a beginners guide from the net:
msg.measurement : 'place'
msg.payload: {'value-type' : 'value'}
This means that I create an entry in influxdb for each data point. I heard this is not optimal. Are there any hints on how I can improve this, if it makes sense at all?
group messages for the same 'place' / msg.measurement
use another data model for influxdb with tags (I have absolutely no clue what tags are, btw)
Generally data which are collected in one operation (and so have the same timestamp) should be written to a single measurement. If data are collected in multiple operations (and so have different timestamps) then they should go to separate measurements.
So, for example, if you have a temperature and humidity sensor that gives you both in one operation then put them both in one measurement, otherwise put them in different measurements.
I see what you mean. I will have to think about it. I would have to recombine the messages after I have split and normalised the data. The msg.parts fields are still there.
On the other hand it may be of no consequence which way you do it. How many values are there and how often are you writing them? If you did merge them where appropriate how many measurements would there then be?
Many thanks for taking my hand and leading me forward.
In the current setup there are about 40-60 messages per minute going into the influxdb. I placed a speed node in front of the DB to get a feel about it:
All messages are a single value, value-type, place triplet. If a data source sends several values in one batch, I split them up, reform them into the standardised value-types and places.
Main learning was about what to put into tags.
So far I did not use those, but used measurement instead.
Anyway, now I have some material to rethink my data schema.
Here is a table and an explanation on what measurement data flows through my MQTT broker. This data is collected from various sensors and data sources, then normalised and then sent to the broker.
Data description / Remarks
Place
Value-Type
Unit
Data type
Example
power usage of the household in W
power
now
W
Int
250
total power usage of the household in kWh up to now
power
today
kWh
Int
1687
current solar power production
solar
now
W
Int
132
total solar power production up to now today
solar
today
KwH
Int
5634
currently exported power into the grid
grid
exportNow
W
Int
0
currently imported power from the grid
grid
importNow
W
Int
118
temperature in the home office
office
Temp
℃
Int
14
humidity in the home office
office
Hum
%
Int
61
relay status of the sonoffth10 device
office
Relay1
on/off
String
off
temperature in the storage room
storage
Temp
℃
Int
14
humidity in the storage room
storage
Hum
%
Int
61
relay status of the sonoffth10 device
storage
Relay1
on/off
String
off
temperature in the cellar
cellar
Temp
℃
Int
15
humidity in the cellar
cellar
Hum
%
Int
98
relay status of the sonoffth10 device
cellar
Relay1
on/off
String
off
temperature in the living room 2nd floor
living
Temp
℃
Int
21
humidity in the kiving room 2nd floor
living
Hum
%
Int
62
relay status of the sonoffth10 device
living
Relay1
on/off
String
off
temperature in the parents sleeping room
parents
Temp
℃
Int
18
humidity in the partens sleeping room
parents
Hum
%
Int
64
relay status of the sonoffth10 device
parents
Relay1
on/off
String
off
temperature in the woodshep ( equals outside temp)
woodshed
Temp
℃
Int
12
humidity in the woodshep ( equals outside humidity)
woodshed
Hum
%
Int
85
relay status of the sonoffth10 device
woodshed
Relay1
on/off
String
off
open if the wardrobe doors are open
wardrobe
Doors
open/closed
String
closed
on if the wardrobe lights are on
wardrobe
lights
on/off
String
off
1 if the wardrobe lights are on
wardrobe
dlights
1/0
Int
0
Distance from the top of the tank to the water level
Not everything which is sent through the MQTT broker needs to go into the influxdb. I only send data where I want to know the history, or the development over time.
All current values go into global context, so for controlling actors, I fetch data from there.
I never need to fetch data from influxdb into node-red. Only grafana uses data from influxdb.
There has been a discussion about my way of doing it which creates holes in the database. I could aggregate messages with the same measurement and the same tags.
Here is a proposed idea on how I could use measurement/tag/field of influxdb to structure the data. I'm very interested in your comments to learn, how I can improve that.
proposed data format for influxdb
msg for node-red-contrib-influxdb
measurement
tag
field
msg.measurement
msg.payload
Energy
source = power
now
Energy
[ { source = power } , { now = 250 } ]
Energy
source = solar
now
Energy
[ { source = solar } , { now = 132 } ]
Energy
source = grid
exportNow
Energy
[ { source = grid } , { exportNow = 0 } ]
Energy
source = grid
importNow
Energy
[ { source = grid } , { importNow = 118 } ]
Rooms
source = office
Temp
Rooms
[ { source = office } , { Temp = 14 } ]
Rooms
source = office
Hum
Rooms
[ { source = office } , { Hum = 61 } ]
Rooms
source = storage
Temp
Rooms
[ { source = storage } , { Temp = 14 } ]
Rooms
source = storage
Hum
Rooms
[ { source = storage } , { Hum = 61 } ]
Rooms
source = cellar
Temp
Rooms
[ { source = cellar } , { Temp = 15 } ]
Rooms
source = cellar
Hum
Rooms
[ { source = cellar } , { Hum = 98 } ]
Rooms
source = living
Temp
Rooms
[ { source = living } , { Temp = 21 } ]
Rooms
source = living
Hum
Rooms
[ { source = living } , { Hum = 62 } ]
Rooms
source = parents
Temp
Rooms
[ { source = parents } , { Temp = 18 } ]
Rooms
source = parents
Hum
Rooms
[ { source = parents } , { Hum = 64 } ]
Rooms
source = woodshed
Temp
Rooms
[ { source = woodshed } , { Temp = 12 } ]
Rooms
source = woodshed
Hum
Rooms
[ { source = woodshed } , { Hum = 85 } ]
Water
source = tank
Dist
Water
[ { source = tank } , { Dist = 124 } ]
Water
source = tank
Volume
Water
[ { source = tank } , { Volume = 27.4 } ]
Water
source = tank
dpump
Water
[ { source = tank } , { dpump = 0 } ]
Energy
source = freezer2
Power
Energy
[ { source = freezer2 } , { Power = 0.3 } ]
Systems
source = db1
root
Systems
[ { source = db1 } , { root = 45789 } ]
Systems
source = db1
root-usage
Systems
[ { source = db1 } , { root-usage = 3.5 } ]
Systems
source = db1
boot
Systems
[ { source = db1 } , { boot = 65784 } ]
Systems
source = db1
boot-usage
Systems
[ { source = db1 } , { boot-usage = 5.1 } ]
Systems
source = db1
run
Systems
[ { source = db1 } , { run = 2345675 } ]
Systems
source = db1
run-usage
Systems
[ { source = db1 } , { run-usage = 15.2 } ]
Systems
source = db1
load1
Systems
[ { source = db1 } , { load1 = 0.7 } ]
Systems
source = db1
load5
Systems
[ { source = db1 } , { load5 = 0.3 } ]
Systems
source = db1
load15
Systems
[ { source = db1 } , { load15 = 0.1 } ]
Systems
source = db1
memoryTotal
Systems
[ { source = db1 } , { memoryTotal = 3967456 } ]
Systems
source = db1
memoryFree
Systems
[ { source = db1 } , { memoryFree = 1324546 } ]
Systems
source = db1
memoryUsage
Systems
[ { source = db1 } , { memoryUsage = 66 } ]
(and yes, it is { "source" = "db1"} etc, but this forum has eaten up all my "-characters.
I do have to bring the incoming data into shape, add some context, before I drop it into influxdb. But I know how to do this. So yes, Colin, I promise, I will do it, really, honestly.
But there is another topic I need to figure out first:
I understood that influx-measurement is really means what it says: one measurement set of data from one sensor.
I understood too that fields are used to describe values of a measurement of a sensor.
What I'm not sure at all is on how to use the influx tags.
In my previous message I showed one way I figured I could do it. I'm really interested in the opinions of those who have done this.
Not necessarily one sensor, but a set of values that are available at the same instant.
A field represents a value of some sort, the field name tells you what it is. Temperature, humidity etc are typical field names.
Tags are different from fields in that they are not process values, but are strings (usually) that identify something about the set of values in that measurement. A typical use is to identify location. So if you were measuring temperature and humidity in a set of rooms then the fields might be temperature and humidity, a tag name might be 'room' and the tag values would be kitchen, bedroom etc. So records in the database would be something like
time temperature humidity room
----- ------------ ---------- --------
.... 20.5 65 "kitchen"
.... 25.3 68 "bedroom"
In the query to access the database one can then say things like select * from the_measurement where room="kitchen"
to get all the kitchen values.
In grafana one can even have automatically repeating panels where you just have to define the panel for one room and then it will automatically show the panels for all the other rooms.
Tags are indexed in the database which means one can select on tag values very efficiently.
Note that all the records in the measurement have the same fields, if the fields are different then they should be in different measurements. That means your table above has some problems. For example you seem to have three different fields in the Energy measurement, now, exportNow and importNow. However, if those are all energy measurements then perhaps you just need a field called value and a tag called source. Alternatively if all those values are read at the same time then put it in three fields in one measurement, with field names power, solar and grid, for example, with no need for tags at all.
Many thanks. This is exactly what I was looking for.
I will use this advise to redraw the data schema. This is the part where my head hurts the most.
And after that there will be time for drawing wires with node-red. This is the fun part.
Hi @Colin, sorry to enter this topic at such a late stage. I am asking you directly because you appear to have a better handle on infuxdb schema than most. I have spent a couple of days reviewing lots of posts re. schema and find myself a little confused over table (measurement) organisation and wondered if you or any one else can clarify the subject of schema design a little more. I have been using Influxdb + telegraf + grafana for a while but would like to get it right this time round.
I am trying to redesign from scratch a suitable influxdb schema for DIY home IoT implementation.
Given a list of all types of measurement (temp, humidity, battery, light level, light colour, light status, occupancy, barometric pressure, water level, device_id, location, hostname, CPU_load, memory etc. etc.). What would be the ideal schema for a performant and flexible layout ?
The sort of things I am not sure of are:
How many tables (measurements) ?
Which items should be tags ?
Which should be fields ?
Also it would be interesting/instructive to see the data structure used to create the default system monitoring dashboard that is automatically generated by grafana.
P.S. I tried to stay awake through all of the influxdb video [Training] InfluxDB Performance Tuning and Schema Design - YouTube but it lacked enough real IoT schema examples for me to make any more progress. One thing they say is design your queries first which is OK if you know in advance all the queries you may need. Unfortunately I cannot envisage all of these up front.
I think all or those are covered in the post you replied to, and in previous posts in the thread.
Please read my posts again carefully, then if necessary come back and ask for clarification.
Note that I am not an expert in influx, this is just my take on it.
Hi @Colin thank you for your response. I found more replies from yourself and just assumed that you were an expert on this.
I have read many posts including this one but still have difficulty understanding when to create multiple measurements for example.
Also some where a point was raised against putting the device_id as a tag because the time series is more about continuous data analysis and should not be interrupted by a change in device_id when the most important thing is the temperature at that location rather than which device originated it.
Where are the lines drawn between which items should have their own measurement and which items should share the same measurement ?
Clearly one would not be measuring barometric pressure in every room but care should be taken not to have a table full of null values.
Can you offer any further help or insight other that go back and read it all again ?
If you mean the id of the device measuring the temperature, then it depends what you want to measure. If you want to measure the temperature of a device wherever you put it (a portable device maybe) then use a tag defining the device. If you want to measure the temperature of a room then use the room name as the tag, then if the device breaks and you replace it you will still be measuring the bedroom (or wherever) temperature
As said in earlier posts I generally decide this based on whether the data are available at the same instant. So if you have a sensor measuring room humidity and another measuring room temperature then put them in separate Measurements, but if you have one sensor measuring measuring both then put them in one measurement. However, if you only wanted to store humidity every 10 mins but temperature every 1 minute then they would obviously go in separate measurements.
Exactly. That should be in a separate measurement.
I recommend starting by working out all the fields and tags, then define the Measurements.
Can I clarify that when you use the term 'measurement' you are referring to a separate influxdb table (as referenced by the FROM field in a grafana query ?
And if so does this mean that a typical home IoT system would likely have lots of measurements like 'humidity', 'power', 'barometric pressure', 'host_info', 'lighting', 'heating' etc. as a direct preference to having ANY null values ?
Finally (I hope), which is the preferred choice:
Create a separate 'measurement' for humidity and temperature (because the sampling periods are different) or
Align the sampling periods and use the same measurement or
Cache all data in global/flow and write to influx in one go (assuming more rapidly sampled data is just for the ui and that these intermediate values can be lost).
Is all of this a kind of db schema normalisation as found in conventional 3rd normal form relational databases ?
I mean Influx Measurements. Influx doesn't have tables, but Measurements are similar to tables in, for example, mysql.
For 1, 2 or 3 it is up to you. Option 2 is just a subset of Option 3. Caching will make db access a little more efficient, at the expense of complexity (and hence bug likelihood) in node-red. Personally I would generally go for the simplest, which is option 1.
Remember that you cannot cache across tags, so if you have a tag for room, then you would have to cache the data separately for each room. Caching can get complex and I don't like complexity where it is not required.
If you are running on any sensibly powered machine then home automation is a trivial task for influx. It is designed for handling vast amounts of data at great speed. There is something in the docs somewhere that talks about the power of machine you need for various tasks, and a home automation system is way off the bottom of the lowest task that they talk about. The result is that it doesn't really matter how you organise the data, it won't make any difference to performance.
Don't try and run it with an SD card as the storage though.