Data entry into influxdb

Urs-Eppenberger · 10 October 2021 15:24

Hello experts
I'm using node-red to fetch sensor data from an MQTT broker. I modify the data and send it again to the broker, to the global context and some of it to influxdb.
I use this data model on the broker:
msg.topic: data/'place'/'value-type'
msg.payload: 'value'

'place' : cellar, woodshed, office, printer1, ...
'value-type' : Temp, Hum, Light, Door, Yellow, ...

The sensor data comes from various sensors/data-sources and is then reworked with node-red according the above data model.

To send data to influxdb, I take the above message and rearrange it according to a beginners guide from the net:
msg.measurement : 'place'
msg.payload: {'value-type' : 'value'}

This means that I create an entry in influxdb for each data point. I heard this is not optimal. Are there any hints on how I can improve this, if it makes sense at all?

group messages for the same 'place' / msg.measurement
use another data model for influxdb with tags (I have absolutely no clue what tags are, btw)
other hints?

Kind regards,
Urs.

Colin · 10 October 2021 15:53

Generally data which are collected in one operation (and so have the same timestamp) should be written to a single measurement. If data are collected in multiple operations (and so have different timestamps) then they should go to separate measurements.

So, for example, if you have a temperature and humidity sensor that gives you both in one operation then put them both in one measurement, otherwise put them in different measurements.

Urs-Eppenberger · 12 October 2021 06:32

I see what you mean. I will have to think about it. I would have to recombine the messages after I have split and normalised the data. The msg.parts fields are still there.

Colin · 12 October 2021 08:20

It might be better, and almost certainly simpler, to send it straight to influx at the point of measurement input rather than after splitting it up.

Colin · 12 October 2021 13:40

On the other hand it may be of no consequence which way you do it. How many values are there and how often are you writing them? If you did merge them where appropriate how many measurements would there then be?

Urs-Eppenberger · 12 October 2021 14:15

Many thanks for taking my hand and leading me forward.

In the current setup there are about 40-60 messages per minute going into the influxdb. I placed a speed node in front of the DB to get a feel about it:

All messages are a single value, value-type, place triplet. If a data source sends several values in one batch, I split them up, reform them into the standardised value-types and places.

Colin · 12 October 2021 14:25

How many different process variables (measurements) is that?

What hardware are you running on. If it is a pi then where is the database (SD card USB SSD, etc)?

Do you mean value, field name and measurement name?

Edit: or possibly value, tag value and measurement name, or even value, tag value, and another tag value?

Urs-Eppenberger · 13 October 2021 13:08

I watched the webinar from influxdb about Schema Design for IoT
Should have done that before starting to send data into my influxdb.

Main learning was about what to put into tags.
So far I did not use those, but used measurement instead.
Anyway, now I have some material to rethink my data schema.

Urs-Eppenberger · 15 October 2021 20:29

Here is a table and an explanation on what measurement data flows through my MQTT broker. This data is collected from various sensors and data sources, then normalised and then sent to the broker.

Data description / Remarks	Place	Value-Type	Unit	Data type	Example
power usage of the household in W	power	now	W	Int	250
total power usage of the household in kWh up to now	power	today	kWh	Int	1687
current solar power production	solar	now	W	Int	132
total solar power production up to now today	solar	today	KwH	Int	5634
currently exported power into the grid	grid	exportNow	W	Int	0
currently imported power from the grid	grid	importNow	W	Int	118
temperature in the home office	office	Temp	℃	Int	14
humidity in the home office	office	Hum	%	Int	61
relay status of the sonoffth10 device	office	Relay1	on/off	String	off
temperature in the storage room	storage	Temp	℃	Int	14
humidity in the storage room	storage	Hum	%	Int	61
relay status of the sonoffth10 device	storage	Relay1	on/off	String	off
temperature in the cellar	cellar	Temp	℃	Int	15
humidity in the cellar	cellar	Hum	%	Int	98
relay status of the sonoffth10 device	cellar	Relay1	on/off	String	off
temperature in the living room 2nd floor	living	Temp	℃	Int	21
humidity in the kiving room 2nd floor	living	Hum	%	Int	62
relay status of the sonoffth10 device	living	Relay1	on/off	String	off
temperature in the parents sleeping room	parents	Temp	℃	Int	18
humidity in the partens sleeping room	parents	Hum	%	Int	64
relay status of the sonoffth10 device	parents	Relay1	on/off	String	off
temperature in the woodshep ( equals outside temp)	woodshed	Temp	℃	Int	12
humidity in the woodshep ( equals outside humidity)	woodshed	Hum	%	Int	85
relay status of the sonoffth10 device	woodshed	Relay1	on/off	String	off
open if the wardrobe doors are open	wardrobe	Doors	open/closed	String	closed
on if the wardrobe lights are on	wardrobe	lights	on/off	String	off
1 if the wardrobe lights are on	wardrobe	dlights	1/0	Int	0
Distance from the top of the tank to the water level	tank	Dist	cm	Int	124
Volume of water in the tank	tank	Volume	m3	float	27.4
on if pump is running	tank	pump	on/off	String	off
1 if pump is running	tank	dpump	1/0	Int	0
power consumption of freezer2	freezer2	Power	A	float	0.3
on if freezer2 is connected to power	freezer2	Relay1	on/off	String	on
disk partition used	db1	root	Byte	Int	45789
disk partition usage in %	db1	root-usage	%	float	3.5
disk partition used	db1	boot	Byte	Int	65784
disk partition usage in %	db1	boot-usage	%	float	5.1
disk partition used	db1	run	Byte	Int	2345675
disk partition usage in %	db1	run-usage	%	float	15.2
load average (as seen in top)	db1	load1	1s load av.	float	0.7
load average (as seen in top)	db1	load5	5s load av.	float	0.3
load average (as seen in top)	db1	load15	15s load av.	float	0.1
used memory for all processes	db1	memoryTotal	Byte	Int	3967456
free memory	db1	memoryFree	Byte	Int	1324546
memory usage in %	db1	memoryUsage	%	Int	66

msg.topic = data/'Place'/'Value-Type'
msg.payload = 'Example'

Now the question is, how do I change this data format into a influxdb compatible structure.
In the next post I send a first idea.

Urs-Eppenberger · 15 October 2021 20:36

Not everything which is sent through the MQTT broker needs to go into the influxdb. I only send data where I want to know the history, or the development over time.
All current values go into global context, so for controlling actors, I fetch data from there.
I never need to fetch data from influxdb into node-red. Only grafana uses data from influxdb.

There has been a discussion about my way of doing it which creates holes in the database. I could aggregate messages with the same measurement and the same tags.

Here is a proposed idea on how I could use measurement/tag/field of influxdb to structure the data. I'm very interested in your comments to learn, how I can improve that.

proposed data format for influxdb			msg for node-red-contrib-influxdb
measurement	tag	field	msg.measurement	msg.payload
Energy	source = power	now	Energy	[ { source = power } , { now = 250 } ]
Energy	source = solar	now	Energy	[ { source = solar } , { now = 132 } ]
Energy	source = grid	exportNow	Energy	[ { source = grid } , { exportNow = 0 } ]
Energy	source = grid	importNow	Energy	[ { source = grid } , { importNow = 118 } ]
Rooms	source = office	Temp	Rooms	[ { source = office } , { Temp = 14 } ]
Rooms	source = office	Hum	Rooms	[ { source = office } , { Hum = 61 } ]
Rooms	source = storage	Temp	Rooms	[ { source = storage } , { Temp = 14 } ]
Rooms	source = storage	Hum	Rooms	[ { source = storage } , { Hum = 61 } ]
Rooms	source = cellar	Temp	Rooms	[ { source = cellar } , { Temp = 15 } ]
Rooms	source = cellar	Hum	Rooms	[ { source = cellar } , { Hum = 98 } ]
Rooms	source = living	Temp	Rooms	[ { source = living } , { Temp = 21 } ]
Rooms	source = living	Hum	Rooms	[ { source = living } , { Hum = 62 } ]
Rooms	source = parents	Temp	Rooms	[ { source = parents } , { Temp = 18 } ]
Rooms	source = parents	Hum	Rooms	[ { source = parents } , { Hum = 64 } ]
Rooms	source = woodshed	Temp	Rooms	[ { source = woodshed } , { Temp = 12 } ]
Rooms	source = woodshed	Hum	Rooms	[ { source = woodshed } , { Hum = 85 } ]
Water	source = tank	Dist	Water	[ { source = tank } , { Dist = 124 } ]
Water	source = tank	Volume	Water	[ { source = tank } , { Volume = 27.4 } ]
Water	source = tank	dpump	Water	[ { source = tank } , { dpump = 0 } ]
Energy	source = freezer2	Power	Energy	[ { source = freezer2 } , { Power = 0.3 } ]
Systems	source = db1	root	Systems	[ { source = db1 } , { root = 45789 } ]
Systems	source = db1	root-usage	Systems	[ { source = db1 } , { root-usage = 3.5 } ]
Systems	source = db1	boot	Systems	[ { source = db1 } , { boot = 65784 } ]
Systems	source = db1	boot-usage	Systems	[ { source = db1 } , { boot-usage = 5.1 } ]
Systems	source = db1	run	Systems	[ { source = db1 } , { run = 2345675 } ]
Systems	source = db1	run-usage	Systems	[ { source = db1 } , { run-usage = 15.2 } ]
Systems	source = db1	load1	Systems	[ { source = db1 } , { load1 = 0.7 } ]
Systems	source = db1	load5	Systems	[ { source = db1 } , { load5 = 0.3 } ]
Systems	source = db1	load15	Systems	[ { source = db1 } , { load15 = 0.1 } ]
Systems	source = db1	memoryTotal	Systems	[ { source = db1 } , { memoryTotal = 3967456 } ]
Systems	source = db1	memoryFree	Systems	[ { source = db1 } , { memoryFree = 1324546 } ]
Systems	source = db1	memoryUsage	Systems	[ { source = db1 } , { memoryUsage = 66 } ]

(and yes, it is { "source" = "db1"} etc, but this forum has eaten up all my "-characters.

Colin · 16 October 2021 10:42

Would it not be much simpler to send the data to influx at the point it is first available, before you split it into multiple topics.

For data containing such characters use the same method as posting logs and flows, using triple backticks. Then the forum will not mess with it.

Urs-Eppenberger · 18 October 2021 17:26

I do have to bring the incoming data into shape, add some context, before I drop it into influxdb. But I know how to do this. So yes, Colin, I promise, I will do it, really, honestly.

But there is another topic I need to figure out first:

I understood that influx-measurement is really means what it says: one measurement set of data from one sensor.
I understood too that fields are used to describe values of a measurement of a sensor.
What I'm not sure at all is on how to use the influx tags.

In my previous message I showed one way I figured I could do it. I'm really interested in the opinions of those who have done this.

Colin · 18 October 2021 20:27

Not necessarily one sensor, but a set of values that are available at the same instant.

A field represents a value of some sort, the field name tells you what it is. Temperature, humidity etc are typical field names.

Tags are different from fields in that they are not process values, but are strings (usually) that identify something about the set of values in that measurement. A typical use is to identify location. So if you were measuring temperature and humidity in a set of rooms then the fields might be temperature and humidity, a tag name might be 'room' and the tag values would be kitchen, bedroom etc. So records in the database would be something like

time       temperature     humidity      room
-----      ------------   ----------   --------
....        20.5           65            "kitchen"
....        25.3           68            "bedroom"

In the query to access the database one can then say things like
select * from the_measurement where room="kitchen"
to get all the kitchen values.
In grafana one can even have automatically repeating panels where you just have to define the panel for one room and then it will automatically show the panels for all the other rooms.
Tags are indexed in the database which means one can select on tag values very efficiently.

Note that all the records in the measurement have the same fields, if the fields are different then they should be in different measurements. That means your table above has some problems. For example you seem to have three different fields in the Energy measurement, now, exportNow and importNow. However, if those are all energy measurements then perhaps you just need a field called value and a tag called source. Alternatively if all those values are read at the same time then put it in three fields in one measurement, with field names power, solar and grid, for example, with no need for tags at all.

Urs-Eppenberger · 19 October 2021 05:03

Many thanks. This is exactly what I was looking for.
I will use this advise to redraw the data schema. This is the part where my head hurts the most.
And after that there will be time for drawing wires with node-red. This is the fun part.

ozpos · 10 December 2021 12:12

Hi @Colin, sorry to enter this topic at such a late stage. I am asking you directly because you appear to have a better handle on infuxdb schema than most. I have spent a couple of days reviewing lots of posts re. schema and find myself a little confused over table (measurement) organisation and wondered if you or any one else can clarify the subject of schema design a little more. I have been using Influxdb + telegraf + grafana for a while but would like to get it right this time round.

I am trying to redesign from scratch a suitable influxdb schema for DIY home IoT implementation.

Given a list of all types of measurement (temp, humidity, battery, light level, light colour, light status, occupancy, barometric pressure, water level, device_id, location, hostname, CPU_load, memory etc. etc.). What would be the ideal schema for a performant and flexible layout ?

The sort of things I am not sure of are:

How many tables (measurements) ?
Which items should be tags ?
Which should be fields ?

Also it would be interesting/instructive to see the data structure used to create the default system monitoring dashboard that is automatically generated by grafana.

P.S. I tried to stay awake through all of the influxdb video [Training] InfluxDB Performance Tuning and Schema Design - YouTube but it lacked enough real IoT schema examples for me to make any more progress. One thing they say is design your queries first which is OK if you know in advance all the queries you may need. Unfortunately I cannot envisage all of these up front.

Colin · 10 December 2021 13:49

I think all or those are covered in the post you replied to, and in previous posts in the thread.

Please read my posts again carefully, then if necessary come back and ask for clarification.
Note that I am not an expert in influx, this is just my take on it.

ozpos · 10 December 2021 14:13

Hi @Colin thank you for your response. I found more replies from yourself and just assumed that you were an expert on this.

I have read many posts including this one but still have difficulty understanding when to create multiple measurements for example.

Also some where a point was raised against putting the device_id as a tag because the time series is more about continuous data analysis and should not be interrupted by a change in device_id when the most important thing is the temperature at that location rather than which device originated it.

Where are the lines drawn between which items should have their own measurement and which items should share the same measurement ?

Clearly one would not be measuring barometric pressure in every room but care should be taken not to have a table full of null values.

Can you offer any further help or insight other that go back and read it all again ?

Best regards, oz

Colin · 10 December 2021 15:17

If you mean the id of the device measuring the temperature, then it depends what you want to measure. If you want to measure the temperature of a device wherever you put it (a portable device maybe) then use a tag defining the device. If you want to measure the temperature of a room then use the room name as the tag, then if the device breaks and you replace it you will still be measuring the bedroom (or wherever) temperature

As said in earlier posts I generally decide this based on whether the data are available at the same instant. So if you have a sensor measuring room humidity and another measuring room temperature then put them in separate Measurements, but if you have one sensor measuring measuring both then put them in one measurement. However, if you only wanted to store humidity every 10 mins but temperature every 1 minute then they would obviously go in separate measurements.

Exactly. That should be in a separate measurement.

I recommend starting by working out all the fields and tags, then define the Measurements.

ozpos · 10 December 2021 16:05

Hi @Colin, thank you very much for your response.

Can I clarify that when you use the term 'measurement' you are referring to a separate influxdb table (as referenced by the FROM field in a grafana query ?

And if so does this mean that a typical home IoT system would likely have lots of measurements like 'humidity', 'power', 'barometric pressure', 'host_info', 'lighting', 'heating' etc. as a direct preference to having ANY null values ?

Finally (I hope), which is the preferred choice:

Create a separate 'measurement' for humidity and temperature (because the sampling periods are different) or
Align the sampling periods and use the same measurement or
Cache all data in global/flow and write to influx in one go (assuming more rapidly sampled data is just for the ui and that these intermediate values can be lost).

Is all of this a kind of db schema normalisation as found in conventional 3rd normal form relational databases ?

Colin · 10 December 2021 16:26

I mean Influx Measurements. Influx doesn't have tables, but Measurements are similar to tables in, for example, mysql.

For 1, 2 or 3 it is up to you. Option 2 is just a subset of Option 3. Caching will make db access a little more efficient, at the expense of complexity (and hence bug likelihood) in node-red. Personally I would generally go for the simplest, which is option 1.

Remember that you cannot cache across tags, so if you have a tag for room, then you would have to cache the data separately for each room. Caching can get complex and I don't like complexity where it is not required.

If you are running on any sensibly powered machine then home automation is a trivial task for influx. It is designed for handling vast amounts of data at great speed. There is something in the docs somewhere that talks about the power of machine you need for various tasks, and a home automation system is way off the bottom of the lowest task that they talk about. The result is that it doesn't really matter how you organise the data, it won't make any difference to performance.
Don't try and run it with an SD card as the storage though.

Topic		Replies	Views
Influxdb suggested schema General	22	6075	17 January 2019
Store data in InfluxDB show in Chronograf General database	34	860	27 July 2022
NodeRed to InfluxDB, unable to set 'tags' General	20	534	15 March 2023
How to effectively use InfluxDB FAQs database	23	2915	15 March 2021
Tasmota, JSON function and split Node General function-node , mqtt	43	871	12 April 2023

Data entry into influxdb

Related topics