Data entry into influxdb

Hello experts
I'm using node-red to fetch sensor data from an MQTT broker. I modify the data and send it again to the broker, to the global context and some of it to influxdb.
I use this data model on the broker:
msg.topic: data/'place'/'value-type'
msg.payload: 'value'

'place' : cellar, woodshed, office, printer1, ...
'value-type' : Temp, Hum, Light, Door, Yellow, ...

The sensor data comes from various sensors/data-sources and is then reworked with node-red according the above data model.

To send data to influxdb, I take the above message and rearrange it according to a beginners guide from the net:
msg.measurement : 'place'
msg.payload: {'value-type' : 'value'}

This means that I create an entry in influxdb for each data point. I heard this is not optimal. Are there any hints on how I can improve this, if it makes sense at all?

  • group messages for the same 'place' / msg.measurement
  • use another data model for influxdb with tags (I have absolutely no clue what tags are, btw)
  • other hints?

Kind regards,
Urs.

Generally data which are collected in one operation (and so have the same timestamp) should be written to a single measurement. If data are collected in multiple operations (and so have different timestamps) then they should go to separate measurements.

So, for example, if you have a temperature and humidity sensor that gives you both in one operation then put them both in one measurement, otherwise put them in different measurements.

1 Like

I see what you mean. I will have to think about it. I would have to recombine the messages after I have split and normalised the data. The msg.parts fields are still there.

It might be better, and almost certainly simpler, to send it straight to influx at the point of measurement input rather than after splitting it up.

On the other hand it may be of no consequence which way you do it. How many values are there and how often are you writing them? If you did merge them where appropriate how many measurements would there then be?

Many thanks for taking my hand and leading me forward.

In the current setup there are about 40-60 messages per minute going into the influxdb. I placed a speed node in front of the DB to get a feel about it:

All messages are a single value, value-type, place triplet. If a data source sends several values in one batch, I split them up, reform them into the standardised value-types and places.

How many different process variables (measurements) is that?

What hardware are you running on. If it is a pi then where is the database (SD card USB SSD, etc)?

Do you mean value, field name and measurement name?

Edit: or possibly value, tag value and measurement name, or even value, tag value, and another tag value?

I watched the webinar from influxdb about Schema Design for IoT
Should have done that before starting to send data into my influxdb.

Main learning was about what to put into tags.
So far I did not use those, but used measurement instead.
Anyway, now I have some material to rethink my data schema.

Here is a table and an explanation on what measurement data flows through my MQTT broker. This data is collected from various sensors and data sources, then normalised and then sent to the broker.

Data description / Remarks Place Value-Type Unit Data type Example
power usage of the household in W power now W Int 250
total power usage of the household in kWh up to now power today kWh Int 1687
current solar power production solar now W Int 132
total solar power production up to now today solar today KwH Int 5634
currently exported power into the grid grid exportNow W Int 0
currently imported power from the grid grid importNow W Int 118
temperature in the home office office Temp Int 14
humidity in the home office office Hum % Int 61
relay status of the sonoffth10 device office Relay1 on/off String off
temperature in the storage room storage Temp Int 14
humidity in the storage room storage Hum % Int 61
relay status of the sonoffth10 device storage Relay1 on/off String off
temperature in the cellar cellar Temp Int 15
humidity in the cellar cellar Hum % Int 98
relay status of the sonoffth10 device cellar Relay1 on/off String off
temperature in the living room 2nd floor living Temp Int 21
humidity in the kiving room 2nd floor living Hum % Int 62
relay status of the sonoffth10 device living Relay1 on/off String off
temperature in the parents sleeping room parents Temp Int 18
humidity in the partens sleeping room parents Hum % Int 64
relay status of the sonoffth10 device parents Relay1 on/off String off
temperature in the woodshep ( equals outside temp) woodshed Temp Int 12
humidity in the woodshep ( equals outside humidity) woodshed Hum % Int 85
relay status of the sonoffth10 device woodshed Relay1 on/off String off
open if the wardrobe doors are open wardrobe Doors open/closed String closed
on if the wardrobe lights are on wardrobe lights on/off String off
1 if the wardrobe lights are on wardrobe dlights 1/0 Int 0
Distance from the top of the tank to the water level tank Dist cm Int 124
Volume of water in the tank tank Volume m3 float 27.4
on if pump is running tank pump on/off String off
1 if pump is running tank dpump 1/0 Int 0
power consumption of freezer2 freezer2 Power A float 0.3
on if freezer2 is connected to power freezer2 Relay1 on/off String on
disk partition used db1 root Byte Int 45789
disk partition usage in % db1 root-usage % float 3.5
disk partition used db1 boot Byte Int 65784
disk partition usage in % db1 boot-usage % float 5.1
disk partition used db1 run Byte Int 2345675
disk partition usage in % db1 run-usage % float 15.2
load average (as seen in top) db1 load1 1s load av. float 0.7
load average (as seen in top) db1 load5 5s load av. float 0.3
load average (as seen in top) db1 load15 15s load av. float 0.1
used memory for all processes db1 memoryTotal Byte Int 3967456
free memory db1 memoryFree Byte Int 1324546
memory usage in % db1 memoryUsage % Int 66

msg.topic = data/'Place'/'Value-Type'
msg.payload = 'Example'

Now the question is, how do I change this data format into a influxdb compatible structure.
In the next post I send a first idea.

Not everything which is sent through the MQTT broker needs to go into the influxdb. I only send data where I want to know the history, or the development over time.
All current values go into global context, so for controlling actors, I fetch data from there.
I never need to fetch data from influxdb into node-red. Only grafana uses data from influxdb.

There has been a discussion about my way of doing it which creates holes in the database. I could aggregate messages with the same measurement and the same tags.

Here is a proposed idea on how I could use measurement/tag/field of influxdb to structure the data. I'm very interested in your comments to learn, how I can improve that.

proposed data format for influxdb msg for node-red-contrib-influxdb
measurement tag field msg.measurement msg.payload
Energy source = power now Energy [ { source = power } , { now = 250 } ]
Energy source = solar now Energy [ { source = solar } , { now = 132 } ]
Energy source = grid exportNow Energy [ { source = grid } , { exportNow = 0 } ]
Energy source = grid importNow Energy [ { source = grid } , { importNow = 118 } ]
Rooms source = office Temp Rooms [ { source = office } , { Temp = 14 } ]
Rooms source = office Hum Rooms [ { source = office } , { Hum = 61 } ]
Rooms source = storage Temp Rooms [ { source = storage } , { Temp = 14 } ]
Rooms source = storage Hum Rooms [ { source = storage } , { Hum = 61 } ]
Rooms source = cellar Temp Rooms [ { source = cellar } , { Temp = 15 } ]
Rooms source = cellar Hum Rooms [ { source = cellar } , { Hum = 98 } ]
Rooms source = living Temp Rooms [ { source = living } , { Temp = 21 } ]
Rooms source = living Hum Rooms [ { source = living } , { Hum = 62 } ]
Rooms source = parents Temp Rooms [ { source = parents } , { Temp = 18 } ]
Rooms source = parents Hum Rooms [ { source = parents } , { Hum = 64 } ]
Rooms source = woodshed Temp Rooms [ { source = woodshed } , { Temp = 12 } ]
Rooms source = woodshed Hum Rooms [ { source = woodshed } , { Hum = 85 } ]
Water source = tank Dist Water [ { source = tank } , { Dist = 124 } ]
Water source = tank Volume Water [ { source = tank } , { Volume = 27.4 } ]
Water source = tank dpump Water [ { source = tank } , { dpump = 0 } ]
Energy source = freezer2 Power Energy [ { source = freezer2 } , { Power = 0.3 } ]
Systems source = db1 root Systems [ { source = db1 } , { root = 45789 } ]
Systems source = db1 root-usage Systems [ { source = db1 } , { root-usage = 3.5 } ]
Systems source = db1 boot Systems [ { source = db1 } , { boot = 65784 } ]
Systems source = db1 boot-usage Systems [ { source = db1 } , { boot-usage = 5.1 } ]
Systems source = db1 run Systems [ { source = db1 } , { run = 2345675 } ]
Systems source = db1 run-usage Systems [ { source = db1 } , { run-usage = 15.2 } ]
Systems source = db1 load1 Systems [ { source = db1 } , { load1 = 0.7 } ]
Systems source = db1 load5 Systems [ { source = db1 } , { load5 = 0.3 } ]
Systems source = db1 load15 Systems [ { source = db1 } , { load15 = 0.1 } ]
Systems source = db1 memoryTotal Systems [ { source = db1 } , { memoryTotal = 3967456 } ]
Systems source = db1 memoryFree Systems [ { source = db1 } , { memoryFree = 1324546 } ]
Systems source = db1 memoryUsage Systems [ { source = db1 } , { memoryUsage = 66 } ]

(and yes, it is { "source" = "db1"} etc, but this forum has eaten up all my "-characters.

Would it not be much simpler to send the data to influx at the point it is first available, before you split it into multiple topics.

For data containing such characters use the same method as posting logs and flows, using triple backticks. Then the forum will not mess with it.

I do have to bring the incoming data into shape, add some context, before I drop it into influxdb. But I know how to do this. So yes, Colin, I promise, I will do it, really, honestly.

But there is another topic I need to figure out first:

  1. I understood that influx-measurement is really means what it says: one measurement set of data from one sensor.
  2. I understood too that fields are used to describe values of a measurement of a sensor.
  3. What I'm not sure at all is on how to use the influx tags.

In my previous message I showed one way I figured I could do it. I'm really interested in the opinions of those who have done this.

Not necessarily one sensor, but a set of values that are available at the same instant.

A field represents a value of some sort, the field name tells you what it is. Temperature, humidity etc are typical field names.

Tags are different from fields in that they are not process values, but are strings (usually) that identify something about the set of values in that measurement. A typical use is to identify location. So if you were measuring temperature and humidity in a set of rooms then the fields might be temperature and humidity, a tag name might be 'room' and the tag values would be kitchen, bedroom etc. So records in the database would be something like

time       temperature     humidity      room
-----      ------------   ----------   --------
....        20.5           65            "kitchen"
....        25.3           68            "bedroom"

In the query to access the database one can then say things like
select * from the_measurement where room="kitchen"
to get all the kitchen values.
In grafana one can even have automatically repeating panels where you just have to define the panel for one room and then it will automatically show the panels for all the other rooms.
Tags are indexed in the database which means one can select on tag values very efficiently.

Note that all the records in the measurement have the same fields, if the fields are different then they should be in different measurements. That means your table above has some problems. For example you seem to have three different fields in the Energy measurement, now, exportNow and importNow. However, if those are all energy measurements then perhaps you just need a field called value and a tag called source. Alternatively if all those values are read at the same time then put it in three fields in one measurement, with field names power, solar and grid, for example, with no need for tags at all.

2 Likes

Many thanks. This is exactly what I was looking for.
I will use this advise to redraw the data schema. This is the part where my head hurts the most.
And after that there will be time for drawing wires with node-red. This is the fun part.