Confused about MQTT > InfluxDB

clickworkorange · 5 October 2019 22:36

I'm new to InfluxDB, but it seems to me there's a bit of a conflict between Node-RED/MQTT's messaging structure and InfluxDB's "measurement" concept in that the former wants to be atomic while the latter expects multiple (related) values in the same message. Take as an example, some basic system parameters, such as load average, memory usage and disk usage. In Node-RED these are typically treated as separate messages, and MQTT recommends they are separate topics. But when I pass these values to InfluxDB I instincively want to put them in the same "table" (or "measurement"), under different "tags" and "keys". This part of my MQTT topic tree looks like this:

sentinel/nodered/status
sentinel/system/loadaverage
sentinel/system/memusage
sentinel/system/uptime
sentinel/system/diskusage
sentinel/system/temperature
sentinel/system/fan

The "measurement" would here be "sentinel", the "tags" "part"="system", and the "key" "temperature" (for example). But this doesn't seem to be a good plan - the messages arrive at different times so have to be inserted individually, resulting in a very sparse looking database, since each "row" will always have all "columns" ("keys").

> select * from sentinel
name: sentinel
time                fan loadaverage memusage part    status   temperature
----                --- ----------- -------- ----    ------   -----------
1568071466000000000                          nodered deployed 
1568071525000000000                          system           29.9
1568071525000000000     0.42                 system           
1568071525000000000                 94.0     system           
1568071525000000000 0                        system

Am I going down the wrong path here?

cymplecy · 6 October 2019 08:38

You could just feed all your MQTT feeds into one function node that updates a node context for each message and then writes all the node context values to the database each time

That would populate your database fully

ghayne · 6 October 2019 08:52

This is how my weather data is "assembled" in a change node feeding the InfluxDB node.

{
"outsideTemp":outsideTemp,
"insideTemp":insideTemp,
"seaPressure":sea,
"backLux":lux,
"outsideTempFront":$globalContext("PortTemp")
}

Where outsideTemp, insideTemp, sea, lux are message properties.

The InfluxDB nodes documentation has more complex examples:

TotallyInformation · 6 October 2019 10:09

No, this is common for a timeseries db, some of my tables are far more sparse.

It doesn't really matter since you will generally be summarising by a timespan (GROUP BY time()) so everything comes out nicely.

clickworkorange · 6 October 2019 13:22

The alternative, as far as I can tell, would be to have one "measurement" (table) for each parameter. Then every table would be 100% populated, and no rows would contain any gaps. But that's probably a bad idea?

Edit: Sorry @cymplecy, but I can't see myself going down that route; too fiddly and fragile (one message being late, or missing, would prevent others from being stored), and it breaks the sequence of events.

cymplecy · 6 October 2019 14:00

You've not understood my suggestion
I'm saying write all the fields when you receive an update on any of them

moebius · 6 October 2019 14:01

For this kind of data you could take a look at telegraf:
https://www.influxdata.com/time-series-platform/telegraf/

Colin · 6 October 2019 14:06

I have always assumed that since this is exactly the sort of data that influx is intended for that it is optimised to minimise such overheads.

clickworkorange · 6 October 2019 15:06

Ah, that makes a lot more sense, thank you! Still not sure I like it though... The way I see it, an event is a change, and that's all you should need to store...

clickworkorange · 6 October 2019 15:30

Yes, but that doesn't mean there aren't more or less efficient ways to design your schema... For example, I have no need for ms precision, so all my timestamps are rounded to the nearest second. This supposedly has a marked impact on InfluxDB performance:

Does the precision of the timestamp matter?

Yes. To maximize performance we recommend using the coarsest possible timestamp precision when writing data to InfluxDB.

ghayne · 6 October 2019 15:34

Your sentinel/system/ messages are all arriving at the same time. If you subscribed to sentinel/system/# and fed the results into a join node set to wait for 6 messages and put them in an array you could insert them all at once. No database redundancy (null fields).

clickworkorange · 6 October 2019 15:41

Well spotted! Unfortunately the timestamps are misleading in my example; they are the same because I manually triggered a read of all values, just to see how it would look in the DB. I considered editing my sample before posting but forgot - apologies for any confusion caused. In practice these are all read by individual timers, some once a minute, some every ten seconds, yet others every five minutes. So the values will typically arrive at least several seconds apart. Furthermore, I have a rate limiting node on my MQTT output, set to only pass one message every 250ms. Just because.

Colin · 6 October 2019 15:54

Or to cope with the fact that some may be missing set the join node to send after 30 seconds (or whatever) so that it will join them into 30 second chunks.

clickworkorange · 6 October 2019 16:00

The concept is growing on me!

ghayne · 6 October 2019 16:32

I did a lot of Database design using MSSQL Server many moons ago, so the thought of null fields is a painful experience for me

clickworkorange · 6 October 2019 16:49

Same here! I instinctively disliked what select * from sentinel produced. But maybe my instincts are wrong in this case

ghayne · 6 October 2019 17:14

Although it will only be a problem if you need something like SELECT * FROM sentinel WHERE fan = 0 AND temperature > 80, which means your fan is broken but you'll never know

TotallyInformation · 6 October 2019 18:04

I understand what Simon is saying and it would make sense for a SQL db but it really doesn't for a timeseries db in my view.

Now that is interesting and I'd missed that. I may try tweaking that on my system.

But at the cost of quite a bit of processing on the Node-RED side (to ensure that you had all fields together) or quite a lot of redundancy (writing more data than you need to as per Simon's suggestion).

ghayne · 6 October 2019 18:14

That is a very relevant comment InfluxDB is not really comparable to pure SQL databases.

cymplecy · 6 October 2019 18:27

I'm off to read up on these influx databases....

Topic		Replies	Views
Data entry into influxdb General database	22	2196	8 February 2022
Help needed with MQTT to influx General	23	4122	11 May 2020
Advice for structuring data for MQTT and InfluxDB General	8	902	23 December 2021
Mqtt - node-red - influxdb General	24	3062	23 November 2020
Multiple topics to one influxdb mesurement General database , function-node , mqtt	13	331	27 August 2023

Confused about MQTT > InfluxDB

Does the precision of the timestamp matter?

Related topics