Confused about MQTT > InfluxDB

I'm new to InfluxDB, but it seems to me there's a bit of a conflict between Node-RED/MQTT's messaging structure and InfluxDB's "measurement" concept in that the former wants to be atomic while the latter expects multiple (related) values in the same message. Take as an example, some basic system parameters, such as load average, memory usage and disk usage. In Node-RED these are typically treated as separate messages, and MQTT recommends they are separate topics. But when I pass these values to InfluxDB I instincively want to put them in the same "table" (or "measurement"), under different "tags" and "keys". This part of my MQTT topic tree looks like this:

sentinel/nodered/status
sentinel/system/loadaverage
sentinel/system/memusage
sentinel/system/uptime
sentinel/system/diskusage
sentinel/system/temperature
sentinel/system/fan 

The "measurement" would here be "sentinel", the "tags" "part"="system", and the "key" "temperature" (for example). But this doesn't seem to be a good plan - the messages arrive at different times so have to be inserted individually, resulting in a very sparse looking database, since each "row" will always have all "columns" ("keys").

> select * from sentinel
name: sentinel
time                fan loadaverage memusage part    status   temperature
----                --- ----------- -------- ----    ------   -----------
1568071466000000000                          nodered deployed 
1568071525000000000                          system           29.9
1568071525000000000     0.42                 system           
1568071525000000000                 94.0     system           
1568071525000000000 0                        system           

Am I going down the wrong path here?

You could just feed all your MQTT feeds into one function node that updates a node context for each message and then writes all the node context values to the database each time

That would populate your database fully

1 Like

This is how my weather data is "assembled" in a change node feeding the InfluxDB node.

{
"outsideTemp":outsideTemp,
"insideTemp":insideTemp,
"seaPressure":sea,
"backLux":lux,
"outsideTempFront":$globalContext("PortTemp")
}

Where outsideTemp, insideTemp, sea, lux are message properties.

The InfluxDB nodes documentation has more complex examples:

No, this is common for a timeseries db, some of my tables are far more sparse.

It doesn't really matter since you will generally be summarising by a timespan (GROUP BY time()) so everything comes out nicely.

2 Likes

The alternative, as far as I can tell, would be to have one "measurement" (table) for each parameter. Then every table would be 100% populated, and no rows would contain any gaps. But that's probably a bad idea?

Edit: Sorry @cymplecy, but I can't see myself going down that route; too fiddly and fragile (one message being late, or missing, would prevent others from being stored), and it breaks the sequence of events.

You've not understood my suggestion :slight_smile:
I'm saying write all the fields when you receive an update on any of them

1 Like

For this kind of data you could take a look at telegraf:
https://www.influxdata.com/time-series-platform/telegraf/

I have always assumed that since this is exactly the sort of data that influx is intended for that it is optimised to minimise such overheads.

1 Like

Ah, that makes a lot more sense, thank you! Still not sure I like it though... The way I see it, an event is a change, and that's all you should need to store...

Yes, but that doesn't mean there aren't more or less efficient ways to design your schema... For example, I have no need for ms precision, so all my timestamps are rounded to the nearest second. This supposedly has a marked impact on InfluxDB performance:

Does the precision of the timestamp matter?

Yes. To maximize performance we recommend using the coarsest possible timestamp precision when writing data to InfluxDB.

Your sentinel/system/ messages are all arriving at the same time. If you subscribed to sentinel/system/# and fed the results into a join node set to wait for 6 messages and put them in an array you could insert them all at once. No database redundancy (null fields).

Well spotted! Unfortunately the timestamps are misleading in my example; they are the same because I manually triggered a read of all values, just to see how it would look in the DB. I considered editing my sample before posting but forgot - apologies for any confusion caused. In practice these are all read by individual timers, some once a minute, some every ten seconds, yet others every five minutes. So the values will typically arrive at least several seconds apart. Furthermore, I have a rate limiting node on my MQTT output, set to only pass one message every 250ms. Just because.

1 Like

Or to cope with the fact that some may be missing set the join node to send after 30 seconds (or whatever) so that it will join them into 30 second chunks.

2 Likes

The concept is growing on me! :slight_smile:

I did a lot of Database design using MSSQL Server many moons ago, so the thought of null fields is a painful experience for me :slight_smile:

1 Like

Same here! I instinctively disliked what select * from sentinel produced. But maybe my instincts are wrong in this case :slight_smile:

1 Like

Although it will only be a problem if you need something like SELECT * FROM sentinel WHERE fan = 0 AND temperature > 80, which means your fan is broken but you'll never know :slight_smile:

I understand what Simon is saying and it would make sense for a SQL db but it really doesn't for a timeseries db in my view.

Now that is interesting and I'd missed that. I may try tweaking that on my system.

But at the cost of quite a bit of processing on the Node-RED side (to ensure that you had all fields together) or quite a lot of redundancy (writing more data than you need to as per Simon's suggestion).

2 Likes

That is a very relevant comment InfluxDB is not really comparable to pure SQL databases.

2 Likes

I'm off to read up on these influx databases....

3 Likes