I'm new to InfluxDB, but it seems to me there's a bit of a conflict between Node-RED/MQTT's messaging structure and InfluxDB's "measurement" concept in that the former wants to be atomic while the latter expects multiple (related) values in the same message. Take as an example, some basic system parameters, such as load average, memory usage and disk usage. In Node-RED these are typically treated as separate messages, and MQTT recommends they are separate topics. But when I pass these values to InfluxDB I instincively want to put them in the same "table" (or "measurement"), under different "tags" and "keys". This part of my MQTT topic tree looks like this:
The "measurement" would here be "sentinel", the "tags" "part"="system", and the "key" "temperature" (for example). But this doesn't seem to be a good plan - the messages arrive at different times so have to be inserted individually, resulting in a very sparse looking database, since each "row" will always have all "columns" ("keys").
> select * from sentinel
name: sentinel
time fan loadaverage memusage part status temperature
---- --- ----------- -------- ---- ------ -----------
1568071466000000000 nodered deployed
1568071525000000000 system 29.9
1568071525000000000 0.42 system
1568071525000000000 94.0 system
1568071525000000000 0 system
You could just feed all your MQTT feeds into one function node that updates a node context for each message and then writes all the node context values to the database each time
The alternative, as far as I can tell, would be to have one "measurement" (table) for each parameter. Then every table would be 100% populated, and no rows would contain any gaps. But that's probably a bad idea?
Edit: Sorry @cymplecy, but I can't see myself going down that route; too fiddly and fragile (one message being late, or missing, would prevent others from being stored), and it breaks the sequence of events.
Ah, that makes a lot more sense, thank you! Still not sure I like it though... The way I see it, an event is a change, and that's all you should need to store...
Yes, but that doesn't mean there aren't more or less efficient ways to design your schema... For example, I have no need for ms precision, so all my timestamps are rounded to the nearest second. This supposedly has a marked impact on InfluxDB performance:
Your sentinel/system/ messages are all arriving at the same time. If you subscribed to sentinel/system/# and fed the results into a join node set to wait for 6 messages and put them in an array you could insert them all at once. No database redundancy (null fields).
Well spotted! Unfortunately the timestamps are misleading in my example; they are the same because I manually triggered a read of all values, just to see how it would look in the DB. I considered editing my sample before posting but forgot - apologies for any confusion caused. In practice these are all read by individual timers, some once a minute, some every ten seconds, yet others every five minutes. So the values will typically arrive at least several seconds apart. Furthermore, I have a rate limiting node on my MQTT output, set to only pass one message every 250ms. Just because.
Or to cope with the fact that some may be missing set the join node to send after 30 seconds (or whatever) so that it will join them into 30 second chunks.
Although it will only be a problem if you need something like SELECT * FROM sentinel WHERE fan = 0 AND temperature > 80, which means your fan is broken but you'll never know
I understand what Simon is saying and it would make sense for a SQL db but it really doesn't for a timeseries db in my view.
Now that is interesting and I'd missed that. I may try tweaking that on my system.
But at the cost of quite a bit of processing on the Node-RED side (to ensure that you had all fields together) or quite a lot of redundancy (writing more data than you need to as per Simon's suggestion).