How to effectively use InfluxDB

StefanSchmelz · 2 December 2020 16:40

Hi,
I want to runn all my sensor values and things into an influxdb.
I have looked into the documentation of node-red-contrib-influxdb and that left many questiuons open:

I want to automatically retain all measurements for 3 months, retain an hourly average/median for 3 more months and Store Weekly averages indefinitely.

What would I have to do to make this as easy as possible?
Would you rather run influxdb on you raspberry or an old laptop?
Would you rather create "tables" per measured value or per room?
Is there any new developments to integrate grafana graphs into the normal ui, or do i still have to write my own UI?
I would prefer a some type of query builder so i don't make as many misstakes. Do you know any?

Lastly I want to give a little explanation why I ask these question here altho most of them don't really are about node-red. I thought the chance of finding what I want to know here is higher than in the influxdb forum. You are up against the same difficulties as I am, so you might already have most of my noob questions figured out by now.
Thanks for your time and patience in advance!
Stefan

ozpos · 2 December 2020 17:26

Hi @StefanSchmelz

All this is OK on a pi4 and influxDb logging from a piZeroW is OK. Recently telegraph was mentioned which is also quite light-weight.

Secure the pi and Install InfluxDb and set up a secure user. Run the influxDb service. Then use a terminal to log in and create a database called HomeAutomation or whatever and set the data retention policy as you choose to have it. I think then you use these credentials within NR to connect and log into the service and then log data.

Knowing it is possible helps a bit.
PiZeroW without desktop (buster light) works as a simple data logger. A pi4 would be a better choice. A pi4 with influxDB and 32 sensors with a target chromium ui will work OK as well.
Not enough experience to answer that but it is not so easy to delete a field from a data series once written.
Not qualified to comment thou I use grafana on pc/laptop (OK on and old laptop).
Sort your schema according to the entire scope of your project. Look at examples of influxDb object structures starting with simple values at first. grafana has a built in query builder and you never need to see SQL again unless you would like to.

Perfectly doable good luck.

Colin · 2 December 2020 17:51

You haven't said what sample rate you are using. That is key as it determines the rate you are adding data into the the database. The main issue running influx on a pi is the SD card. If you have a usb hard disk then it is fine, assuming you are not storing data at a rate of hundreds per second. Personally, if you have an old laptop, then I would use that. That is exactly what I do. It has the advantage that it will cope with power outages better. My preference is to install Ubuntu there, but that is a personal preference. Every time a Pi with SD card is powered off without a clean shutdown there is a small chance the card will be corrupted, and that would mean losing all your data, though you would obviously have a backup strategy anyway.

As @ozpos indicated influx will handle the data retention and purging old data etc for you.

With influx it is a good idea to use the right words. There are no tables, the rough equivalent is Measurements. Generally speaking any values that are available at the same instant should go in the same measurement, if not available at the same instant then put them in different measurements.

You can include Grafana dashboards in an iframe in the dashboard I believe, though I have not done it. I keep them separate.

Not sure what you mean by a Query Builder.

StefanSchmelz · 2 December 2020 18:25

I get updates form my sensors every minute and the system performance data I will capture with telegraph will be sampled every 10 seconds but sent at a lower rate to save network bandwidth.

I know that influxdb does work differently, but i did not know the right terms from the top of my head.

I tried to work through a sample with retention policies and calculating mean values form the influx documentation but failed...

About 3. : I need to decide if it would be wiser to store all temperatures inside one measurement, all humanities in one measurement and so on or if I Store all sensor values from the living room in one, all from the kitchen in one and so on....
I did not decide in what exact order i would poll my sensors yet, but I plan on using a semi centralized homematic system, where i can get all sensor values for all rooms from the hub and distributed WiFi devices running tasmota.

Based on putting everything i can read at the same time into one measurement would not really be a sensible thing to do, since one measurement would contain 3/4 of the flat in one measurement while the rest is distributed over 20 little measurements containing only one sensor.....

The querry builder is the thing graphana uses so you dont have to learn the sql dialect

TotallyInformation · 2 December 2020 20:56

An InfluxDB measurement is similar to an SQL database table.
InfluxDB tags are like indexed columns in an SQL database.
InfluxDB fields are like unindexed columns in an SQL database.
InfluxDB points are similar to SQL rows.

So I have a database called domotica with the following measurements:

The environment measurement contains all of my sensor data
I have a couple of tags: location and type
- Location tags are real places such as "Kitchen"
- type tags are sensor measurement types such as "Temperature", "Humidity", "Light", etc.
I typically have only 1 field called value

Typically a tag will contain text, a field will contain a number. Try to make sure you get that right because correcting the data type of a field is a right pain.

Here are some example retention policies

You may wish to accumulate long-term data into another measurement within your database with a continuous query such as:

CREATE CONTINUOUS QUERY cq_60min ON domotica BEGIN SELECT mean(value) AS value, max(value), min(value), max(value) - min(value) AS range INTO domotica.one_week.env_daily FROM domotica.twenty_four_hours.environment GROUP BY location, type, time(1h) END

Which summarises the 1 minute sensor values from domotica.twenty_four_hours.environment which are kept in that measurement for 1 week - into hourly values in domotica.one_week.env_daily, keeping average, max and min values for each 1hr time period.

Colin · 2 December 2020 21:12

Yes it would, that is the most efficient way of storing the data in influx (I believe). Otherwise you will be writing rows to the measurement with null values for some fields, which just fills up the database with empty fields.

Note, though that if you have similar data in a number of (for example) rooms, maybe temperature and humidity measured at the same time, then you should have one measurement for all rooms and put the room name in as a tag. So each row in the measurement includes fields temperature, humidity and tag room. Then when you query for the bedroom you would use something like
select * from whatever where room = "bedroom"

StefanSchmelz · 2 December 2020 21:29

Thanks, do you use the web interface that comes with influxdb or a different one?

TotallyInformation · 2 December 2020 21:42

I used Grafana for some of that, it has an InfluxDB admin extension. I use Grafana extensively to produce nice and sometimes complex dashboards from InfluxDB as it supports it natively.

StefanSchmelz · 2 December 2020 21:42

I did not say that I would have null fields...
If I would split my data by room or measured parameter I would query all my sensors, collect them into one huge dataset inside of node-red and group them accordingly. then I would write them into the database.
This way I can structure the Data how I want.

My sense resonates with having a measurement per room and tags per sensor type. @TotallyInformation does

StefanSchmelz · 2 December 2020 21:44

I did not know about the influxdb admin thing. I will have a look at it.

TotallyInformation · 2 December 2020 21:46

Actually, that isn't what I said. I have 1 measurement called environment. Each sensor reading is input to the environment measurement with 2 tags attached. 1 for the location and the other for the type of data.

It is simple to select all temperatures for all rooms by specifying both tags. Or you can get all sensor readings for a single room.

The thing that confuses everyone about InfluxDB is that they decided to call their table structure a "measurement" when in reality that is not how most people think about things.

Here is an example that is a bit more up-to-date than the previous one. Here the measurement is called Environment_DB

And here is that query shown as an actual InfluxDB query:

From Grafana. This is the chart it produces:

Colin · 2 December 2020 22:01

In that case you are, effectively, reading them at the same time as they will all have the same timestamp, so the efficient way is to put them into one measurement. If however, you have the same data for multiple areas then use tags as I suggested. That also lets you do auto-repeating panels in grafana, for example, so a chart designed for one room can automatically show any room.

TotallyInformation · 2 December 2020 22:03

And here is an example of a grafana auto-repeating set of charts:

As you can see, I've just come out of having a shower

StefanSchmelz · 2 December 2020 22:14

I have to admit that I am a real noob concerning databases that are not MongoDB ...
The more I think about it, the more I can see the beauty in the concept of having one giant measurement and one additional with the monitoring data from all Computers.

I might have a look at more of the Grafana features and uibuilder. I'm not really a fan of iframes in the dashboard.

And thank you all very much for your help

Buckskin · 12 January 2021 15:29

I know this is a bit late but I am confused by the one_week duration of 43800 hours

TotallyInformation · 12 January 2021 19:53

43,800 hours = 1,825 days = 5 years.

It is confusing because I slightly messed up the naming of the policies

The so-called "one_week" policy is actually a 5 year policy. It keeps hourly summary data (max/min/avg for each entry) for 5 years.

On my newer db, I keep per minute data for 1 week and hourly data for 5 years.

Buckskin · 12 January 2021 20:01

Ahh. I was trying to work it out in WEEKS. I am doing my best to nick your thoughts on schemas etc and I have just started using InfluxDB to store my sensor data.

I know it is off topic but I would be interested in how you get the devicescan & network data as I have issues with 433MHz devices changing ID after a battery change

TotallyInformation · 12 January 2021 20:31

Nick away

I use this script:

#! /usr/bin/env bash
# Fast scan the local network for live devices and record
# to /tmp/nmap.xml which can be used in Node-RED
#
# To run manually:
#   sudo /home/home/nrmain/system/nmap_scan.sh
#
# To run via cron:
#   sudo crontab -e
#       01,16,31,46 * * * * /home/home/nrmain/system/nmap_scan.sh

# Run the scan
nmap -sn --oX /tmp/nmap.xml --privileged -R --system-dns --webxml 192.168.1.0/24

# Make sure ownership & ACLs on the output are secure
chown root:home /tmp/nmap.xml
chmod --silent 640 /tmp/nmap.xml

# Trigger the Node-RED update
curl --insecure -I 'https://localhost:1880/localnetscan'

Which, as you can see, uses nmap. That requires sudo to run which is why I don't run it from Node-RED but directly via roots CRON. To get the data back to Node-RED, the script saves an XML file of the data and then calls a Node-RED url - set up using http-in/-out.

The http-in node triggers a flow that reads the file, converts to json and then updates a persisted network global variable. I use the MAC address as the key. I have a uibuilder editable table web page that lets me add/edit metadata such as a description to the data.

The global data accumulates so I can see EVERYTHING that has EVER connected to my network. When it was last seen and more.

I keep a separate known_devices variable (and MQTT topic structure) for things I actually care about and may want to actively manage. That also includes devices manually connected to the server and via a 433MHz dongle. The performance data from nmap is also copied into that structure in a standardise format.

Buckskin · 13 January 2021 21:54

Should probably move this to a new topic but...

I don't know enough about http-in etc to understand how to do this bit (I got the script running because - what do know - nmap was already installed on my Pi). I get an error message curl: (35) error:1408F10B:SSL routines:ssl3_get_record:wrong version number which I assume is because I do not have the http node setup.

TotallyInformation · 14 January 2021 00:15

Possibly but it looks like you aren't using https either?

Topic		Replies	Views
Data entry into influxdb General database	22	2192	8 February 2022
Store data in InfluxDB show in Chronograf General database	34	859	27 July 2022
Influxdb suggested schema General	22	6073	17 January 2019
Four InfluxDB node (data is very confusing and not unique) General	32	582	8 December 2022
NodeRed to InfluxDB, unable to set 'tags' General	20	528	15 March 2023

How to effectively use InfluxDB

Related topics