InfluxDB on a Raspberry Pi

I did mention that the OP should look into the details of what caused the high CPU usage. If the root cause is not found, it may come back.

But as I said, a simple way is to re-install and start with a clean system. This can be done on another micro SD card or SSD drive.

Re-importing the data may not cause the CPU spike (in which case we know the data is good), or it may cause the spike again (which points to data corruption; in that case, either discard the data or look into each measurement).

I agree that re-importing the data might fix it. But the logs should be looked at first, otherwise the OP will have no confidence that it won't happen again next week. The logs may not help, but they might.

It is always best to try to understand a problem before making random attempts to fix it.
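A reasonable first look, assuming the service runs under systemd (the unit may be called influxdb or influxd depending on the install):

# influxd's recent log output from the journal
sudo journalctl -u influxdb -u influxd --since "24 hours ago"

# follow it live while the CPU is high
sudo journalctl -u influxdb -u influxd -f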


Hold on, I am confused now. @Urs-Eppenberger you said (in the other thread) that you have re-installed the original version of influxdb. I would have expected that to restore the original config file. If that is the case then you should be back with the default logging, unless you have disabled logging external to influx by sending syslog to a null file or something.

Colin, this is what I would have expected too.
I checked and there was still the old influxdb.conf file with the changes I made to kill the internal monitoring and the logging. I then checked if the upgrade really worked, and it did:

pi@db1:~ $ influx -version
InfluxDB shell version: 1.8.6

I will re-enable logging as per Colin's hint in this topic.
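For reference, the relevant defaults in the 1.8 influxdb.conf are roughly these (exact comments and surrounding options will differ):

[logging]
  # influxd's own log level; with systemd the output ends up in syslog/journald
  level = "info"

[http]
  # access log: one line per HTTP request, i.e. per /write from node-red
  log-enabled = true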

Since Sunday morning at 06:10 the situation has gotten even worse. And I can assure you, I was in bed and did not touch anything on that Raspberry Pi.

What follows is a screenshot of the top command, which shows that it is only one process, influxd, that is the cause of the high CPU load. Just compare the 'load average' with the %CPU of the influxd process. Please note that there is still plenty of memory left.

And here is how I collect system data for the Pi4 using node-red, every 5 minutes, nothing fancy at all:
Screenshot 2021-10-12 at 08.39.38
And this is how it looks in Grafana. Note the CPU load increase at 06:10. There is no increase in used memory. It is just this one process using more resources.


I want to get to the bottom of this.
The first step is to re-enable logging as mentioned above.

By googling I found another check, which I faithfully ran. There seems to be no problem with the .tsm files, which means that if it comes to the point of reinstalling influxdb, it seems safe to re-use the data too.

pi@db1:~ $ sudo influx_inspect verify -dir /var/lib/influxdb
/var/lib/influxdb/data/home/autogen/10/000000006-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/101/000000005-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/109/000000006-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/11/000000005-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/117/000000005-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/125/000000005-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/133/000000005-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/135/000000049-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/14/000000005-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/142/000000005-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/15/000000005-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/150/000000001-000000001.tsm: healthy
/var/lib/influxdb/data/home/autogen/150/000000002-000000001.tsm: healthy
/var/lib/influxdb/data/home/autogen/150/000000003-000000001.tsm: healthy
/var/lib/influxdb/data/home/autogen/150/000000004-000000001.tsm: healthy
/var/lib/influxdb/data/home/autogen/150/000000005-000000001.tsm: healthy
/var/lib/influxdb/data/home/autogen/158/000000005-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/16/000000006-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/17/000000006-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/174/000000006-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/182/000000004-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/188/000000002-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/19/000000005-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/190/000000003-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/192/000000003-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/2/000000001-000000001.tsm: healthy
/var/lib/influxdb/data/home/autogen/20/000000005-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/21/000000005-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/23/000000003-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/25/000000005-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/29/000000001-000000001.tsm: healthy
/var/lib/influxdb/data/home/autogen/3/000000005-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/37/000000001-000000001.tsm: healthy
/var/lib/influxdb/data/home/autogen/4/000000006-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/5/000000004-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/53/000000005-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/61/000000005-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/69/000000005-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/7/000000004-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/77/000000006-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/8/000000004-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/85/000000005-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/9/000000006-000000002.tsm: healthy
/var/lib/influxdb/data/home/autogen/93/000000005-000000002.tsm: healthy
/var/lib/influxdb/data/telegraf/autogen/185/000000003-000000002.tsm: healthy
/var/lib/influxdb/data/telegraf/autogen/189/000000007-000000002.tsm: healthy
/var/lib/influxdb/data/telegraf/autogen/191/000000007-000000002.tsm: healthy
/var/lib/influxdb/data/telegraf/autogen/193/000000008-000000002.tsm: healthy
/var/lib/influxdb/data/telegraf/autogen/195/000000001-000000001.tsm: healthy
Broken Blocks: 0 / 268603, in 9.609413208s

Re-enable system logging too; there may be problems with the disc, for example.
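If a failing drive is a suspicion, a couple of quick checks (the device name /dev/sda is an assumption based on the df output further down, and smartctl comes from the smartmontools package):

# kernel messages often show I/O or USB errors first
dmesg | grep -iE 'error|fail|i/o'

# SMART health summary for the drive; USB adapters may need '-d sat'
sudo smartctl -a /dev/sda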

Now things get really strange.
I never disabled system logging (I wouldn't know how to do it).
I enabled influxd logging. There is no log file appearing in /var/log/influxdb/.
But /var/log/syslog and /var/log/daemon.log now contain one long line per node-red message sent to influxdb.
Here are a few for your enjoyment:

Oct 12 12:03:11 db1 influxd[3300]: [httpd] 127.0.0.1 - root [12/Oct/2021:12:03:11 +0200] "POST /write?db=home&p=%5BREDACTED%5D&precision=s&rp=&u=root HTTP/1.1 " 204 0 "-" "-" 9b805d36-2b43-11ec-9aa1-dca63274c164 8634
Oct 12 12:03:14 db1 influxd[3300]: [httpd] 127.0.0.1 - root [12/Oct/2021:12:03:14 +0200] "POST /write?db=home&p=%5BREDACTED%5D&precision=s&rp=&u=root HTTP/1.1 " 204 0 "-" "-" 9d33c2d7-2b43-11ec-9aa3-dca63274c164 3451
Oct 12 12:03:19 db1 influxd[3300]: [httpd] 127.0.0.1 - root [12/Oct/2021:12:03:19 +0200] "POST /write?db=home&p=%5BREDACTED%5D&precision=s&rp=&u=root HTTP/1.1 " 204 0 "-" "-" a0a1fa6c-2b43-11ec-9aa4-dca63274c164 4258
Oct 12 12:03:19 db1 influxd[3300]: [httpd] 127.0.0.1 - root [12/Oct/2021:12:03:19 +0200] "POST /write?db=home&p=%5BREDACTED%5D&precision=s&rp=&u=root HTTP/1.1 " 204 0 "-" "-" a0a21aab-2b43-11ec-9aa5-dca63274c164 7196

I will switch this off in a few hours, because these entries are pointless.
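Those per-request lines come from the [http] access log, so they can most likely be switched off on their own without losing the rest of the logging:

[http]
  # disable the per-request access log, keep influxd's normal log output
  log-enabled = false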
But what is really strange is the following. At exactly the time when I enabled logging in the influxdb.conf file and ran 'systemctl restart influxd.service', the CPU usage went down and the memory usage went up.


With more logging I would expect more CPU load. But no. So this leaves me puzzled.

The CPU usage has gone down, but it is still ridiculously high. What does top look like now? Particularly the memory usage.

You said a couple of posts ago that you only write to the database every 5 minutes, but the bit of log you posted shows writing every few seconds.

Also run sudo iotop and see if there are any significant numbers in the IO column. That will tell you whether there is some sort of clog up writing to disc.

Also I think the output from df -h might be interesting.
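For example, something along these lines (iotop needs to be installed separately on Raspberry Pi OS):

# only show processes/threads that are actually doing I/O, sampled every 10 seconds
sudo iotop -o -d 10

# the same, aggregated per process rather than per thread
sudo iotop -o -P -d 10

# free space per filesystem
df -h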

What are the blue nodes in the flow you posted?

Here is a screenshot from top. It shows the huge CPU load for the influxd process. There is a lot of free memory and no swap in use.

The input load is 40-60 messages per minute.

I did change something today which seems to have had no effect so far, but it needed correcting anyway: in the configuration of the influxdb node I changed the precision from ns (the default) to seconds. The change is in effect, as can be seen in the log.

Here is the df -h output, which shows that there is ample space left.

pi@db1:/var/log $ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       110G  6.3G   98G   7% /
devtmpfs        1.7G     0  1.7G   0% /dev
tmpfs           1.9G     0  1.9G   0% /dev/shm
tmpfs           1.9G   17M  1.9G   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/sda1       253M   50M  203M  20% /boot
tmpfs           379M     0  379M   0% /run/user/1000

The blue nodes from my system measurement flow:
node-red-contrib-os

I had to install iotop first.
The first screenshot is the thread view, using 10-second intervals:


The second screenshot is the process view, again using 10s intervals:

I do not have any experience with iotop. But the write activity seems quite heavy, if M/s means MByte/s.

An immediate problem is that you are writing each value to InfluxDB separately.
Since the messages come from the same source (the operating system nodes), you should group them together and write them to InfluxDB in one go; see the sketch below.
The write load on the database can immediately be reduced by ~75%.
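As a sketch of what that means at the wire level (measurement and field names here are invented, and authentication is left out):

# four separate writes: one HTTP request, and one point, per value
curl -XPOST 'http://localhost:8086/write?db=home&precision=s' --data-binary 'system cpu_load=0.42'
curl -XPOST 'http://localhost:8086/write?db=home&precision=s' --data-binary 'system mem_free_mb=1882'
curl -XPOST 'http://localhost:8086/write?db=home&precision=s' --data-binary 'system disk_used_pct=7'
curl -XPOST 'http://localhost:8086/write?db=home&precision=s' --data-binary 'system load_avg=3.9'

# one combined write: a single point carrying all four fields
curl -XPOST 'http://localhost:8086/write?db=home&precision=s' \
  --data-binary 'system cpu_load=0.42,mem_free_mb=1882,disk_used_pct=7,load_avg=3.9'

In node-red-contrib-influxdb the equivalent, if I remember the node's behaviour correctly, is to send one message whose payload is a single object containing all the fields, rather than one message per field.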

BTW, is this the only flow that writes to InfluxDB?


How is that when you show it triggering only once every 5 mins?

9 MB/sec is ridiculously high. Influx is doing something very odd. My influx system, which has a similar amount of data, shows up to about 10K/s for influx.

Are you sure you have re-enabled all the influx logging? Did you restore the config file to the default installed version?

Edit: There are bursts up to about 250K/s on Influx writes, but most of the time it is small.

No, this is only the system monitoring flow. There are other data inputs from the sensors.
The system monitoring runs once every five minutes. With all due respect, I'm not going to optimise this.

In the [logging] section of the config file, what have you got level set to? It should be "info". However, if it already is, then perhaps set it to "debug" and see what you get. Probably complete data overload :)

In the [continuous_queries] section try uncommenting
# enabled = true
and set it to
enabled = false
I believe that you aren't running any so that should make no difference.
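In influxdb.conf that would look something like:

[logging]
  level = "debug"    # temporarily, instead of the usual "info"

[continuous_queries]
  enabled = false    # none are defined, so this should change nothing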

At the trivial data rate of about 1/sec the organisation of the data won't make any significant difference. If the data rate were 100 times that then it might be worth worrying about.
We are talking here about something clogging up two or more cores; there is something seriously wrong.

Actually, having said that, I can imagine db schemas that could be extremely write intensive even at that rate.

@Urs-Eppenberger can you describe exactly what the db organisation is for the sensors? That is in terms of Measurement names, Tag names and Field names please.

So you have other flows writing to the database too?

Disconnect those flows from the database, keep only the system flow, and see what happens. You need to find out which flow is causing the problem. This can be done one by one.

About the database schema. There is probably an easy way to do this with influx, but I have no clue.
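For what it's worth, the influx CLI can list the schema directly; assuming the database is called home, as in the verify output above (add -username/-password if auth is enabled):

influx -database home -execute 'SHOW MEASUREMENTS'
influx -database home -execute 'SHOW FIELD KEYS'
influx -database home -execute 'SHOW TAG KEYS'
influx -database home -execute 'SHOW SERIES LIMIT 20'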

I connected MQTT Explorer (a really excellent tool) to the broker. I fetch the data/ topic tree and send a subset of the broker messages to influxdb, but this should give you an idea of the schema:

For example this is the last sensor data from the Temp/Hum Sensor in the cellar:
Topic: "data/cellar/Temp"
msg.payload: 14.8

This is then converted to the following structure, which is sent to the node-red-contrib-influxdb out node:
msg.measurement: "cellar"
msg.payload : { "Temp" : 14.8 }

I do not use influxdb tags at all (they are not used in the beginner tutorials, so I ignored the topic).

From the above screenshot you get an idea of the complexity/simplicity of my schema:

measurements: about 13 different ones; they refer to places around my home.
value types (fields): usually about 2-3 end up in influxdb, the rest is kept inside node-red.

Does this give a clear enough picture?
Kind regards,
Urs.

Since it is not really obvious which one might be the culprit, I will disconnect everything from within node-red.
Just to see if my influxdb stops the writing activity.
If it does, then I will add the data sources one by one.
If it does not stop writing, then I guess we have something very strange at hand.
We'll see.

I've disabled all data input from node-red into the influxdb.
There is no noticeable reduction in CPU load or disk writes:
top:


iotop, with the -P option for the process view:

Not a single byte is sent from node-red into the database.
Strictly speaking I am on the wrong forum for this now, but as long as you bear with me I'd like to continue here.

There might be another data source, but since I'm not knowledgeable in this area, there is a very low chance that I set up something that sends data to influxdb.
There is only the infinite retention policy (the default setting) and no continuous queries (I disabled them in the config file nevertheless, just to make sure).
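A few quick checks from the influx CLI could confirm that nothing else is configured that might be writing (database name home taken from the paths above; add -username/-password if auth is enabled):

influx -execute 'SHOW DATABASES'              # is an _internal monitoring database present?
influx -execute 'SHOW CONTINUOUS QUERIES'     # should list none
influx -execute 'SHOW SUBSCRIPTIONS'          # e.g. a leftover Kapacitor subscription
influx -database home -execute 'SHOW RETENTION POLICIES'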

Nothing is showing up in the system logs (I use 'sudo journalctl -f' to watch this).

I will let the system run overnight to see if it cools down on its own.
If not, then the only way out here is to drop the database from within influx and start from scratch.