As a last attempt, I reread the thread to check whether I had followed all the hints.
One remained: set the influxd log level to 'debug'.
I just did this and watched what happened. There is a problem with compacting the .tsm files of one shard. The following entries repeat continuously in the log:
Oct 12 20:57:03 db1 influxd[5482]: ts=2021-10-12T18:57:03.867776Z lvl=info msg="TSM compaction (start)" log_id=0X9TZQ_G000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0X9TgcUl000 op_name=tsm1_compact_group db_shard_id=150 op_event=start
Oct 12 20:57:03 db1 influxd[5482]: ts=2021-10-12T18:57:03.868965Z lvl=info msg="Beginning compaction" log_id=0X9TZQ_G000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0X9TgcUl000 op_name=tsm1_compact_group db_shard_id=150 tsm1_files_n=5
Oct 12 20:57:03 db1 influxd[5482]: ts=2021-10-12T18:57:03.869817Z lvl=info msg="Compacting file" log_id=0X9TZQ_G000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0X9TgcUl000 op_name=tsm1_compact_group db_shard_id=150 tsm1_index=0 tsm1_file=/var/lib/influxdb/data/home/autogen/150/000000001-000000001.tsm
Oct 12 20:57:03 db1 influxd[5482]: ts=2021-10-12T18:57:03.870638Z lvl=info msg="Compacting file" log_id=0X9TZQ_G000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0X9TgcUl000 op_name=tsm1_compact_group db_shard_id=150 tsm1_index=1 tsm1_file=/var/lib/influxdb/data/home/autogen/150/000000002-000000001.tsm
Oct 12 20:57:03 db1 influxd[5482]: ts=2021-10-12T18:57:03.871440Z lvl=info msg="Compacting file" log_id=0X9TZQ_G000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0X9TgcUl000 op_name=tsm1_compact_group db_shard_id=150 tsm1_index=2 tsm1_file=/var/lib/influxdb/data/home/autogen/150/000000003-000000001.tsm
Oct 12 20:57:03 db1 influxd[5482]: ts=2021-10-12T18:57:03.872261Z lvl=info msg="Compacting file" log_id=0X9TZQ_G000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0X9TgcUl000 op_name=tsm1_compact_group db_shard_id=150 tsm1_index=3 tsm1_file=/var/lib/influxdb/data/home/autogen/150/000000004-000000001.tsm
Oct 12 20:57:03 db1 influxd[5482]: ts=2021-10-12T18:57:03.873045Z lvl=info msg="Compacting file" log_id=0X9TZQ_G000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0X9TgcUl000 op_name=tsm1_compact_group db_shard_id=150 tsm1_index=4 tsm1_file=/var/lib/influxdb/data/home/autogen/150/000000005-000000001.tsm
Oct 12 20:57:09 db1 influxd[5482]: ts=2021-10-12T18:57:09.380056Z lvl=info msg="Error replacing new TSM files" log_id=0X9TZQ_G000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0X9TgcUl000 op_name=tsm1_compact_group db_shard_id=150 error="cannot allocate memory"
Oct 12 20:57:10 db1 influxd[5482]: ts=2021-10-12T18:57:10.380492Z lvl=info msg="TSM compaction (end)" log_id=0X9TZQ_G000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0X9TgcUl000 op_name=tsm1_compact_group db_shard_id=150 op_event=end op_elapsed=6512.731ms
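To spot a loop like this without reading every line, the key=value fields of the log can be parsed and the entries that carry an error field counted per shard. The following is only a minimal sketch: the function names are my own, and the sample line is an abridged copy of the error entry above.

```python
import re
from collections import Counter

# Matches key=value pairs where the value is either "quoted" or a bare token.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_fields(line):
    """Return the key=value fields of one influxd log line as a dict."""
    return {key: value.strip('"') for key, value in PAIR.findall(line)}


def failing_shards(lines):
    """Count log entries that carry an error field, per (shard id, error)."""
    counts = Counter()
    for line in lines:
        fields = parse_fields(line)
        if "error" in fields and "db_shard_id" in fields:
            counts[(fields["db_shard_id"], fields["error"])] += 1
    return counts


# Abridged sample taken from the log excerpt above.
sample = (
    'Oct 12 20:57:09 db1 influxd[5482]: ts=2021-10-12T18:57:09.380056Z '
    'lvl=info msg="Error replacing new TSM files" engine=tsm1 '
    'op_name=tsm1_compact_group db_shard_id=150 error="cannot allocate memory"'
)

for (shard, error), n in failing_shards([sample]).items():
    print(f"shard {shard}: {error!r} seen {n} time(s)")
```

Fed with the complete log instead of a single line, a count that keeps growing for the same shard would point at the repeating compaction.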
I stopped influxdb.
Then I ran 'sudo influx_inspect verify -dir /var/lib/influxdb', but the 5 .tsm files seemed 'healthy'.
Then I went to /var/lib/influxdb/data/home/autogen/150 and simply deleted all the files there, and the directory too. I do not care if I lose a few data points. Desperate times, desperate measures.
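For anyone repeating this step, a more cautious variant is to move the shard directory aside instead of deleting it. Note that this is not what I did above; it is only a sketch. The function name and the backup location are hypothetical, the path layout (data directory / database / retention policy / shard id) is taken from the log above, and influxdb is assumed to be stopped first.

```python
import shutil
import time
from pathlib import Path


def set_aside_shard(data_dir, database, retention_policy, shard_id, backup_root):
    """Move one shard directory out of the data directory; return its new path."""
    shard = Path(data_dir) / database / retention_policy / str(shard_id)
    if not shard.is_dir():
        raise FileNotFoundError(f"no shard directory at {shard}")
    # A timestamp in the name keeps repeated runs from colliding.
    name = f"{database}-{retention_policy}-{shard_id}-{int(time.time())}"
    target = Path(backup_root) / name
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(shard), str(target))
    return target
```

With the paths from this post, the call would look like set_aside_shard("/var/lib/influxdb/data", "home", "autogen", 150, "/some/backup/dir"), where the last argument is a placeholder of my own and the script needs enough rights to move files owned by the influxdb user.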
I restarted influxdb; top and iotop look perfect.
Then I added all my data sources from within node-red in one go, just to see how this goes.
Result so far:
- influxd rarely shows up as a process in 'top'
- 'sudo iotop' shows almost no write activity
- data shows up in Grafana as it should
- no more errors in the log about a problem with compacting a .tsm file
Therefore I set the log level back from debug to warn.
Let's declare this exercise a success and head for the pillow.
Many thanks for all the assistance I got. I learned a lot. Not being alone with a problem like this means a lot to me. I hope I can pay it forward at another opportunity.
Kind regards,
Urs.