CPU spikes on Raspberry Pi 4 every hour

Hi folks,
I have a problem with lag spikes / high CPU usage that occur about every hour or so on my Raspberry Pi 4 (see screenshot).
[screenshot]
I have some pretty big flows for my switches (18 x ~400 nodes).
But the CPU runs nice and smooth most of the time. I did not trigger any events that would cause these flows to run and create CPU load when the spikes appear. But when I disable half of them (as a test), it seems to reduce the spikes (see the smaller spike in the middle of the screenshot). So I guess the problem comes from them. The question is: why do these flows cause the spikes while they are not running or doing anything? Are there internal tasks (or similar) in Node-RED that cause this?
Thanks!

Are the spikes definitely coming from Node-RED?

What is the scale on the graph?

I am pretty sure they do, since I tested by disabling flows and watching the impact on the spikes.

I use the "loadavg" node (node-red-contrib-os) to measure the load.

Description:
The load average is a measure of system activity, calculated by the operating system and expressed as a fractional number. As a rule of thumb, the load average should ideally be less than the number of logical CPUs in the system.
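To make the rule of thumb above concrete, here is a minimal Python sketch that reads the load average directly from the OS and relates it to the number of logical CPUs (4 on a Raspberry Pi 4). The function names are made up for illustration. Keep in mind that the load average is not a CPU percentage: on Linux it counts tasks that are runnable or waiting in uninterruptible sleep (typically disk I/O), averaged over 1, 5 and 15 minutes.

```python
import os


def load_per_cpu(load_avg, cpu_count):
    """Return the load average divided by the number of logical CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_avg / cpu_count


def describe_load():
    """Summarise the current load averages (Unix only)."""
    one, five, fifteen = os.getloadavg()
    cpus = os.cpu_count() or 1
    ratio = load_per_cpu(one, cpus)
    # A per-CPU ratio above 1.0 means more waiting tasks than CPUs.
    state = "busy" if ratio > 1.0 else "ok"
    return (f"1min={one:.2f} 5min={five:.2f} 15min={fifteen:.2f} "
            f"cpus={cpus} per-cpu={ratio:.2f} ({state})")
```

Calling `print(describe_load())` in a terminal at the time of a spike gives a reading that does not depend on Node-RED at all.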

Even if you are pretty sure, I'd check the logs for more info.

I will set the log level to trace and post the log here.

Also run top or htop and see what it says.

How often are you updating the chart?

I looked at the log (trace level) during the CPU spikes and nothing was happening, just the update of the CPU temperature ("vcgencmd measure_temp") every 30 seconds.

Neither the chart updates nor anything else I do in the flows matches these CPU spikes.
Different charts are active when different events are triggered. It's hard to say; maybe there is one chart update per second at most. But they are not all active or viewed at the same time. On the page where I measure CPU load, only three charts are active at a time.

I'll try htop again. Is there a way to make htop write a useful log? Hmmmppff.
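htop itself is interactive, so one possible workaround is a small script that logs the busiest processes at a fixed interval. The following is only a sketch, assuming a Linux system with /proc mounted; the function names, the log file name and the 5 second interval are arbitrary choices. It computes CPU usage from the difference between two readings of /proc/<pid>/stat, so short bursts between samples show up in the log.

```python
import os
import time


def parse_stat(stat_line):
    """Return (command name, utime + stime in clock ticks) from a /proc/<pid>/stat line."""
    left = stat_line.index("(")
    right = stat_line.rindex(")")  # the command name may itself contain ")"
    name = stat_line[left + 1:right]
    fields = stat_line[right + 1:].split()
    # fields[0] is the process state (field 3); utime and stime are fields 14 and 15.
    return name, int(fields[11]) + int(fields[12])


def snapshot():
    """Read the accumulated CPU ticks of every process currently listed in /proc."""
    procs = {}
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            try:
                with open(f"/proc/{entry}/stat") as fh:
                    procs[int(entry)] = parse_stat(fh.read())
            except (FileNotFoundError, ProcessLookupError, PermissionError):
                continue  # the process exited while we were reading
    return procs


def busiest(before, after, seconds, ticks_per_second, n=5):
    """Return the n processes with the highest CPU percentage between two snapshots."""
    usage = []
    for pid, (name, ticks) in after.items():
        if pid in before:
            delta = ticks - before[pid][1]
            usage.append((100.0 * delta / ticks_per_second / seconds, pid, name))
    return sorted(usage, reverse=True)[:n]


def log_busiest(path="cpu-top.log", interval=5.0):
    """Append the busiest processes to a log file every `interval` seconds."""
    hz = os.sysconf("SC_CLK_TCK")
    before = snapshot()
    while True:
        time.sleep(interval)
        after = snapshot()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(path, "a") as log:
            for pct, pid, name in busiest(before, after, interval, hz):
                log.write(f"{stamp} {pct:6.1f}% pid={pid} {name}\n")
        before = after
```

Running `log_busiest()` in the background for a couple of hours and then looking at the entries around the time of a spike should show which process, if any, is actually using the CPU.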

Well if you are already running InfluxDB and Grafana, I'd recommend also running Telegraf. It isn't a big overhead and it will easily capture lots of useful device performance information into its db. You can then monitor it using Grafana.

Otherwise, since the spike is so regular, just have a gander at htop around the time it is happening. Check whether it really is a service spike and, if so, which service is causing it. Also check for spikes in swap usage, as that can easily trigger a CPU spike when running off an SD card.
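For the swap check, a rough sketch along these lines reads the figures from /proc/meminfo (Linux only; the function names are made up for illustration). A value that rises between two readings taken around a spike hints that the system started swapping.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of {name: value in KiB}."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def swap_used_kib(meminfo):
    """Return the amount of swap currently in use, in KiB."""
    return meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)


def read_swap_used_kib(path="/proc/meminfo"):
    """Read the live value from the kernel."""
    with open(path) as fh:
        return swap_used_kib(parse_meminfo(fh.read()))
```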

If you are using InfluxDB, it is possible that it generates the spikes due to its constant queries against the internal system database.
To significantly reduce the CPU (spike) load from InfluxDB, you can disable the tables that InfluxDB uses internally.

Edit /etc/influxdb/influxdb.conf and find the [monitor] section.
Uncomment the store-enabled line and set it to false.

[monitor]
  # Whether to record statistics internally.
  store-enabled = false

  # The destination database for recorded statistics
  # store-database = "_internal"

  # The interval at which to record statistics
  # store-interval = "10s"

Restart InfluxDB: sudo service influxdb restart

InfluxDB will now stop using the tables for internal use.
Note from the InfluxDB website:

Set to false to disable recording statistics internally. If set to false it will make it substantially more difficult to diagnose issues with your installation.

It could possibly be InfluxDB, but first @WhiteLion should inspect the htop output when it is spiking in order to get a better idea about what is going on.

I am still trying to find out. htop paints a different picture than the "loadavg" node (node-red-contrib-os). In htop I could only see a spike lasting about 1 second. It is caused by an ARP request (pinging every 10 seconds for presence detection). The curve that loadavg draws does not make sense to me. My theory is that this could be the case:

  • Every 10 seconds there is a CPU spike from the ARP request.
  • Every 60 seconds the CPU load is measured by loadavg.
    At some point these two events coincide (maybe through a delay of the trigger, since both use an inject node), and then the high-load curve is drawn.

But I could be totally wrong with that.
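One way to look at the theory above: the Linux load average is not an instantaneous reading. The kernel samples the number of active tasks roughly every 5 seconds and folds each sample into an exponentially decaying average. The sketch below simulates only the 1-minute figure; the function names are hypothetical and the model ignores kernel details such as fixed-point rounding. It is meant to show the shape of the curve, not to explain the hourly pattern for certain.

```python
import math

SAMPLE_INTERVAL_S = 5.0  # the kernel samples roughly every 5 seconds
WINDOW_S = 60.0          # time constant of the 1-minute load average


def update_load(load, active_tasks, interval=SAMPLE_INTERVAL_S, window=WINDOW_S):
    """Fold one sample of active tasks into the running load average."""
    decay = math.exp(-interval / window)
    return load * decay + active_tasks * (1.0 - decay)


def simulate(active_per_sample, start=0.0):
    """Return the load average after each sample in the sequence."""
    load = start
    history = []
    for active in active_per_sample:
        load = update_load(load, active)
        history.append(load)
    return history
```

For example, `simulate([4] + [0] * 11)` models a single sample that happens to catch 4 active tasks, followed by a quiet minute: the load jumps to roughly 0.32 and is still around 0.13 a minute later. So a 1 second burst that lines up with a sample can leave a visible, slowly fading bump in the chart, while htop shows only a brief blip.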

I don't use InfluxDB. I use MariaDB for only two values. All the rest of my config is a JSON object saved to the SD card every 10 minutes.

If htop is not showing significant CPU usage then you don't have a problem.

Not surprising to be honest. A node is a long way away from the actual OS - at the wrong end of a complex platform running over a high-level language interpreter.

Where I need to run things like ARP checks, I usually run a dedicated script via cron and feed the data into Node-RED. I actually run an nmap scan every 15 minutes to see what is on the network. At the end of the script it calls a web endpoint which is defined in Node-RED, and a flow aggregates the data into a table.
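A script of that kind might look roughly like the sketch below. It is not the actual script described above: the endpoint path `/network-scan`, the subnet and the function names are all assumptions for illustration, and the endpoint would have to exist as an "http in" node (method POST) in a Node-RED flow. It runs an nmap ping scan with grepable output, picks out the hosts that are up, and posts them as JSON.

```python
import json
import subprocess
import urllib.request

NODE_RED_URL = "http://localhost:1880/network-scan"  # hypothetical "http in" endpoint
SUBNET = "192.168.1.0/24"                            # assumption: adjust to your network


def parse_grepable(output):
    """Extract the hosts reported as up from `nmap -oG -` output."""
    hosts = []
    for line in output.splitlines():
        if line.startswith("Host:") and "Status: Up" in line:
            parts = line.split()
            hosts.append({"ip": parts[1], "name": parts[2].strip("()")})
    return hosts


def build_request(hosts, url=NODE_RED_URL):
    """Build the JSON POST request that hands the scan result to Node-RED."""
    body = json.dumps({"hosts": hosts}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def scan_and_report():
    """Run a ping scan and send the result to the Node-RED endpoint."""
    result = subprocess.run(
        ["nmap", "-sn", "-oG", "-", SUBNET],
        capture_output=True, text=True, check=True,
    )
    request = build_request(parse_grepable(result.stdout))
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Saved as a file that ends with a call to `scan_and_report()`, it could be scheduled with a crontab entry such as `*/15 * * * * /usr/bin/python3 /home/pi/scan.py` (the path is just an example). This keeps the network probing out of the Node-RED process entirely.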

Thank you very much for your help and for taking care of my problems.
