Advice needed on Pi CPU usage

I hadn't realised it is that infrequent. That does make it difficult.
Keep an eye on the memory that node-red is using and see if it increases day on day. If you are using an old version of node-red and nodejs then perhaps you have a memory leak that has already been fixed. Which version of each are you using? It will tell you in the node-red startup log or you can do
node -v
to get the nodejs version. The node-red version is shown at the bottom of the menu in the editor, at least it is in later versions, not sure when that started. If not then you can run
npm list -g node-red
to find out.

@Colin

node -v: v14.16.0
npm list -g node-red: node-red@1.2.9

Scratching the memory banks a bit.... about a year ago, I did an upgrade to node red, I was on a version over a year old at that stage.... nube-boolean nodes stopped working due to ??moment?? being the new trend... the one that they used had fallen from favour... Around that time, things seem to start going downhill... hence the suspicion that node red had something to do with it... Sync/Async processing change possibly? Not sure, don't know enough about it...

A long story short - Managed to kludge the nube-boolean nodes into Node red manually, lock them in place by changing ownership of the files and keep going until I could learn enough to write function nodes to replace their logic that I used (about 200 or so instances on one of their nodes, the rest being slightly less, but not by much, not a small task)...

I have probably introduced plenty of bad programming practices along the way, but let's just say that desperation required longhand writing to keep things going... As I am learning, I am going back to rewrite the replaced functions, dropping them into subflows as I go along...

Unfortunately, this is taking way longer than I would like... A 60yr old brain don't grab the info quick enough any more... It's a right SOB being a 16yo in a 60yo body.... But hey!! I'll get there!! You guys have been a tremendous help!!

Ed

You must have upgraded recently, as 1.2.9 was only released last month.

@Colin

Yep... Out of desperation I have been upgrading fairly frequently of late to see how it affects cpu usage and fallover intervals.... My system is already a bit broken, anything will be an improvement I reckon...

E

OK, it was the fact that you said it was an old install that made me wonder about the memory leak.
So I suggest watching the node-red memory and the system memory over a period to see if it is increasing.
I don't think I have ever run a system for 90 days without a restart. We have power failures a few times a year so even if I don't intentionally reboot in that time it generally happens unintentionally.
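One way to watch the node-red memory over a period is to sample its resident size from /proc and append it to a log. This is only a rough sketch: the function names and the log path are made up for illustration, and it assumes the process shows up as "node-red" in the process list.

```shell
# Print the resident memory (RSS, in MiB) of a process, read from /proc.
# Usage: rss_mib <pid>
rss_mib() {
    awk '/^VmRSS:/ { printf "%.1f\n", $2 / 1024 }' "/proc/$1/status"
}

# Append one timestamped sample to a log file.
# Usage: log_rss <pid> <logfile>
log_rss() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(rss_mib "$1")" >> "$2"
}

# Example (assumption: the process is named "node-red"; the log path is hypothetical):
#   log_rss "$(pgrep -o -f node-red)" "$HOME/node-red-rss.log"
```

Run from cron every few minutes, the log should show fairly quickly whether the figure creeps up day on day or levels off.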

@Colin

Since doing some code rewrites, the memory and cpu seem stable - but - I am using these to keep an eye (Don't know how accurate they are):
[Image: screenshot-192.168.0.118_1880-2021.03.28-13_52_09]

The pi is on the output of my solar inverter, so long running periods should be the norm, it is backed up by an autostart generator for short outages and a couple of tonnes of big lighting plant (with low output) for longer outages....

E

Honestly, if I were you, as long as your memory use seems OK, I would use InfluxDB, Telegraf and Grafana to track your stats. Seems like a lot I know and InfluxDB does take a bit of setting up - I have mine set to only keep 7d of details for system data. Telegraf is very simple to set up, as is Grafana.

The point being that Telegraf grabs the system performance details very efficiently and dumps them to InfluxDB (also very efficient). Then you use Grafana to analyse them.

[Image: example server performance dashboard]

[Image: example internet monitoring dashboard]

@TotallyInformation

Yeh... Agreed... That is on my longer term plan of things to do... I am looking at getting away from EMonkeyMess ultimately... Influx is there and available, just shut down at the moment...

E

Been reading this thread with intrigue and thought I would weigh in. I also run a huge flow for home automation, as well as cctv, media centre and more on the Pi4.

I totally agree that influxdb and grafana are wildly more efficient than node red for storing and displaying data, especially rendering plots of many data points.

But the main thing I wanted to suggest is Node Red in a docker container. It's pretty easy to get running, just a docker install and node red documentation provides the command to start the container. You can then just import the whole flow.

This has a huge advantage that its resources are managed and if your node red or mqtt crashes, it's just that docker that crashes and restarts, and the pi keeps running happily. In fact for home automation, I'd say this is a requirement for reliability.

If you monitor posts here you will see that we regularly get issues with Docker installs due to the problems of accessing system resources from within the container. It has its place but often it is more trouble than it is worth.

I have never known mosquitto to crash, and if node-red crashes (which is also rare once an application is into 'production') then it restarts just as the container does.

Thanks for the thoughts... Worth looking into, but I don't think I will go that route... I am hesitant to put another software layer that will require additional resources to maintain...

The system seems more stable since I have slowed down the reporting rate of the wifi switches to the broker where I can...

Granted, turn around times have deteriorated a bit, but nothing that's unmanageable at this stage...

Add to that a bit more optimisation of the flows and clearing out some older data and it's looking pretty rosy at the moment... Have I just jinx'd it?

I get the feeling that the overall problem on my system is a "traffic" overload... Not dramatic, but just enough that mqtt/NR/Emon/et al started to fall just a bit behind current happenings, slipping a bit further back as the system runs longer and longer... Just a gut feel...... Until the "backlog" is so much that the buffering can only just cope, add another web client login and things then come to a grinding halt... I am probably talking out of my rear orifice, but hey....

Let's see what happens longer term...

Regds
Ed

I agree with Colin's comments. Docker is useful if you know Docker. If you don't, it is mostly more trouble than it is worth. It also has significant overheads of its own of course and so can add to performance issues if not careful.

Note, of course, that I am talking about amateur use here, not in an enterprise environment where other priorities apply. For home and casual use, unless you have a really complex requirement that needs encapsulating to simplify installation and management, Docker really is a set of overheads and administrative complexity that it is, in my view, best to avoid.

As Colin also says, it is extremely rare for Node-RED to crash, but it is almost unheard of for Mosquitto to crash. The only times I've ever had problems with Mosquitto are when - as has happened a few times over the last few years - an update has contained a breaking configuration change which stops the service from restarting. (Because who reads the release notes for an update!)

And in any case, if I were using Docker, I probably wouldn't put Mosquitto and Node-RED in the same container and you would still have the same issues. But worse because you may not know whether the container had a part in the crash or not.

That is the only performance issue I've ever really had on any of my Pi installations. No matter what the database engine, if you let the dataset get too large you run into swap issues which causes CPU use to go through the roof due to the relatively inefficient SD-Card interface.

Of course, there have been a couple of times where my own bad coding in Node-RED has caused a loop but I've only myself to blame for that :)

@TotallyInformation

Do you perhaps know of a DB3/4 interface method? (eek...Dating myself here...)

Cya
E

Not used an IBM DB past DB/2 other than Lotus Notes :) I spent more time on IMS than DB/2.

@Colin

Hey hey...

Ok, so I have been doing some cleanup over the last few days (Nothing major, just laying out the flows more methodically, no major changes as such)...

As any updates have come in, I have done them, some needing the odd system restart, on the whole, things have been running pretty smoothly...

Life being what it is, I eventually had to attend to some RW tasks and left the system to mind its own business for a day or so...

Lo and behold, after about 36hrs unattended runtime, the cpu hogged out at 100% (Sends me a Telegram Message when it gets above a pre-set threshold)...

Fortunately, I was able to get to a pc and fire into it within a few minutes - Htop showed Node Red to be top of the pops list in the cpu usage... All 4 cores at 100 with cpu temp sitting around 50C or so, a sure sign that something was up....

I shut down the one process that I was almost certain was the problem, solpiplog - (Sorry Nuno), only to find little to no difference - Now, bear in mind, solpiplog is squirting the inverter information onto mqtt - with this base software shut down, there is only remote sonoffs reporting into the pi... Result: no change.... CPU still clocking it royally...

Restarted Solpiplog, no difference, Still 100%, pi taking strain, a bit tardy, but running what I am throwing it....

Opened a terminal, did the Node-Red-Restart.... Cpu use immediately dropped to 15% or so... All back to normal.... CPU temp started dropping as well...

Mem use over the run period went from around 23% to around 33%, slowly increasing so there could be a minor leak, but it didn't top out, even during the 100% cpu bit, no sharp spike as such... Just an immediate drop of around 8% from 33 to 25% on restarting node red... (At 1am every day I do a [sync; echo 1 > /proc/sys/vm/drop_caches] to drop and clear the caches)... this went off shortly afterwards without a hitch as well, dropping memory use from 25 down to 18%....

Clutching at straws here, could this be causing a problem: A "Command node" with [free -m | grep Swap | awk '{print ($3/$2)*100}'] in it to report swap file usage. It's triggered every 5 or 10sec or so and it shoots the swap % usage to emoncms for graphing...
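For what it's worth, the same figure can be had from a single awk reading /proc/meminfo, which also avoids a divide-by-zero error if no swap is configured. This is a sketch only, not a claim that the original command is the culprit; the function name swap_pct is made up for illustration.

```shell
# Print swap usage as a percentage, reading a meminfo-style file
# (defaults to /proc/meminfo). Prints 0 when no swap is configured,
# instead of dividing by zero.
swap_pct() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { unused = $2 }
         END {
             if (total > 0) printf "%.1f\n", (total - unused) / total * 100
             else print 0
         }' "${1:-/proc/meminfo}"
}

# Example: print the current swap usage of this machine.
#   swap_pct
```

Calling it every 5 or 10 seconds starts one process per sample rather than three, though either way the load should be small.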

I did manage to grab the syslog file before it rotated out, the only thing that caught my eye was a line that said: Apr 3 23:42:18 solpiplog kernel: [213726.990422] TCP: out of memory -- consider tuning tcp_mem

But, as mentioned before, me reading the syslog is akin to a monkey reading Shakespeare....
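For anyone wanting to follow up on that kernel message, the number of memory pages TCP sockets are currently holding is reported in /proc/net/sockstat, and the limits it is compared against are in /proc/sys/net/ipv4/tcp_mem. A minimal sketch for pulling out the current figure (the function name is made up for illustration):

```shell
# Print the number of memory pages currently used by TCP sockets,
# taken from the "TCP:" line of a sockstat-style file
# (defaults to /proc/net/sockstat).
tcp_mem_pages() {
    awk '$1 == "TCP:" {
             for (i = 2; i < NF; i += 2)
                 if ($i == "mem") print $(i + 1)
         }' "${1:-/proc/net/sockstat}"
}

# Compare against the kernel limits (three values in pages: low, pressure, high):
#   cat /proc/sys/net/ipv4/tcp_mem
```

If the figure climbs steadily towards the third tcp_mem value over a day or so, that would suggest socket buffers are filling because something is not reading its data fast enough. That is only one possible reading of the log line.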

Attached is a section of the pi graph, it clearly shows the cpu spike

Red is CPU/Yellow Temperature/Purple Mem/Blue Swap Use..

Regds
Ed

Node red is single threaded so can only use 1 core at a time. In that condition it will show 100%. Were you seeing a total of 400% used? In which case which other processes were running flat out?
50C is nothing unless you have a fan (or it is outside in the Arctic). Mine runs at 85C when I drive it hard, at which point it automatically slows the clock to stop overheating.
I suspect an MQTT loop. Next time run mosquitto_sub (or some other client) subscribed to "#" to see if that is the case and which topics are involved. If you shut down the broker then that should break the loop. Run the subscribe test first though so you can identify which topic is looping.
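As a rough sketch of that subscribe test: the verbose output of mosquitto_sub can be piped into a small counter so the busiest topics stand out. The function name count_topics is made up for illustration, and the example assumes the broker is on localhost without authentication.

```shell
# Count how often each topic appears in `mosquitto_sub -v` output
# (one "topic payload" pair per line) and list the busiest topics first.
# Note: only the first space-separated word is taken as the topic,
# so topics containing spaces will be truncated.
count_topics() {
    awk '{ count[$1]++ } END { for (t in count) print count[t], t }' | sort -rn
}

# Example: sample all topics for 30 seconds, then show the ten busiest.
#   timeout 30 mosquitto_sub -v -t '#' | count_topics | head -n 10
```

A topic turning up hundreds or thousands of times in 30 seconds, when the device behind it should only report every few seconds, is a good candidate for a loop.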

@Colin

Agreed on 1 core... but all 4 cores were flat out... Other processes were 2 to 3%.... (Note that restarting node Red cleared the problem, is it likely that NR is just causing the symptom to show and restarting it hid the cause?)

It does have a fan, it was around midnight and ambient was around 12 to 15C, so it was working quite hard...

MQTT loop? as in one topic/device calling or sending to another and returning(on its own)? I think not... Unless I understand you incorrectly... (which is more than likely...LOL)...

Regds
Ed

Sometimes the loop is subtle.
It is possible to subscribe to 'topic/#' then through various nodes & links feed some response out to 'topic/subtopic/subsub' which is still a loop.

What were you seeing that tells you that?

An MQTT loop is where you have something publishing to MQTT, that is picked up by something else subscribed to that topic which then somehow causes that same topic to be published again, etc etc. The most common simple case is where something like a dashboard Switch output is connected to MQTT in order to drive a device, and that same topic (or one derived from it) is fed back into the dashboard Switch in order to set the switch based on the state of the device. If Message Pass Through is set in the switch then a loop is created. Loops can be much more complex however involving multiple inter-related topics.

Most likely it is slowly running out of memory and so starting to swap - at that point other processes will also need to swap, and as swap on a Pi is slow they will all start to burn cpu to shift things round.