Weird things happening. CPU load spikes

There seems to be something happening but I can't quite find it.

Machine details:
NR 1.2.6
O/S Stretch.
RPZ (W)

Usually it runs about 30-40% load.

But now and then it maxs out to .... 90-100%. Sustained. :frowning:

I've tried disabling tabs (flows) but that doesn't seem to help.

So I ssh'ed in and I am seeing this from htop

Now, yes, I am running/loading the GUI.

But there seems to be multiple sessions(?) (instances)..... and quite a few multiple instances of things.

Is that right?

From an explanation provided when using another app and asking same question, I was told the multiple instances are normal... multiple threads I think.

Unless it actually locks up, I don't think is is much to worry about. It may very well be that the OS is just catching up with display, background tasks or memory management??

Looking at your image, the one visible point of CPU usage, at 81.5%, is the lxpanel, and that would probably be only it's own usage, thus something(s) else using the remainder.

You can sort the view by clicking on the CPU% header so that it shows usages from highest to lowest... but it may bounce around a lot when looking from that point of view.

Thanks.

It seems a bit sporadic in its happenings so it may just be (as you kind of said) the O/S doing its stuff.

Yeah, clicking on the % doesn't really show any one thing really eating the CPU load.
Which is sort of annoying as it doesn't help me understand what is going on.

I guess I may just have to live with it.

It just was there in my face when I was doing things recently on the machine.

Why do you even use the GUI? You can achieve everything you need with SSH & a remote browser

3 Likes

Because I am ..... lazy and have a USB stick plugged in to which I save log files.
(saves wear and tare on the SD card)
And unless the GUI is loaded, the USB stick isn't seen. I don't know how to use fstab.

And.....

Now and then I log in via a VNC - the one that now comes on/in the O/S.
Handy to do things that way rather than via CLI / terminal.

I am a hybrid between CLI and GUI.

Sorry folks.

Bit more info/

It's happened again.

This is what I see in one instance of htop

Clearly node-red is the Problem here.

Sorry, I missed the screen grab I wanted.

Now after I have done this, the CPU load has dropped back to 30%-ish.

Seemingly taunting me with the problem.

I saw the node-red load drop and the highest load was .... say 8%.
A few lines of low load, but not enough to get the 90% load, yet the top bar stayed at 90%,

This normal?
ie: The list (sorted on CPU load) was showing maybe 50% load but the bar graph was still showing 90% for a few cycles.

I think so... again, as long as it isn't locking up, I don't worry too much about bursts of full CPU use. After all, mine is only a RPi3b

As for reading too much into numbers... This is what I get when restarting flows... 200% :astonished: Now that is some StarTrek like warp speeds. Way to go Scotty!

Otherwise, this is a normal coasting rate, with the big CPU hog being my Node-Red controlled OLED display example, that I have running some form of (non-burn) display 24/7

1 Like

(Sorry Gunner)

Weird....

What happened and why I revoked the solution:

The fact that most of the time the CPU load is down around 20-30% and it spikes to 80% (or thereabout) and STAYS there for days!

I don't know if I am stubborn or stupid, but I stopped Node-Red and restarted it.

Back to 20-30% load.

The load then spiked again and after disabling a tab it dropped.

HHmm... Interesting.

Enabled the tab. No rise.

More digging.

It seems to be happening at 18:54 (local time) and I know something that happens around that time.

BUT

Putting speed nodes (which show how fast the messages are going through a node) they are only showing 1 or 2 a second here and there.

I've looked for loops in the flows - I don't think this machine has enough grunt to run lint.

But, now it seems - more investigating needed - I shall have to watch what happens this afternoon at (around) 18:45 onwards.

If it is one of your node-red application thread is using 80% of the CPU for days then this is not normal.
You might have implemented a loop that keeps node-red forever busy OR one specific node of your flow is using those CPU resources.

And I fully agree.

The sticking point for me is finding said problem.

The flow (tab) isn't that bad. Yes, it is messy, but not .... bad.

What I did was add a bit of extra stuff on a different tab that gets the CPU load.
If it is >70 and is that way for more than 3 minutes it generates a message.
That is queued for me to see in a debug node.

So daily I would look at the load. If it was pegging out I would check the queue.
Two days in the time is 18:50 (or there about).

The next day I watch it: nothing! HHmm..

Next day: same.

Now, it has seemed to have stopped.
Because I added these few nodes that check the load and save the time to allow me to narrow down the problem?

I hope not. :wink:

This flow is running a python script and there may be a few iterations of it running.
But it is low priority and simply monitors an MQTT channel and sends messages to a different flavour of Neopixel leds.

But: Suddenly it has stopped happening daily and the CPU load is usually at the expeted 20-30% load.

Yes, when I re-deploy it spikes. But this doesn't cause a message to be queued.

Weird.

Observations:

The load is at one of its peaks.

I disable a particular tab, deploy and the load goes away.

Shame it isn't that simple. But it is at least pointing me to that tab/flow.

As I said, the machine may not like/support lint.

The load spikes (seemingly) when an event happens.

Good.

So I have nice proved this by forcing the input of that part of the flow and it spikes - and remains high.

So, I am getting somewhere in/on finding the problem.

This is the crux of part of the flow which CAUSES the spike, but you may see I have a speed node connected and it shows me a spike as the message is being sent then goes back to 0.
So that isn't THE problem.
It causes it, but isn't THE problem.

[{"id":"54c563c0.e41374","type":"function","z":"a1911aa3.c45be8","g":"90e7d5cd.1db07","name":"Sunset indicator","func":"//  node.send()\n\nvar delay_time;\nvar sunset;\nif (msg.delay_time == undefined)\n    msg.delay_time = 4000;\ndelay_time = msg.delay_time;\nif (msg.sunset == undefined)\n    msg.sunset = \"196,10,0\";\nsunset = msg.sunset;\n\nmsg.payload = \"rgb,0,\"+sunset;\nnode.send(msg);\n\n//  Delay\nsetTimeout(f,delay_time);\n\nmsg.payload = \"rgb,1,\"+sunset;\nnode.send(msg);\n\n//  Delay\nsetTimeout(f,delay_time);\n\nmsg.payload = \"rgb,2,\"+sunset;\nnode.send(msg);\nif (msg.existing[0] == undefined)\n{\n    msg.existing[0] = \"0,0,0\";\n}\nmsg.payload = \"rgb,0,\"+msg.existing[0];\nnode.send(msg);\n\n//  Delay\nsetTimeout(f,delay_time);\n\nmsg.payload = \"rgb,3,\"+sunset;\nnode.send(msg);\nif (msg.existing[1] == undefined)\n{\n    msg.existing[1] = \"0,0,0\";\n}\nmsg.payload = \"rgb,1,\"+msg.existing[1];\nnode.send(msg);\n\n//  Delay\nsetTimeout(f,delay_time);\n\nmsg.payload = \"rgb,4,\"+sunset;\nnode.send(msg);\nif (msg.existing[2] == undefined)\n{\n    msg.existing[2] = \"0,0,0\";\n}\nmsg.payload = \"rgb,2,\"+msg.existing[2];\nnode.send(msg);\n\n//  Delay\nsetTimeout(f,delay_time);\n\nmsg.payload = \"rgb,5,\"+sunset;\nnode.send(msg);\nif (msg.existing[3] == undefined)\n{\n    msg.existing[3] = \"0,0,0\";\n}\nmsg.payload = \"rgb,3,\"+msg.existing[3];\nnode.send(msg);\n\n//  Delay\nsetTimeout(f,delay_time);\n\nmsg.payload = \"rgb,6,\"+sunset;\nnode.send(msg);\nif (msg.existing[4] == undefined)\n{\n    msg.existing[4] = \"0,0,0\";\n}\nmsg.payload = \"rgb,4,\"+msg.existing[4];\nnode.send(msg);\n\n//  Delay\nsetTimeout(f,delay_time);\n\nmsg.payload = \"rgb,7,\"+sunset;\nnode.send(msg);\nif (msg.existing[5] == undefined)\n{\n    msg.existing[5] = \"0,0,0\";\n}\nmsg.payload = \"rgb,5,\"+msg.existing[5];\nnode.send(msg);\n\n//  Delay\nsetTimeout(f,delay_time);\n\nif (msg.existing[6] == undefined)\n{\n    msg.existing[6] = \"0,0,0\";\n}\nmsg.payload = \"rgb,6,\"+msg.existing[6];\nnode.send(msg);\n\n//  Delay\nsetTimeout(f,delay_time);\n\nif (msg.existing[7] == undefined)\n{\n    msg.existing[7] = \"0,0,0\";\n}\nmsg.payload = \"rgb,7,\"+msg.existing[7];\nnode.send(msg);\n\nreturn;\n\nfunction f(delay_time)\n{\n    //\n    setTimeout(f,delay_time);\n}\n\n\n","outputs":1,"noerr":0,"initialize":"","finalize":"","x":3860,"y":780,"wires":[["49eeec0f.0f1104","25ecfab5.e55b6e","526f2144.4434c"]]},{"id":"526f2144.4434c","type":"msg-speed","z":"a1911aa3.c45be8","g":"90e7d5cd.1db07","name":"","frequency":"sec","estimation":false,"ignore":false,"x":4050,"y":820,"wires":[["dc5e8c4b.49ec08"],[]]},{"id":"dc5e8c4b.49ec08","type":"debug","z":"a1911aa3.c45be8","g":"90e7d5cd.1db07","name":"Speed","active":false,"tosidebar":true,"console":false,"tostatus":true,"complete":"true","targetType":"full","statusVal":"payload","statusType":"auto","x":4030,"y":860,"wires":[]}]

Now, the messages are sent to a MQTT node and they are handled by a python script to control some RGB leds.

Luckily I can control said script so I stopped the script - gotta try things - and it stopped.
No drop in spike.
Restarted script. Same.

But if I DISABLE the script and deploy it, the load goes back down.

I'm working on it but want an external set of eyes on it in case I'm missing an elephant.

(If you are still reading)

I think I have found THE cause of the problem.

A function node I wrote.

Code:

//  node.send()

var delay_time;
var sunset;
if (msg.delay_time == undefined)
    msg.delay_time = 4000;
delay_time = msg.delay_time;
if (msg.sunset == undefined)
    msg.sunset = "196,10,0";
sunset = msg.sunset;

msg.payload = "rgb,0,"+sunset;
node.send(msg);

//  Delay
setTimeout(f,delay_time);

msg.payload = "rgb,1,"+sunset;
node.send(msg);

//  Delay
setTimeout(f,delay_time);

msg.payload = "rgb,2,"+sunset;
node.send(msg);
if (msg.existing[0] == undefined)
{
    msg.existing[0] = "0,0,0";
}
msg.payload = "rgb,0,"+msg.existing[0];
node.send(msg);

//  Delay
setTimeout(f,delay_time);

msg.payload = "rgb,3,"+sunset;
node.send(msg);
if (msg.existing[1] == undefined)
{
    msg.existing[1] = "0,0,0";
}
msg.payload = "rgb,1,"+msg.existing[1];
node.send(msg);

//  Delay
setTimeout(f,delay_time);

msg.payload = "rgb,4,"+sunset;
node.send(msg);
if (msg.existing[2] == undefined)
{
    msg.existing[2] = "0,0,0";
}
msg.payload = "rgb,2,"+msg.existing[2];
node.send(msg);

//  Delay
setTimeout(f,delay_time);

msg.payload = "rgb,5,"+sunset;
node.send(msg);
if (msg.existing[3] == undefined)
{
    msg.existing[3] = "0,0,0";
}
msg.payload = "rgb,3,"+msg.existing[3];
node.send(msg);

//  Delay
setTimeout(f,delay_time);

msg.payload = "rgb,6,"+sunset;
node.send(msg);
if (msg.existing[4] == undefined)
{
    msg.existing[4] = "0,0,0";
}
msg.payload = "rgb,4,"+msg.existing[4];
node.send(msg);

//  Delay
setTimeout(f,delay_time);

msg.payload = "rgb,7,"+sunset;
node.send(msg);
if (msg.existing[5] == undefined)
{
    msg.existing[5] = "0,0,0";
}
msg.payload = "rgb,5,"+msg.existing[5];
node.send(msg);

//  Delay
setTimeout(f,delay_time);

if (msg.existing[6] == undefined)
{
    msg.existing[6] = "0,0,0";
}
msg.payload = "rgb,6,"+msg.existing[6];
node.send(msg);

//  Delay
setTimeout(f,delay_time);

if (msg.existing[7] == undefined)
{
    msg.existing[7] = "0,0,0";
}
msg.payload = "rgb,7,"+msg.existing[7];
node.send(msg);

return;

function f(delay_time)
{
    //
    setTimeout(f,delay_time);
}

Walk through of what happened for me to say this so confidently:
That tab was a bit populated with stuff. It is/was hard to isolate what was what.

This is the latest thing to be applied and so - all things being equal - is most likely to be the problem.

I copied a part of the flow to a whole new TAB and linked a node to get the flow.context from the original flow.
(Silly me I didn't originally but.....)

So: I had the suspect node/s on their own TAB.
Threw in a message to get things going. They happened.
But the CPU load went through the roof.

I disabled this new TAB and..... voila! CPU load went back to 30%.

So this node is THE problem.

What it does:

It gets a signal.
This message needs (or should have) a few things.
I hope the code written gets around them not being set. (if you don't mind, could you confirm this is true?)
It sends several messages out.
They are basically of this structure:

LED,RED,GREEN,BLUE values.

(Ok, I just noticed a problem around line 10... Moving on)

It sends out a message for LED #0 to be of a value. (sunset)
delay
Then for LED #1.
delay
Then for LED #2. It then restores LED #0 to its original value.
delay
(repeat for LED #3 and restoring LED #1)

and so on.

But it is getting it's knickers in a knot and I am not seeing it.
(Though typically after I post it I may see it.)

Someone - please.

This is not how you delay code inside a function.

You are causing a kinda recursion loop.

Go look at examples of setTimeout on the net.

Essentially, you would pass the thing to happen as a callback to the setTimeout so that it gets executed upon the timeout.

1 Like

A better solution if you are trying to space out the messages is to simply let them fly out of the function then use a delay node set to rate limit mode.

I am sure I have used this (or something very similar and it has mutated to this code) before and it doesn't blow the CPU load.

I may just do what you said and take the easy way out this time:
Spit out all the messages and use a delay node to rate limit them.

Oh, not a big deal, but I can't control the rate limit with the msg that way.
Not a problem though. I'll just pick a nominal value.

For the sake of my learning.....

I sat down and tried to study how to do this thing I messed up.

This is the new code for the function node I wrote which seems to work.
(No CPU load spike when it is run)

I just want to check there aren't any other Gremlins in there.

It isn't exactly how the examples are shown, but there is a lot of work to parse variables, so I made the function a delay only and not send messages.

It is pretty much as it was in the first post, but the delay function has been re-written better - I hope.

//  node.send()

var sunset;
var z = 0;
var delay_time;
if (msg.delay_time == undefined)
    msg.delay_time = 4000;
delay_time = msg.delay_time;
if (msg.sunset == undefined)
    msg.sunset = "196,10,0";
sunset = msg.sunset;

msg.payload = "rgb,0,"+sunset;
node.send(msg);

//  Delay
setTimeout(f,delay_time);

msg.payload = "rgb,1,"+sunset;
node.send(msg);

//  Delay
setTimeout(f,delay_time);

msg.payload = "rgb,2,"+sunset;
node.send(msg);
if (msg.existing[0] == undefined)
{
    msg.existing[0] = "0,0,0";
}
msg.payload = "rgb,0,"+msg.existing[0];
node.send(msg);

//  Delay
setTimeout(f,delay_time);

msg.payload = "rgb,3,"+sunset;
node.send(msg);
if (msg.existing[1] == undefined)
{
    msg.existing[1] = "0,0,0";
}
msg.payload = "rgb,1,"+msg.existing[1];
node.send(msg);

//  Delay
setTimeout(f,delay_time);

msg.payload = "rgb,4,"+sunset;
node.send(msg);
if (msg.existing[2] == undefined)
{
    msg.existing[2] = "0,0,0";
}
msg.payload = "rgb,2,"+msg.existing[2];
node.send(msg);

//  Delay
setTimeout(f,delay_time);

msg.payload = "rgb,5,"+sunset;
node.send(msg);
if (msg.existing[3] == undefined)
{
    msg.existing[3] = "0,0,0";
}
msg.payload = "rgb,3,"+msg.existing[3];
node.send(msg);

//  Delay
setTimeout(f,delay_time);

msg.payload = "rgb,6,"+sunset;
node.send(msg);
if (msg.existing[4] == undefined)
{
    msg.existing[4] = "0,0,0";
}
msg.payload = "rgb,4,"+msg.existing[4];
node.send(msg);

//  Delay
setTimeout(f,delay_time);

msg.payload = "rgb,7,"+sunset;
node.send(msg);
if (msg.existing[5] == undefined)
{
    msg.existing[5] = "0,0,0";
}
msg.payload = "rgb,5,"+msg.existing[5];
node.send(msg);

//  Delay
setTimeout(f,delay_time);

if (msg.existing[6] == undefined)
{
    msg.existing[6] = "0,0,0";
}
msg.payload = "rgb,6,"+msg.existing[6];
node.send(msg);

//  Delay
setTimeout(f,delay_time);

if (msg.existing[7] == undefined)
{
    msg.existing[7] = "0,0,0";
}
msg.payload = "rgb,7,"+msg.existing[7];
node.send(msg);

return;

function f(delay_time,z)
{
    //
    z += 1;
    if (z < delay_time){
        setTimeout(f, delay_time,z);
    }
}

f(delay_time,z);

Anyone - just to check.

Your function code modifications have stopped the infinite calls to f(delay) but it still doesn't do what you think.

All messages are sent immediately (you are not delaying them one bit) then the setTimeouts happen but ultimately do nothing. This is easy to see by putting node.warn("sending message") before every node.send(msg) AND putting node.warn("inside f(delay)") inside the setTimeout

As I said before, you need to pass the function code to execute as a callback to the setTimeout

Dont you have this option...
image

Steve, I get that is a nice way and I did it.

Last night (or when ever) I just wanted to sit down and work it out and try to learn.

JS is a whole new language to me. As I said: the examples I read on the net aren't directly translatable to NR workings - for me.

I have another working (I hope) one and I took it apart.

I shall have to check it some more.

I can't get my head around how to do nested calls so the node.send is done within the timeout loop.

TOO HARD. I accept that.
It may be LAZY asking for help from others. But I can't get my head around how to do it.

Oh, on the "Don't you have this option?"....

NR 1.2.6 so I'm guessing: NO.

It isn't visible when I set the delay node to rate limit.