Can someone help me with this "problem"?

Yes, that would be something you could try. Otherwise, as I said, think about where in your flows you might be looping and add some debug to try to narrow it down.

It may be your flows have been affected by the async message delivery changes that were in 1.0.

I rolled it back 1 month.

Same :frowning:
100% CPU load (as seen from another NR machine).

It is obvious the machine is running at 100% because even trying to ssh to it is painful.
I can nearly go and make a cup of coffee in the time it takes to establish a connection.

That error message on the screenshot... is that something I should consider as a possible cause?

The only reason I am asking is that once before a node was being problematic.
If I had it installed, it would kill the machine.
Uninstalled, it would work ok.
Alas simply not using it didn't help.

The error is coming from some camera or something in that direction. And which node is producing that output? (Click on the "node:" shown in the flow error message.)

(Blush) That error message is long gone.

I am hoping it has been logged to a file, but I suspect I am not logging errors.
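(For reference, assuming Node-RED was installed with the standard Pi install script and runs as the nodered service, something like this should show the recent log:)

node-red-log                    # tail the Node-RED service log (Pi install script helper)
sudo journalctl -u nodered -e   # or jump to the end of the systemd journal for the service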

The camera is on CameraPi (another machine)

I was messing around with FTP sending pictures from it to another machine.

ARGH!

Ok, I get it.

I had the alarms on CameraPi set to REMOTE, so when a node throws a wobbly, it sends the error to this machine.

Thanks. At least I don't need to worry about that.

On the idea of "running out of space":

pi@TimePi:/ $ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        15G  4.6G  9.4G  33% /
devtmpfs        213M     0  213M   0% /dev
tmpfs           217M     0  217M   0% /dev/shm
tmpfs           217M   17M  200M   8% /run
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           217M     0  217M   0% /sys/fs/cgroup
/dev/mmcblk0p1   42M   23M   19M  55% /boot
tmpfs            44M     0   44M   0% /run/user/1000
pi@TimePi:/ $ 

I'm not pushing my luck with this usage, am I?
33% and 55% usage.

Your original post showed kswapd0 busy near the top of the CPU list.


A quick google for kswapd0 will tell you that it is the process that manages swapping things in and out of memory... and once that gets busy you know you are running out of RAM. There are quite a few guides on things you can do to tune that a bit.
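A quick way to confirm it really is memory pressure is to look at free memory and swap activity with the usual tools (nothing Node-RED specific here):

free -h       # how much RAM and swap is in use
vmstat 5      # watch the si/so columns; anything non-zero means active swapping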


Thanks @dceejay

Not wanting to sound more stupid than I am (though I don't think that is going to be . . . . )
That kind of stuff is still in the realm of magic to me, and I am not confident going into it blindly.

Is it too much to ask for help on it here?

At the risk of being obvious... try typing "what is kswapd0" into google.

Ok, I will try.

Going down said path, I found this link which I have visited and applied:

Disable swap files

As I read it, it is not a good idea to have swap files active, as they kill SD cards.
That link may not actually say that; I remember it because one day, exploring the net, I read something about swap files killing SD cards and how to disable them.
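(For anyone following along: on Raspbian the swap file is managed by dphys-swapfile, so disabling it is roughly the following. Treat this as a sketch of the sort of thing that guide describes, not an exact record of what I ran:)

sudo dphys-swapfile swapoff             # turn the swap file off now
sudo systemctl disable dphys-swapfile   # stop it being recreated at boot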

This was done WEEKS ago, so I can't really see it as the cause.
I have disabled it on ALL my RPis and none of them (including the Zeros) seem to have any problems like this.

Also, this is an RPi 2 with a quad(?) core; all the Zeros are single-core.

It is not complicated. Something in your flow is clobbering the processor. Start it in safe mode and disable the code that triggers what you think is causing the problem (the pings or whatever). Deploy and check it is ok. Gradually re-enable bits until it goes wrong, and then you will know the problem. It will not be the pings themselves; it will be something that follows on from the ping.
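(If it helps, safe mode can be started from the command line; on a Pi that used the install script you would stop the service first. Roughly:)

node-red-stop      # stop the service (Pi install script helper)
node-red --safe    # start in safe mode: flows load into the editor but do not run until you deploy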

Something in your flow is clobbering the processor.

Is it Node-RED? What else is running on this Pi? I see VNC, lxpanel (full desktop)?
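A quick way to see what is actually eating the CPU, sorted, is something like:

ps -eo pid,comm,%cpu,%mem --sort=-%cpu | head -n 10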

Something else from your first screenshot:
(screenshot: top output showing the three load average values)
The system load average is something mentioned in an earlier topic too. Since this is a Pi with four cores, a load average of 4.00 means the Pi is running at 100% CPU load for everything together. Anything beyond 4.00 will slow everything down, though even at 4.00 you will already notice some of that slowdown. Those three values tell you that for at least the last 15 minutes the average load has been more than 100%, so something is running on the Pi and hogging all the system resources.

Looking at the load average is often a quick way to see how the system has been behaving for the last bit.

Since this is a Pi with four cores, a load average of 4.00 means the Pi is running at 100% CPU load for everything together.

To elaborate a bit more: 0.00 to 1.00 max per CPU core; the Pi has 4 cores, so the max is 4.00.
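The raw numbers are easy to check: uptime (or /proc/loadavg) gives the three load averages, and nproc tells you how many cores you are dividing them by:

uptime              # 1, 5 and 15 minute load averages
cat /proc/loadavg   # same numbers, plus running/total task counts
nproc               # number of cores (4 on a Pi 2)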

It is a busy pi for sure :wink:


Oh certainly, all of which I explained in an earlier topic :slight_smile:
For reference, start here and keep reading downwards: CPU load giving funny output from machine

Hmmm... Yeah. And because I can't get information from that, I am still stuck.

The problem kind of resolved itself. Hard to fix something which isn't happening.

Reading through other stuff I have since been shown:

swappiness (vm.swappiness)

I ran the command and got back the default 60.

I upped it to 80 and rebooted.
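(For the record, the sort of commands involved; I am not certain this is exactly what the guide I followed used:)

cat /proc/sys/vm/swappiness                              # current value (default 60)
sudo sysctl vm.swappiness=80                             # change it until the next reboot
echo 'vm.swappiness=80' | sudo tee -a /etc/sysctl.conf   # make it persist across reboots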

No visible change. :frowning:

But! (Some good news)

This is what top gives me now: (Cropped)

Not perfect, but better.

(Sorry - I missed that reply.)

Did it / tried it. Alas, no change.

Ok, taking a step back.


Sub-flows.

I have a couple of them in my flows.

If some/one/any of them get a bit confused, that won't help the problem.

I have included messages so that if a variable which is needed to get things done isn't set, it spits out a generic message telling me so.

Obviously I can't have the node identify itself if its name hasn't been set.

I have a bunch of them, one for each machine I have, and now and then (when deploy is pressed) they spit the dummy, saying they aren't configured.

Question:
I am doing a "modified nodes" deploy.
At boot there is an inject node which injects the name/s for the node/s.
That survives a "modified nodes" deploy so long as the subflow isn't the node that was modified.

So why is it that these sub-flows are sending a "node not configured" message when they are? (Semi rhetorical)
And if they send that message, it should be ongoing, every time they are sent a message.
They don't. It is just "now and then" - as said: usually after a deploy.


Timing.
It was set to 20 seconds.
I cranked it to 60 seconds, leaving enough time for things to settle down from the previous round.

No change to CPU load.

Taking a step back too: pkilling apache might shut it down for a bit, but not if it runs as a service; in that case the choice would be to stop the service.
A better question is the "why" question. Is there a specific reason you need apache to run on this Pi? If not, uninstall it or disable the service. If yes, pkilling is not a wise choice either.
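(In other words, assuming it is the stock apache2 package and service, something along these lines rather than pkill:)

sudo systemctl stop apache2      # stop it now
sudo systemctl disable apache2   # keep it from starting at boot
sudo apt remove apache2          # or remove it entirely if nothing actually needs it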

Because (probably) a node I am using says it needs it.

FOUND IT!

This is a problem with writing code to detect errors with the catch-all node.

There was some feedback from a function node which wasn't getting a flow variable name.
(Which, it turns out, wasn't set; and that in itself is a mystery, as the flow value is supposed to be set by an inject node at start-up.)

Anyway, following something else, I went to the flow which handles the errors and saw a node's count going up at a great rate.

Stopping that has reduced the load back to 20%.

New thread coming soon with questions about what I found on that little journey.