Can someone help me with this "problem"?

(Yes, I know: ask in the RPi forum. But most times I post there I get sarcastic - if not insulting - replies.)

I am not getting what is happening.

And while writing this thread/article the machine has hung.

This is what I see with the top command.

Yet when I ask NR, it says the CPU load is at 100% and basically not moving.

It used to pulse from about 30% to 100% roughly every 20 seconds as it worked through its tasks.
I get that: when it has a few things to do, the load goes to 100%.

But now it is just stuck at 100% all the time?

Yet top isn't really showing that.
And as I said, the machine hung just as I took that screen shot.
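
(In case it helps anyone checking my numbers: one way to compare the two readings is to point top at just the Node-RED process - this assumes the standard Pi install, where the full command line contains node-red.)

# watch only the Node-RED process (pgrep -f may match more than one entry; the first is usually the main one)
top -d 2 -p "$(pgrep -f node-red | head -n1)"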

I am hoping someone here may spot a problem or be able to help me understand what is going on.

I hope that isn't too much to hope for.
(As you can see from my typing, I am stressed by the problem.)

I'm wondering if it might be a combination of circumstances.

You might be running out of RAM and the Pi is swapping out to your SD card.

And that might be full.
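
A quick way to check both from a terminal, if it helps (standard Linux tools, nothing Pi-specific):

free -h    # how much RAM is free and whether swap is in use
df -h      # whether the SD card is filling up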

Are you able to run Node-RED without running apache and see what happens then?

How do I not run apache?

Or stop it?

To stop it, type

sudo pkill -f apache2

in a terminal.
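
For what it's worth, if Apache came in via the usual Debian package it also runs as a systemd service, so this is a slightly cleaner way that stops it coming back at the next reboot:

sudo systemctl stop apache2       # stop it now
sudo systemctl disable apache2    # stop it starting again at boot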

What are "a few things"? That is where the problems originate.

Basically ping about 16 IP addresses on my local LAN.

That loads the machine. Otherwise it is ok.

This suggests something in your flows is stuck in a loop.

You can check that theory by:

  1. Stop the Node-RED service: node-red-stop
  2. Use top to verify NR has stopped and CPU usage is back at a normal level.
  3. Start NR in safe mode: node-red --safe
  4. Check top again.

If the CPU usage is still at the expected (normal) level, then the issue is with your flows.

Look for anywhere that might be looping in some form and add a debug node that also logs to the console (so it will appear in the output of node-red-log).
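
If it helps, on the standard Pi install you can follow that output live with node-red-log; journalctl does the same where it runs under systemd (the service name here is assumed to be nodered.service, the Pi install default):

node-red-log                            # follow the Node-RED log live
sudo journalctl -f -u nodered.service   # the same thing via systemd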

Just tested it: 40 addresses, 2% CPU (and possibly purely coincidental).

It will depend on how you ping. A count of 1 should be enough (-c1): if it answers, then it is OK.
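
Something like this per address keeps each check cheap - the address is just an example, and -W sets the reply timeout in seconds on Linux ping:

ping -c1 -W1 192.168.0.50 > /dev/null && echo up || echo down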

Ok, the events:

NR stopped:

pi@TimePi:~/.node-red $ node-red-stop

Stop Node-RED
 
Use   node-red-start   to start Node-RED again
 
pi@TimePi:~/.node-red $ 

NR stopped.

Checking with top again after a while.

This popped up as an error which may need investigating:

����dExif

(Drats, can't copy from debug output.)
See screen shot.

I only modified small parts of the flows after the update to NR 1.0.3 and the latest dashboard.
But nothing that should really make a difference.

Thought:
If I have an OLD flow, can I back up this one and use an older one which didn't have the problem and see what happens?

Yes, that would be something you could try. Otherwise, as I said, think about where in your flows you might be looping and add some debug to try to narrow it down.

It may be your flows have been affected by the async message delivery changes that were in 1.0.

I rolled it back 1 month.

Same :frowning:
100% CPU load (as seen from another NR machine).

It is obvious the machine is running at 100% because even trying to ssh to it is painful.
I can nearly go and make a cup of coffee in the time it takes to establish a connection.

That error message in the screen shot... Is that something I should consider as a possible cause?

The only reason I am asking is that once before a node was problematic.
If I had it installed, it would kill the machine.
If I uninstalled it, the machine would work OK.
Alas, simply not using it didn't help.

The error is coming from some camera or something in that direction. And which node is producing that output? (Click on the "node: Flow error message".)

(Blush) That error message is long gone.

I am hoping it has been logged to a file, but I suspect I am not logging errors.

The camera is on CameraPi (another machine)

I was messing around with FTP sending pictures from it to another machine.

ARGH!

Ok, I get it.

I had the alarms on CameraPi set to REMOTE, so when a node throws a wobbly, it sends the error to this machine.

Thanks. At least I don't need to worry about that.

On the idea of "running out of space":

pi@TimePi:/ $ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        15G  4.6G  9.4G  33% /
devtmpfs        213M     0  213M   0% /dev
tmpfs           217M     0  217M   0% /dev/shm
tmpfs           217M   17M  200M   8% /run
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           217M     0  217M   0% /sys/fs/cgroup
/dev/mmcblk0p1   42M   23M   19M  55% /boot
tmpfs            44M     0   44M   0% /run/user/1000
pi@TimePi:/ $ 

I'm not pushing my luck with this usage, am I?
33% and 55% used.

Your original post showed kswapd0 busy in the top output.

A quick google for kswapd0 will tell you that it is the process that manages swapping things in and out of memory... and once that gets busy you know you are running out of RAM. There are quite a few guides on things you can do to tune that a bit.
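
If it helps, whether the Pi really is short of RAM - and what is eating it - can be checked with a couple of standard tools, for example:

vmstat 1 5                   # the 'si'/'so' columns show swap in/out per second
ps aux --sort=-%mem | head   # the processes using the most memory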


Thanks @dceejay

Not wanting to sound more stupid than I am (though I don't think that is going to be . . . . )
That kind of stuff is still in the realm of magic to me, and I am not confident going into it blindly.

Is it too much to ask for help on it here?

At the risk of being obvious... try typing "what is kswapd0" into google.

Ok, I will try.

Going down said path, I found this link which I have visited and applied:

Disable swap files

As I read it, it is not a good idea to have swap files active, as they kill SD cards.
That link may not say exactly that, but I remember it because one day, while exploring the net, I read something about swap files killing SD cards and how to disable them.

This was done WEEKS ago, so I can't really see it as the cause.
I have disabled swap on ALL my RPis and none of them (including the Zeros) seem to have any problems like this.
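
(For completeness, this is how I confirm swap really is off - dphys-swapfile being the swap manager Raspbian uses by default:)

sudo systemctl status dphys-swapfile   # should show it disabled / inactive
swapon --summary                       # no output means no swap is active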

Also, this is an RPi 2 with a quad(?) core; all the Zeros are single-core.

It is not complicated. Something in your flow is clobbering the processor. Start it in safe mode and disable the code that triggers what you think is causing the problem (the pings or whatever). Deploy and check it is OK. Gradually re-enable bits until it goes wrong; then you will know what the problem is. It will not be the pings themselves; it will be something that follows on from the ping.
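
A rough sketch of that loop, assuming the stock Pi install (exact commands may differ on other setups):

node-red-stop
node-red --safe                  # flows load in the editor but do not run until you deploy
# in a second terminal, watch the Node-RED process while re-enabling flows a bit at a time
watch -n 2 "ps -o pid,%cpu,%mem,args -p $(pgrep -f node-red | head -n1)"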