How does a node send entries to Syslog on a Pi?

I have a pi that disappears from the network every day, and when that happens, Syslog is overwhelmed with entries from SNMP nodes that can't connect with their target.

The nodes run once per second, and sometimes the Pi goes 10 hours before I get a chance to reboot it...so that's (presumably) 36,000 failed SNMP requests in my syslog...per node!

The Pi doesn't have the SNMP service installed directly, so there's no config for that.

Node-RED doesn't appear to use Syslog.

How does this thing log?

Can you show us a sample message please?
Are you seeing them on the pi or on a machine monitoring the pi?

It's on the Pi itself, when doing a cat /var/log/syslog

Nov 23 11:56:02 xxx-011-Wxxx Node-RED[329]: 23 Nov 11:56:02 - [error] [snmp:Get                                    SNMP] RequestTimedOutError: Request timed out

These are kinda odd ones that came in while it was on the network, but they're basically the same as the flooded log.

Nov 23 12:11:23 xxx-011-Wxxx Node-RED[329]: 23 Nov 12:11:23 - [error] [snmp:Get SNMP] RequestTimedOutError: Request timed out
Nov 23 12:11:23 xxx-011-Wxxx Node-RED[329]: 23 Nov 12:11:23 - [error] [snmp:Get SNMP] RequestTimedOutError: Request timed out
Nov 23 12:11:23 xxx-011-Wxxx Node-RED[329]: 23 Nov 12:11:23 - [error] [snmp:Get SNMP] RequestTimedOutError: Request timed out
Nov 23 11:51:25 xxx-011-Wxxx Node-RED[328]: 23 Nov 11:51:20 - [error] [snmp:Get SNMP] Error: send ENETUNREACH 10.85.28.243:161
Nov 23 11:51:25 xxx-011-Wxxx Node-RED[328]: 23 Nov 11:51:21 - [error] [snmp:Gates data] Error: send ENETUNREACH 10.85.28.238:161

That is coming from Node-RED. I guess it is from the snmp node, though I haven't used that node myself; I use Telegraf for SNMP.

Not a direct answer to the question, but have you considered adding a small flow to trigger a reboot if the pi loses connection to the router for more than a short time?

Yes, but it's kind of tricky. I currently have an inject node set to fire at 6am every day which runs an exec node with a reboot command (set up this morning).
The way the network is set up, I can't really trust a ping test without getting a ton of false positives, resulting in a lot of unnecessary reboots.

I'm actually also not 100% certain the pi is still up when it stops responding to pings from the remote site. I'm hoping it is and network is just not working for some reason.

I should note, I had an identical system set up that was doing the same thing. This Pi is new hardware and a fresh install of all software. So I'm thinking something in my flows or other software is crashing it, or the network switch is going admin-down until the Pi is power cycled and renegotiates at the hardware layer.

The Pi, a radio transmitter, and a PDU are deep in the woods at a fire tower site.
The network is set up with an IP radio acting as a LAN extender, so the Pi and other gear all appear to be on the same LAN as the home site or the shop.
The link is very tenuous, so if there's a large data copy happening (i.e. a video motion detection event), the pings over the radio will fail.
I really only know the Pi is down by repeated failed pings, and by its Node-RED failing to check in within 7 minutes to a master Node-RED system.


You can tell by looking at syslog.

Use a trigger node so that it only reboots after 10 minutes of bad pings, or whatever time is appropriate.
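If a trigger-based flow turns out to be hard to trust, a plain shell watchdog started at boot (e.g. from a systemd service) could do roughly the same job outside Node-RED. This is only a sketch; 192.168.1.1 stands in for whatever address you trust to ping, and the thresholds are arbitrary:

#!/bin/bash
# Reboot only after 10 consecutive ping failures, checking once a minute.
FAILS=0
while true; do
    if ping -c 1 -W 5 192.168.1.1 > /dev/null 2>&1; then
        FAILS=0
    else
        FAILS=$((FAILS + 1))
    fi
    if [ "$FAILS" -ge 10 ]; then
        sudo /sbin/reboot
    fi
    sleep 60
done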

I can't tell by looking at syslog because the log is unmanageable/unreadable with all the junk entries... :grin:

But I confirmed now that the Pi does actually cease to function when it appears to drop off the network.

I would still like to figure out where the SNMP node's syslog entries come from so I can shut them off, and start chasing down the daily lock-up.

You can use tools such as grep to search for relevant sections. Also if you know that it was not responding at a particular time you just have to look for that time in the log and see if there is anything there. If there isn't then it was dead, and you can look for the last entries before it stopped to see what happened.
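For example, the syslog lines all start with a timestamp, so something like this (the date and hour are just an illustration) pulls out everything logged in a given window:

grep 'Nov 23 11:' /var/log/syslog

If the Pi had already died by then, there simply won't be any lines for that time, and the last entries before the gap are the interesting ones.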

How did you do that?

They come from the snmp node.

You can use tools such as grep to search for relevant sections.

I follow, but I don't know what to search for. I don't really know what's happening when the SNMP entries pile up, so it may be running for a while and the SNMP junk crashes it, or it may be crashing from something else and the SNMP junk is just normal malfunctions that are piling up in the log.
I will try to open the log and look by time, but the logfile keeps growing too large to open. It took 15 minutes for cat to scroll through it when I first started chasing this.
I'm running over a 1.544 Mbps link, so my bandwidth to reach the device is the real constraint.

How did you do that? [confirmed Pi faulted]

I should rephrase that to...I strongly believe the Pi ceased to function...
I have MotionEye running on it also, and all pics or video stop around the same time it stops answering pings.

They come from the snmp node.

I know...but where is that configured? I can't find a config file or path from Node-RED.

Ultimately, I'm going to end up swapping out the Pi for a fresh build because I have to do some physical re-mounting and cable routing, and I intend to add an external SSD to preserve the SD card. But if I can get it to stop sending SNMP node entries to Syslog, I may be able to fix it from home and not need to swap it out at all.

You can see all node-red entries using
cat /var/log/syslog|grep -i node-red

Copy the relevant syslog file to your PC then you can open it in a text editor to examine at your leisure without having to wait for the data to be transferred across.
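For example (the address is a placeholder, and depending on group membership you may need sudo to read the log), compressing first keeps the transfer small; the repeated timeout messages compress very well:

gzip -c /var/log/syslog > /tmp/syslog.gz

and then, from the PC:

scp pi@192.168.1.50:/tmp/syslog.gz .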

I suggest you start a new thread with a title that indicates that you want to know how to reduce the logging generated by node-red-node-snmp (assuming that is the node you are using). Then the right person will hopefully respond.

Are you sure that is necessary? What does this show
ls -l /var/log/syslog*
Also
df

You can see all node-red entries using
cat /var/log/syslog|grep -i node-red

Thanks... I don't want to see them, they're what's overloading the logfile.
But that got me in the right direction. I added the -v option, and sent output to a file, so I was able to open that new filtered file in Nano, making it much easier to navigate. Still digging, as it's still very large, though.
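For reference, in case it helps anyone else, it was something along these lines (the output filename is arbitrary):

cat /var/log/syslog | grep -vi node-red > ~/syslog-filtered.txt
nano ~/syslog-filtered.txt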

Copy the relevant syslog file to your PC then you can open it in a text editor to examine at your leisure without having to wait for the data to be transferred across.

Not enough bandwidth for large file copies; I'm mostly limited to telemetry data over the link, and that's it...even when I cat the syslog over PuTTY from home, hardware at the far end stops responding to pings because of the bandwidth loading.
Though I will possibly try to copy the new filtered file (I don't want to unload the editor to see how large it is yet; it took too long to scroll down and I don't want to lose my place).

I suggest you start a new thread with a title that indicates that you want to know how to reduce the logging generated by node-red-node-snmp (assuming that is the node you are using). Then the right person will hopefully respond.

Any reason not to edit this one if that's possible? Perhaps my original title was too foundational to the objective. A new thread would effectively be a duplicate, and the question has somewhat evolved/refined through this thread.

Are you sure that is necessary? What does this show
ls -l /var/log/syslog*
Also
df

Is there a reason NOT to do it? :grin: The SD card seems to be the most likely point of failure in the system.
It's a question of how quickly I can recover from a fault/failure. In my dashcam, I cycle large video files continuously on an SD card and it's no big deal because I can swap it out quick if it fails.
This Pi is in a very inaccessible place several hours' drive away, so I can't swap it out for a few days if the SD card fails. Then I've lost telemetry and security video until I can get on site. This is actually the second Pi to be used there; the first had an SD card failure, and I set up a new one at home to swap out on site, then did the postmortem when I got home.

I still want to reduce the output from Node-RED to syslog, but the grep direction got me moving now. :+1:

You said you had a 1.5Mbps link (at least that is what I understood), which should give you around 150 kBytes/sec, or 9 MBytes/min, so it wouldn't take that long to transfer a file of a few tens of megabytes.

You can edit the title. Click the pen icon next to it.

Well it is unlikely to be the SD card that is making it drop off the network. If the processor is suddenly stopping then that is much more likely to be something like the PSU. That is why you need to look at the log to see what is there at the time it stops. Do you have any hardware connected to the Pi? Pickup on wiring is another possibility.

SD cards are much more reliable now than they used to be, provided you pay for a good quality one of a good size (32GB for example) and buy it from a reputable supplier. I have six Pis running 24/7 and it is several years since I lost an SD card. Just to humour me, tell me what the commands I asked for show.

How often are you polling SNMP? I use Telegraf to drive SNMP and output direct to InfluxDB, so Node-RED is not involved. I poll every 15 seconds and get similar messages to yourself when devices are offline. Currently I am getting four errors every 15 seconds, but syslog is still only around 10MB/day, which is nothing for a modern SD card.

If it were the only thing happening on the link, that bandwidth calc might work, but there's other data flowing. And the bandwidth fluctuates, since it's a really long wireless link, so it may not reliably be 1.5 Mbps.

Here's the syslog listing:

-rw-r----- 1 root adm  43608033 Nov 28 19:06 /var/log/syslog
-rw-r----- 1 root adm  74287524 Nov 27 00:00 /var/log/syslog.1
-rw-r----- 1 root adm         0 Oct 27 16:01 /var/log/syslog.1.gz-2022102800.backup
-rw-r----- 1 root adm   2190064 Nov 20 00:00 /var/log/syslog.2.gz
-rw-r----- 1 root adm    961031 Nov 13 05:09 /var/log/syslog.3.gz
-rw-r----- 1 root adm    985223 Nov  6 19:08 /var/log/syslog.4.gz
-rw-r----- 1 root root 87776820 Nov 23 11:47 /var/log/syslog.old20221123

And the disk space check:

Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/root        7312680 5612324   1350320  81% /
devtmpfs          340908       0    340908   0% /dev
tmpfs             472492      16    472476   1% /dev/shm
tmpfs             189000     760    188240   1% /run
tmpfs               5120       4      5116   1% /run/lock
/dev/mmcblk0p1    258095   49240    208856  20% /boot
tmpfs              94496      20     94476   1% /run/user/1000

The only external connection is a USB camera, but this seemed to be fine for quite a while before the daily hangups.

Regarding SD card failures...I've had plenty, and I don't see any reason to oppose prepping for that. Some in Pis, some in dashcams, and so on.
I've had them fail at remote sites, and the headaches of recovering are certainly more painful than a little extra prep. There's basically no significant cost in time/material/power to make the card read-only and use a small outboard SSD for repetitive writes.

Also, I forgot to mention: 3 SNMP nodes perform a Get once per second.

I see you have only an 8GB card, and it is nearly full. That is not good for card life, as the card balances up the wear on blocks as they get used and freed up, and it can only do that using the available unused blocks. It would be better with a 32GB card.

OK, that is why you have such a lot of errors in the log. Do you really need it to be monitored at that rate?

Are SSDs more reliable than spinning discs?

I see you have only an 8GB card, and it is nearly full. That is not good for card life, as the card balances up the wear on blocks as they get used and freed up, and it can only do that using the available unused blocks. It would be better with a 32GB card.

Or...with an external SSD? :smiley:

OK, that is why you have such a lot of errors in the log. Do you really need it to be monitored at that rate?

Yes, you can see earlier in the thread that I am aware of this. The crux of the thread is that Node-RED/SNMP is clogging up the log and I would like to stop it from sending errors to Syslog... :grin:
The traditional metering was analog, so dropping to a 1-second sampling rate is very slow, relatively speaking. Imagine you're watching an audio meter, then only look once per second. Granted, moving to SNMP requires compromise.

Are SSDs more reliable than spinning discs?

Generally, in the conditions I will have it, yes.

Perhaps SNMP is not the best tool for what you are doing.

Perhaps SNMP is not the best tool for what you are doing.

Perhaps, but manufacturers seem to disagree, so this is what I'm working with.

It seems silly that we have a lengthy thread about the factors of my use-case, rather than a relevant solution.
Perhaps someone else has an idea where I can dig into this?
Thanks for your help thus far, Colin.
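One idea that might get you most of the way from home: on a stock Raspberry Pi OS install, Node-RED's output normally reaches /var/log/syslog via rsyslog, so rsyslog itself can be told to discard just those messages rather than trying to silence the snmp node. A minimal sketch, assuming rsyslog is what's writing the file (the filename is arbitrary, and the match string is taken from the messages earlier in the thread):

echo ':msg, contains, "RequestTimedOutError" stop' | sudo tee /etc/rsyslog.d/30-drop-snmp-timeouts.conf
sudo systemctl restart rsyslog

A second line matching ENETUNREACH would cover the other variant. Note this only hides the entries; the node is still timing out once a second underneath, so it doesn't help with the lock-up itself.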