Machine times between reboots/lockups reducing

Yes, that is (now) alas the point of suspicion. :frowning:
Don't they (RasPis) have any over-voltage detection, like they do for under-voltage?

That would go a long way to proving that is the problem.

I know it isn't under-voltage - or am 99% sure - as I have code that logs the event if it happens.

I haven't proved it. But I know a few times in early days I was doing things and I got a notification of under voltage and so the power supply was replaced with a bigger one.

That was more an oops moment as they are all pretty similar in looks and I got an early one for a newer RasPi which needed more amps than that supply could give.

Frankly, over-voltage is unlikely to be a problem as long as you didn't leave lots of extra volts connected for hours.

I wouldn't trust that. You need to measure it. Don't forget that a USB cable can lose a lot of voltage. A USB tester can be put on the Pi end of the cable so that you know exactly what the Pi is getting.

Sorry. I was only mentioning what I had seen.

I shall have to see if I can get a USB voltage tester then.

If you have another pi on the system that runs for months without a reset then try swapping the PSUs and cables between the two devices and see what happens.


(If only it was that easy)

For now there are three and none of them are playing the game.

Alas each one with its own quirks which further complicates things for me.

Are you using official pi power supply units bought from a reputable source?

Have you got any devices connected directly into the pi?

Are any of the others having unexplained reboots that when you look in the log don't show a normal shutdown?

Do you see reboots on more than one device at the same time?

Alas no.

I have them. Space just didn't permit using them.

One machine has suffered SD card failure twice in a short time.
I am now using a proper power supply to see how long it lives before dying again.

I bought a good brand power board with USB ports which can supply enough amps.
But that is to be tested soon.

I got a USB tester, but I have to build it first.
:frowning:
But that isn't beyond me.
I just need to allocate the time to do it.

Is that all in answer to the first question? If so then what about the other three?

That is one of the three.

The other two are still not lasting too long between reboots.
But as it is only a week ago, I can't really say anything on that side of things.

Let me ask the original questions again, with numbers to make it easy

  1. Are you using official pi power supply units bought from a reputable source?
    I think you have answered that one.

  2. Have you got any devices connected directly into the pi?

  3. Are any of the other pis having unexplained reboots that when you look in the log don't show a normal shutdown?

  4. Do you see reboots on more than one device at the same time?

1 - now on one of them, yes. The other two: no. But they have been like that since nearly day 0.

2 - Two of them have USB sticks plugged into them for backups. One has a WiFi dongle plugged into it, but it is not really doing anything because of WAP and KERNEL conflicts. That one also has an RTC installed in/on it.

3 - All 3 seem to have 1 week life between reboots. Looking at the logs hasn't happened yet.
Other than the one I showed you and it was an uncommanded reboot.
One of these also has a (nearly) NeoPixel LED strip plugged onto its GPIO pins.
It isn't a NEOPIXEL one, but the other one. (Basically the same though: addressable RGB)

Reboot histories as best I can show:

Machine 1:

pi@TimePi:~/.node-red/public/logs/reboot/2021-05 $ lf
Rebooted at 2021-05-04 140059.db  Rebooted at 2021-05-15 041242.db  Rebooted at 2021-05-26 070713.db
Rebooted at 2021-05-12 073744.db  Rebooted at 2021-05-15 072132.db
pi@TimePi:~/.node-red/public/logs/reboot/2021-05 $ cd ..
pi@TimePi:~/.node-red/public/logs/reboot $ ls
2019  2020  2021-01  2021-02  2021-03  2021-04  2021-05  last_alive.db  Rebooted at 2021-06-02 081319.db
pi@TimePi:~/.node-red/public/logs/reboot $ 

Machine 2:

pi@TelePi:/media/pi/9020-9C27/logs/reboot $ ls
2021-05  arc  last_alive.db
pi@TelePi:/media/pi/9020-9C27/logs/reboot $ ls 2021-05
Rebooted at 2021-05-24 212055.db  Rebooted at 2021-05-28 133010.db
pi@TelePi:/media/pi/9020-9C27/logs/reboot $ 

Machine 3:

pi@BedPi:/media/pi/06BB-C87D/logs/reboot $ ls
2021-05  last_alive.db  OLD
pi@BedPi:/media/pi/06BB-C87D/logs/reboot $ ls 2021-05/
'Rebooted at 2021-05-25 134106.db'  'Rebooted at 2021-05-25 150252.db'  'Rebooted at 2021-05-25 173641.db'
'Rebooted at 2021-05-25 134220.db'  'Rebooted at 2021-05-25 154729.db'  'Rebooted at 2021-05-29 111834.db'
pi@BedPi:/media/pi/06BB-C87D/logs/reboot $ 

Machine 3 has only just got a new SD card and is now using an official power supply to see if that is/was a problem.
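
To put numbers on the gaps in those listings, here is a rough Python sketch that reads the "Rebooted at ..." file names and works out the hours between consecutive reboots. The file name pattern is taken from the listings above; the function names are made up for illustration, and it assumes every timestamp is in the same time zone and that the clock was right when each file was written.

```python
import re
from datetime import datetime

# File name pattern as seen in the listings, e.g. "Rebooted at 2021-05-04 140059.db"
PATTERN = re.compile(r"Rebooted at (\d{4}-\d{2}-\d{2} \d{6})\.db")


def reboot_times(names):
    """Return sorted datetimes for every name that matches the pattern."""
    times = []
    for name in names:
        match = PATTERN.search(name)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H%M%S"))
    return sorted(times)


def gaps_in_hours(times):
    """Hours between each reboot and the next one, rounded to one decimal."""
    return [
        round((later - earlier).total_seconds() / 3600, 1)
        for earlier, later in zip(times, times[1:])
    ]


# Machine 1 file names copied from the listing above.
machine1 = [
    "Rebooted at 2021-05-04 140059.db",
    "Rebooted at 2021-05-12 073744.db",
    "Rebooted at 2021-05-15 041242.db",
    "Rebooted at 2021-05-15 072132.db",
    "Rebooted at 2021-05-26 070713.db",
    "Rebooted at 2021-06-02 081319.db",
]

for gap in gaps_in_hours(reboot_times(machine1)):
    print(f"{gap} hours")
```

Running the same thing over the listings from all three machines and comparing the timestamps side by side would also be one way to check whether any reboots line up across devices.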

Is that a No then to question 4?

@Trying_to_learn you could try https://flows.nodered.org/node/node-red-contrib-vcgencmd
Your pi can tell you if it currently has, or has previously had, a low voltage supply.
Check the node readme.

I used this node after problems, and it pointed me to replacing the power supply.


Good idea. Or just use vcgencmd from a terminal if just a check is needed.
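
For the terminal route, `vcgencmd get_throttled` prints a single hex value such as `throttled=0x50005`. A minimal Python sketch for decoding it is below; the bit meanings follow the Raspberry Pi documentation as I understand it, and the function names are just for illustration. Note that the "has occurred" bits are cleared by a reboot, so they only help if they are read and logged while the machine is still up.

```python
import subprocess

# Bit positions in the get_throttled value and what they mean.
FLAGS = {
    0: "under-voltage detected now",
    1: "ARM frequency capped now",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred since boot",
    17: "ARM frequency capping has occurred since boot",
    18: "throttling has occurred since boot",
    19: "soft temperature limit has occurred since boot",
}


def decode_throttled(output):
    """Decode a line such as 'throttled=0x50005' into a list of flag texts."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [text for bit, text in sorted(FLAGS.items()) if value & (1 << bit)]


def read_throttled():
    """Run vcgencmd on the Pi itself and decode the result."""
    result = subprocess.run(
        ["vcgencmd", "get_throttled"],
        capture_output=True,
        text=True,
        check=True,
    )
    return decode_throttled(result.stdout)


# Example with a canned value, so it also runs on a machine without vcgencmd.
for flag in decode_throttled("throttled=0x50005"):
    print(flag)
```

On a Pi, calling `read_throttled()` and getting an empty list back means none of the flags are set.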

Hi @Trying_to_learn

Just sticking my uneducated nose in here - I had a slow "memory leak" that I eventually tracked down to ?tcp sockets? being left open .... (sorry, not the best at remembering the right terms)

I ultimately found the problem (with a hell of a lot of help from pretty much everybody who has answered you here) ... One of the "tools" I used was this:

[
    {
        "id": "f16c8c4d.9d7c5",
        "type": "inject",
        "z": "3055fb7f.f62864",
        "name": "Tickler 10sec",
        "props": [
            {
                "p": "payload"
            },
            {
                "p": "topic",
                "vt": "str"
            }
        ],
        "repeat": "10",
        "crontab": "",
        "once": true,
        "onceDelay": "",
        "topic": "",
        "payload": "",
        "payloadType": "date",
        "x": 110,
        "y": 310,
        "wires": [
            [
                "c602dc4f.4d9978",
                "cf97546.7fa3028",
                "d33f08eb.80118",
                "d14e39e5.54206",
                "98bb25d9.a33d38"
            ]
        ]
    },
    {
        "id": "d14e39e5.54206",
        "type": "exec",
        "z": "3055fb7f.f62864",
        "command": "cat /proc/net/sockstat | grep sockets | awk '{print ($3)*1}'",
        "addpay": false,
        "append": "",
        "useSpawn": "",
        "timer": "",
        "name": "TCP Sockets Used",
        "x": 320,
        "y": 250,
        "wires": [
            [
                "11fab008.76c098"
            ],
            [],
            []
        ]
    },
    {
        "id": "11fab008.76c098",
        "type": "string",
        "z": "3055fb7f.f62864",
        "name": "",
        "methods": [
            {
                "name": "toInteger",
                "params": []
            }
        ],
        "prop": "payload",
        "propout": "payload",
        "object": "msg",
        "objectout": "msg",
        "x": 480,
        "y": 250,
        "wires": [
            [
                "e16f3731.ee6b8"
            ]
        ]
    },
    {
        "id": "e16f3731.ee6b8",
        "type": "smooth",
        "z": "3055fb7f.f62864",
        "name": "",
        "property": "payload",
        "action": "mean",
        "count": "10",
        "round": "0",
        "mult": "single",
        "reduce": false,
        "x": 600,
        "y": 280,
        "wires": [
            [
                "a309e776.043b88",
                "6100f9b9.75055",
                "c140aa12.b358b"
            ]
        ]
    },
    {
        "id": "c140aa12.b358b",
        "type": "debug",
        "z": "3055fb7f.f62864",
        "name": "",
        "active": true,
        "tosidebar": true,
        "console": false,
        "tostatus": false,
        "complete": "false",
        "statusVal": "",
        "statusType": "auto",
        "x": 770,
        "y": 310,
        "wires": []
    }
]

The output of this can be filed away in influx etc and you can see if the number of sockets used increases over time, ultimately leading to a lockup/reboot situation....
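
For anyone who would rather check this outside Node-RED, here is a rough Python equivalent of the exec node above, plus a very crude "is it creeping up?" test. It reads the same `sockets: used N` line from `/proc/net/sockstat`, which counts all sockets rather than only TCP ones. The function names, the window of 10 samples (chosen to mirror the smooth node) and the sample text are assumptions for illustration.

```python
def sockets_used(sockstat_text):
    """Return N from the 'sockets: used N' line of /proc/net/sockstat."""
    for line in sockstat_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "sockets:" and fields[1] == "used":
            return int(fields[2])
    raise ValueError("no 'sockets: used' line found")


def read_sockets_used(path="/proc/net/sockstat"):
    """Read the live value on a Linux machine."""
    with open(path) as handle:
        return sockets_used(handle.read())


def is_rising(samples, window=10):
    """Crude hint of a leak: the mean of the newest window of samples
    is higher than the mean of the oldest window."""
    if len(samples) < 2 * window:
        return False  # not enough history to compare
    oldest = sum(samples[:window]) / window
    newest = sum(samples[-window:]) / window
    return newest > oldest


# Made-up sample in the layout the kernel uses for this file.
SAMPLE = """sockets: used 212
TCP: inuse 9 orphan 0 tw 2 alloc 14 mem 3
UDP: inuse 4 mem 2
"""

print(sockets_used(SAMPLE))
```

A count that wobbles around the same level is normal; it is a steady climb over hours or days that would point towards sockets being left open.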

Forgive me if I am sending you on a wild goose chase, not my intention...

Regds
Ed


The issue here doesn't appear to be a memory leak as the symptom is a sudden reboot without a normal shutdown. A memory leak would lead to an out of memory notification in the log and a node-red restart. Not a reboot.

Also while not unexpected if a sudden outage occurred when it starts doing this - it's generally not a good sign...

... as you can never be sure what exactly it deleted - maybe something vital... maybe not... maybe a swiss cheese operating system.


I'm also wondering if you have a poor mains supply given that you have multiple Pis with issues.

It is possible that a spiky mains supply could cause such issues unless your USB PSUs are attached via a decent filtered power block. Not as common in many countries now but there are certainly areas that suffer from poor, spiky supplies. You might even be causing it by having bad wiring or a faulty electrical device around your property somewhere. Do you have other electrical problems?

I think the orphan cleanup is indicative of your SD-Card issues - you need to make sure that you are using cards that support wear levelling and that the card is large enough to allow it to work well. I use 32GB Samsung EVO or EVO-Pro cards and never have had a single issue since I started using them some years ago.

If you have a Pi that you can free up, get a good new card, put a new, clean installation of Raspbian on it and run it for a couple of weeks with no other configuration to make sure it is stable.

Yes Paul.

That is what I use.

Glad I am not the only one - or: It is good that others are also doing this.

Ed,

Thanks very much.

Though others may scoff, I appreciate help.

I can't say if it is a memory leak or what.

But I think it is good/better to check all things.

I've imported it onto THIS machine - just for now to get a feel for what it does.

I'll stick it onto the other (three) machines soon and let it run its course.

I may change it to every minute though.

Oh and it is good to hear it did help you to fix your problem.
That's always a good thing.