Develop a real hardware watchdog module, anyone interested?

Thanks for the help. In my case nobody can access this RB anyway. I installed Raspbian and changed nothing, really nothing, on users etc. What do you suppose my situation is now? As you described? I can run sudo commands from the console without a password (and without being root). Thank you

You can leave what you've done if you like, it is your setup after all :slight_smile: I was just pointing out the potential issue. Not a big deal if you are on a secure network and not allowing external access.

I just know how these things tend to get forgotten about until it is too late.

The best approach is the one I've mentioned. Just add the Pi user to the wheel group (it may already be there) and do sudo chown root:wheel /dev/watchdog then sudo chmod 660 /dev/watchdog. The changes are immediate.
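
A small sketch for checking the result afterwards, assuming a GNU/Linux stat. The function name check_watchdog_access is made up for illustration, and the device path is a parameter so it can be tried on an ordinary file first. It only inspects the node and never opens it, since opening /dev/watchdog can arm the timer.

```shell
#!/bin/sh
# Hypothetical helper: report owner, group and mode of a device node and
# say whether the current user may write to it. It never opens the device.
check_watchdog_access() {
    dev="${1:-/dev/watchdog}"
    if [ ! -e "$dev" ]; then
        echo "missing: $dev"
        return 2
    fi
    # GNU stat format: %U owner, %G group, %a octal permission bits
    stat -c '%U:%G %a' "$dev"
    if [ -w "$dev" ]; then
        echo "writable"
    else
        echo "not writable"
        return 1
    fi
}
```

Calling check_watchdog_access with no argument looks at /dev/watchdog; after the chown and chmod above, a member of the group would expect to see mode 660 and "writable".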

Do you mean that you can use sudo without having to enter a password? If so, then this is because, unbelievably, some Raspbian installations default to not needing a password for sudo for any command. This can be fixed using sudo visudo.

Hi mate, about your suggestion: if I create a script, I suppose it can be executed only if my current path is the one where the file is placed. How can I create a script that works like "node-red-stop" from any folder?
Is it also possible to edit the node-red-stop script file and add the line that stops the watchdog timer? If so, could that edit be lost when I update NR? Thank you

Also:
is there any way to intercept a "shut down" of Node-RED? It can be caused by a node-red-stop or a simple deploy. I would like to send the watchdog a V exactly when this happens. I need a node that gives me an output when the shutdown is triggered, and from a function I'll fire an exec that sends a V to the watchdog device.
Will that always succeed, or can Node-RED terminate before the exec has taken place?
Thank you

Rather than add the line to node-red-stop make a file containing

node-red-stop
echo "V"  > /dev/watchdog

and put that somewhere in the path, such as /usr/local/bin. I think that is usually in the path. Then you can call it from anywhere.
I don't know whether there is a way of hooking into the node-red shutdown/deploy.
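
As a sketch of such a file: the name nr-stop-wd is made up, and the WATCHDOG_DEV and NR_STOP_CMD variables are assumptions added only so the script can be tried without touching the real device. They are not part of Node-RED.

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. saved as /usr/local/bin/nr-stop-wd and made
# executable with: sudo chmod +x /usr/local/bin/nr-stop-wd
# WATCHDOG_DEV and NR_STOP_CMD are illustrative overrides for testing.
stop_nodered_and_watchdog() {
    dev="${WATCHDOG_DEV:-/dev/watchdog}"
    stop_cmd="${NR_STOP_CMD:-node-red-stop}"

    # Stop Node-RED first so nothing keeps kicking the watchdog.
    $stop_cmd || echo "warning: $stop_cmd failed" >&2

    # Writing the character V asks the driver to disarm the timer when the
    # device is closed; this needs write permission on the device.
    echo "V" > "$dev"
}

# Run the function only when the file is invoked under its installed name.
case "$0" in
    *nr-stop-wd) stop_nodered_and_watchdog ;;
esac
```

Because /usr/local/bin is normally on the PATH, typing nr-stop-wd would then work from any folder, and the stock node-red-stop stays untouched by updates.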

Thanks, I can't write to that path even though I have root.

About shutdown/deploy, there should be a way: my LCD always says "system shutdown" and then "system up" on a deploy or a Node-RED start/stop. How?

Does the directory exist? If it does then you must be able to write to it using sudo.

I don't know about the other question.

I used this tutorial to install the watchdog timer a few years ago.

Pi running DietPi OS with a random lockup once every 3 to 4 months.
I fork-bombed it to test the watchdog and now it works fine: it reboots after a lockup.

The next issue: the Pi locked up randomly at 118+ degrees in a warehouse attic / mezzanine in the July heat. If the kernel panics it may only partially lock up the unit: not enough to cause the watchdog to reset, but enough to crash ssh and influxdb.

The solution was to add the line kernel.panic = 20 to the /etc/sysctl.conf file.

I also added the line kernel.panic_on_oops=15 to the same file.
Sometimes a panic will still allow an unstable run.
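
A minimal sketch of adding such settings without duplicating lines, assuming GNU sed. The helper name set_sysctl_line is made up. kernel.panic = 20 makes the kernel reboot 20 seconds after a panic. The kernel documentation describes kernel.panic_on_oops as a 0/1 switch, so the usage comment below uses 1; that value is an assumption here, not what the post above used.

```shell
#!/bin/sh
# Hypothetical helper: set "key = value" in a sysctl-style file,
# replacing an existing line for the same key or appending a new one.
set_sysctl_line() {
    file="$1"; key="$2"; value="$3"
    # Escape dots so the key is matched literally by grep and sed.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^[[:space:]]*${pattern}[[:space:]]*=" "$file" 2>/dev/null; then
        sed -i "s|^[[:space:]]*${pattern}[[:space:]]*=.*|${key} = ${value}|" "$file"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}

# Example (run as root):
#   set_sysctl_line /etc/sysctl.conf kernel.panic 20
#   set_sysctl_line /etc/sysctl.conf kernel.panic_on_oops 1
```

Running sudo sysctl -p afterwards loads /etc/sysctl.conf without a reboot.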

Meeki

Thank you a lot, dear friend! I don't understand this line: "Sometimes a panic will still allow an unstable run". Do you mean that even after adding the lines kernel.panic = 20 and kernel.panic_on_oops=15 you still have unstable runs?

You may need to test SSH, like I would like to do, but this watchdog can only do a ping or check an interface, and both of those keep working. It looks like the only solution is to check SSH from NR and delete a file if the check fails, so the watchdog can reboot the RB. I don't see other ways.
Tomorrow I'll test it and I'll use in the conf file:
max-load-15 = 24
realtime = yes

And in future I would like to introduce the file = line
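
The "delete a file when the check fails" idea could be sketched like this. The helper name update_heartbeat and the path /run/ssh-alive are made up for illustration, and the example check assumes nc is installed and sshd listens on port 22. A port that accepts connections does not prove a full login still works, so treat the check itself as a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: run a health check and reflect the result in a file
# that the watchdog daemon could monitor with a "file = ..." line.
# Usage: update_heartbeat <file> <check command...>
update_heartbeat() {
    hb_file="$1"
    shift
    if "$@" >/dev/null 2>&1; then
        touch "$hb_file"    # healthy: refresh the modification time
    else
        rm -f "$hb_file"    # unhealthy: a missing file fails the file check
        return 1
    fi
}

# Example check, to be run periodically (from cron or an NR exec node):
#   update_heartbeat /run/ssh-alive nc -z -w 5 127.0.0.1 22
```

In watchdog.conf this would pair with file = /run/ssh-alive, optionally followed by a change = line giving the maximum age in seconds, so a heartbeat that stops being refreshed also counts as a failure.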

Bad news here. I guessed that playing with the watchdog was dangerous, but I hadn't imagined how much. Here is the story:
Yesterday I suffered another "hang", with the same symptoms as last time: no new SSH sessions, no web, telnet, etc. Ping was possible, and Node-RED was working but only with the I2C peripherals.
By a real miracle, the laptop I had left next to the Raspberry with an SSH session open had not lost that session, so I was able to see what was going on. No command at the prompt was "valid" any more (because the folder /usr/bin was not available); only echo worked. Unfortunately /dev/watchdog had gone back to its old permissions, so only root could write to it, and su/sudo no longer worked, so I couldn't reset the Raspberry remotely (this is a home automation system in a client's home; I can't always go there, and it's much better if I solve problems promptly and remotely). I checked the logs and saw the SD corruption; OK, at least I know it's that. Some observations from when the SD got corrupted:

  • All the unix commands were "gone"; nothing is in RAM except echo. If the SSH session isn't root I can't restart the RB in any way (neither by an echo to the watchdog nor by an echo to the kernel). That's a big disadvantage, and it would be nice to find out how to do it.
  • No more SSH, telnet, etc., I suppose because they depend on the SD being available. It seems nothing new can be found and opened "just in RAM": even though an SSH session was open and the SSH service was running in RAM, it looks like Raspbian wants to access some files and gives up if it can't. This is not great either.
  • The SD was unusable. It happened again after about one week, even though during the week I ran sudo reboot about every 48 hours. That doesn't matter, because even across a reboot the board stays powered, and I think the SD hangs because of some internal protection/lock (maybe temperature). It's curious that this happens regularly about once a week. The SD only starts working normally again after a power-off cycle; it doesn't need to cool down. So a reboot would probably have been useless for getting out of that hang, even a hard reset from the watchdog. I have no idea whether a hard reset can interrupt power to the SD to reset it, or whether some combination of signals can trigger an effective internal reset of the SD; it's very difficult to simulate all this again. In my opinion the chances are slim.

In the morning I bought a new SD, a SanDisk Ultra 32GB (before it was a Samsung 16GB class 10, probably entry level). I cloned it, all OK.
In the afternoon (so for some hours I had confirmed the SD worked correctly) I installed watchdog following the instructions:
sudo modprobe bcm2835_wdt
echo "bcm2835_wdt" | sudo tee -a /etc/modules
sudo apt-get install watchdog
sudo update-rc.d watchdog defaults
sudo nano /etc/watchdog.conf
It didn't matter what I put in the config, the daemon always worked correctly. I simulated a fault and had about 60 seconds to correct the problem; I saw the entries in syslog (but not in the path I specified), and after that the Raspberry rebooted.

The big problem came from modprobe bcm2835_wdt, I suppose: the RB "randomly" rebooted itself for no reason, each time after about 20 minutes, even after I put a # back in front of watchdog-device = /dev/watchdog (and then did sudo reboot to be sure it took effect). The daemon is now supposed only to log; it can't activate the watchdog.
I became worried because in these conditions 1) I can't leave the client like that, and 2) sooner or later I could lose data, since I have no idea whether it unmounts the SD before resetting.
I'm using log2ram, so the syslog was lost every time and I couldn't check it. Googling, I couldn't find a way to deactivate the watchdog. It's always like this: people tell you how to activate something but forget the commands to restore the previous situation.
I found only "put the line dtparam=watchdog=off in /boot/config.txt". I did that and rebooted; I noticed SSH was no longer fluent, and after some minutes the RB hung completely: no ping, no NR, no I2C bus, nothing responding any more. I powered off and on, deleted the line from the config and rebooted again, but nothing changed; the RB installation was compromised. Maybe the solution was easy, but what to do? Wait for someone to kindly suggest it?
Luckily I had that morning's fresh image of the old SD. I restored it, and I'll be very careful about reinstalling this watchdog before I get clarification.
Using the echo commands to kick the watchdog is much safer and clearer than all this confusion, and you can easily recover from it: simply don't kick the watchdog, or write a V.

You seem to keep cloning or copying existing installations and I personally would avoid that. Especially if you have an odd problem that might be hardware related or could be a corrupt OS file.

I would start with a completely fresh download of Raspbian and test the watchdog on that. Get that working reliably and then you can add other software.

However, I think I may have mentioned previously that I don't think that this is your answer at all. You should not be needing to do anything this complex and I would suggest that something else altogether is wrong.

Thank you for your kind answer. Well yes, I can, but installing Raspbian again will take time; there are several things to set up and install, which is not really welcome for me because I have to stay there as briefly as possible.
Anyway, a watchdog reboot may happen by mistake, and the system files could be compromised again.
Anyway, I don't think I suffered from that problem: the SD stopped responding completely (superblock included). It was not a system component that halted or a corrupted system, and not just a bad block. After a power-off it worked again, and from the beginning I have never seen in syslog, on any boot, an error that would suggest a file system problem somewhere else, for example. I have the old SD; I'll check it for any inconsistency.

I've noticed I left the change = option uncommented, and it is set to exactly 23 minutes, the time after which I experience the reboot. OK, but why does it keep rebooting even if I comment out the line #watchdog-device = /dev/watchdog and reboot? Very, very strange.
And when I added the line dtparam=watchdog=off to config.txt, disaster struck: it hung after about 5 minutes.
I should understand this better before trying again. First I would like to know how to undo the modprobe bcm2835_wdt command, which I think is the most dangerous one.
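
For what it's worth, the line that tee -a appended to /etc/modules can at least be taken out again. A minimal sketch, with a made-up helper name remove_module_line; it only reverses that one install step and is not a confirmed cure for the reboots described above.

```shell
#!/bin/sh
# Hypothetical helper: delete the exact line naming a module from a
# modules list, keeping a backup copy next to it.
remove_module_line() {
    file="$1"; module="$2"
    cp "$file" "$file.bak" || return 1
    # grep -v -x -F drops only lines that equal the module name exactly.
    # grep exits 1 when nothing is left to print; that is not an error here.
    grep -v -x -F "$module" "$file.bak" > "$file" || [ $? -eq 1 ]
}

# Example (run as root):
#   remove_module_line /etc/modules bcm2835_wdt
```

The other install steps have their own reverse: sudo apt-get remove watchdog uninstalls the daemon. Whether sudo rmmod bcm2835_wdt can unload the driver depends on the kernel, since on some builds it is compiled in rather than loaded as a module.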