Develop a real hardware watchdog module, anyone interested?

GiovanniG · 20 April 2019 12:38

Hi mates, we all know how uncomfortable unexpected "hangs" of our systems are, sometimes even critical, and how hard it is to guess what is wrong: they happen very seldom, are almost impossible to replicate, and are very difficult to diagnose.
In the past I had power troubles with my Raspberry Pi 3. I have solved those now, but I had a hang once again, maybe caused by my "not best quality" SD card, maybe not, because it affected the TCP stack. How TCP can be connected to the SD card (while other services like NR, the I2C bus and ICMP ping keep working regularly) I really don't understand.

To avoid any possible problems we need a watchdog, I mean a real watchdog that is able to reboot the Pi and solve the problem, and possibly let us know that it happened so that we are warned. I see it as the only way to be sure it will work in any case.
There are different ways to achieve that. Some will suggest interrupting the power supply somehow, maybe with an external Arduino; others may suggest internet services that alert you if the "ping" to the Pi stops. I'm not interested in these solutions. I would like to use the Pi's internal hardware watchdog, which should be activated by turning on its service and managed properly.

A hardware watchdog is present in every microcontroller/automation system and it can trigger a hardware reboot. It should be present in every automation system, so I'm suggesting that the NR developers consider this too; in my opinion it's important.

Features needed:
The Pi should always be reachable by SSH, so we need to check that port 22 is responsive on loopback.
The NR web management port 1080 is good to check too, to see that NR is responsive.
A module in NR where we can set the ports to be checked, and which accepts incoming messages to check that NR is working properly, would be welcome; if no message arrives within a certain time, the watchdog is triggered.
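
The port checks in the list above could be sketched in shell roughly as follows. This is only an illustration under assumptions: the helper name `port_open` is made up, it relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command, and the port numbers are simply the ones mentioned in the post.

```shell
# port_open HOST PORT: succeeds if a TCP connection can be opened within 2 s.
# Hypothetical helper; the connection is made by bash's /dev/tcp redirection.
port_open() {
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example use (ports taken from the post above):
#   port_open 127.0.0.1 22   || echo "SSH is not answering"
#   port_open 127.0.0.1 1080 || echo "Node-RED editor is not answering"
```

Note that a successful connection only shows that something accepts connections on the port, not that the service behind it is healthy.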

About the watchdog service, there is documentation around, but it is hard for me to understand how to work with it properly. Domoticz did something to manage it, and maybe it already checks ports too, so it may be worth having a look at it.
I asked for their help but received none :(
https://www.domoticz.com/wiki/Setting_up_the_raspberry_pi_watchdog

TotallyInformation · 20 April 2019 13:35

The problem is that cutting the power on something like a Raspberry Pi can be rather fatal to the SD-Card.

If you really want to do it, you can easily use a SONOFF (ESP8266 based power switch).

On my live system, if Node-RED loses connectivity to the network, it triggers a controlled reboot which is all I've ever needed. My Pi's are on PC style UPS's to ensure that they are always on or that they can gracefully close down if the power has been out for too long.

GiovanniG · 20 April 2019 13:40

Good point: it would be great to study this Raspbian watchdog service and see whether it already includes shutdown/unmount scripts before it triggers the hardware reboot.

Bad point: as I said, I would exclude any external device that interrupts the power supply. I would kindly ask contributors not to discuss it, thank you.

Colin · 20 April 2019 14:51

If it is not going to interrupt the power then what do you want it to do?

GiovanniG · 20 April 2019 15:03

In the Raspberry Pi there is the Broadcom BCM2835 watchdog timer. Many posts say that, in order to enable it, you add this line to /boot/config.txt:

dtparam=watchdog=on

My goal is to use this timer to hardware-reset the Raspberry Pi, without any external device, etc.
While answering your post I searched and found
https://www.raspberrypi.org/forums/viewtopic.php?t=147501
I hope I'll soon have time to read it
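
A quick way to see whether the setting is in place and whether the device node exists might look like this. It is a sketch only: both function names are invented for illustration, and the file path is passed in as an argument rather than hard-coded.

```shell
# watchdog_param_enabled FILE: succeeds if FILE contains an uncommented
# "dtparam=watchdog=on" line. On the Pi, FILE would be /boot/config.txt.
watchdog_param_enabled() {
    grep -Eq '^[[:space:]]*dtparam=watchdog=on[[:space:]]*$' "$1"
}

# watchdog_device_present [DEV]: succeeds if DEV (default /dev/watchdog)
# exists as a character device.
watchdog_device_present() {
    [ -c "${1:-/dev/watchdog}" ]
}

# Example use:
#   watchdog_param_enabled /boot/config.txt && echo "dtparam is set"
#   watchdog_device_present && echo "/dev/watchdog is there"
```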

Colin · 20 April 2019 15:15

That looks interesting and potentially very useful. Hopefully you will be able to get something working and will post back here.

GiovanniG · 20 April 2019 15:20

I'm really glad you find this interesting! I've found some more info and I asked for clarification.
Apparently you need to type this over SSH:
sudo modprobe bcm2835_wdt
echo "bcm2835_wdt" | sudo tee -a /etc/modules
sudo apt-get install watchdog
sudo update-rc.d watchdog defaults

edit config for this:
nano /etc/watchdog.conf
uncomment #watchdog-device
uncomment #max-load-1
and add: watchdog-timeout = 15 (for example)
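
The three manual edits above could also be scripted, roughly as below. This is a sketch under assumptions: the function name is hypothetical, it assumes GNU sed (for `-i`), and it assumes the stock file has the two options commented out with a leading `#`.

```shell
# apply_watchdog_conf FILE: uncomment watchdog-device and max-load-1 and
# add a 15 s timeout if none is set. FILE would be /etc/watchdog.conf
# (run as root). The value 15 is just the example from the post.
apply_watchdog_conf() {
    conf="$1"
    sed -i \
        -e 's/^#[[:space:]]*\(watchdog-device\)/\1/' \
        -e 's/^#[[:space:]]*\(max-load-1\)/\1/' \
        "$conf"
    grep -q '^watchdog-timeout' "$conf" || echo 'watchdog-timeout = 15' >> "$conf"
}
```

Running it a second time changes nothing, since the timeout line is only appended when it is missing.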

Now I'm figuring out what resets the counter every 15 seconds...

In another thread someone else wrote:
In /boot/config.txt add/change:

watchdog=on

In /etc/systemd/system.conf, change #RuntimeWatchdogSec= to:

RuntimeWatchdogSec=10s

So which config file should be changed?

Colin · 20 April 2019 15:30

Don't know. In a quick bit of googling I found a number of posts suggesting apparently different things. Probably there is more than one way to skin a cat.

GiovanniG · 21 April 2019 21:24

Someone suggested https://mmonit.com/monit/ to me, a service that can monitor the system, restart services, send email, etc. It's not clear whether it can also manage the hardware watchdog; I have asked.
It looks interesting, I need to test it

GiovanniG · 22 April 2019 14:27

I read the post about the Raspberry Pi bcm2835 watchdog. It looks like they configured it in a simple way: the daemon itself reloads the timer every 10 seconds, and if it does not (the daemon is stopped or the kernel has panicked) the Pi reboots.
Now we can have the case where the kernel and the daemon are both working, but TCP is hung and the Pi is unreachable. How do we reboot it in that case?
We can make a function in NR so that, if a TCP check gets no reply several times, we send the "bomb" to the daemon (a command string). But what if NR stops working too? Who will set off the bomb?

The daemon has some possible checks:
May 20 09:33:02 raspi-server watchdog[707]: file: no file to check
May 20 09:33:02 raspi-server watchdog[707]: pidfile: no server process to check
May 20 09:33:02 raspi-server watchdog[707]: interface: no interface to check
May 20 09:33:02 raspi-server watchdog[707]: temperature: no sensors to check

But so far I haven't found out how to use them. They could be interesting: for example, the daemon could check a file written by NR, and if it is not updated (NR has hung, or we intentionally stop updating it from NR) the daemon will trigger a restart.
I need to figure out how
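
The file-check idea could be sketched as a heartbeat file that NR touches while healthy. Everything below is illustrative: the path and function names are hypothetical, `stat -c` assumes GNU coreutils, and the staleness test only mimics the kind of check the daemon would do rather than being the daemon's own code.

```shell
# Hypothetical heartbeat path; override by setting HEARTBEAT_FILE beforehand.
HEARTBEAT_FILE="${HEARTBEAT_FILE:-/tmp/nodered-heartbeat}"

# Called periodically from Node-RED (e.g. via an exec node) while all is well.
heartbeat_touch() {
    touch "$HEARTBEAT_FILE"
}

# heartbeat_is_stale MAX_AGE: succeeds if the file is missing or was last
# modified more than MAX_AGE seconds ago.
heartbeat_is_stale() {
    [ -e "$HEARTBEAT_FILE" ] || return 0
    now=$(date +%s)
    mtime=$(stat -c %Y "$HEARTBEAT_FILE")
    [ $((now - mtime)) -gt "$1" ]
}
```

On the daemon side, the watchdog.conf man page describes `file` and `change` options that appear to match the "file: no file to check" log line, so something like `file = /tmp/nodered-heartbeat` followed by `change = 60` may be the setting to try; that is an untested assumption here.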

GiovanniG · 22 April 2019 14:55

Reading more, I found a simpler way to manage this watchdog. It needs to be tested:
https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=147501&sid=d0e60e399b3af454586a96b0bb5963da&start=25#p1251435
I quote it here:

you do not need to initialize/load any drivers, dtoverlays etc.

no special configuration of any kind is required

don't install the watchdog package

don't fiddle with systemd watchdog settings

nonetheless the device "/dev/watchdog" is simply there

you can write "." to the device "/dev/watchdog" at any time to start the functionality

this triggers the hardware watchdog which expires in 15s per default (I don't know how to change this interval)

i.e. the machine reboots unless you write "." again to the device "/dev/watchdog" prior to the 15s expiry period

to stop the watchdog you simply write "V" to the device "/dev/watchdog"

most of my applications are running some sort of dispatch loop. So the only thing I have to worry about is to rewrite the "." just in time.

trivial shell script to explain the watchdog behavior

while :
do
    date
    echo . > /dev/watchdog
    sleep 14
done
if the loop fails to write the '.' just in time (i.e. prior to the 15s expiry period) the machine reboots instantly

So, as he wrote, we can just check whether the device is present; if so, we can add a function in NR that writes a "." to the device to activate the process, and keeps writing from then on. That would be really convenient: we activate it only when NR is running, and we don't have to worry about how long the boot process takes. Before the watchdog resets, we can try a soft reboot (sudo reboot), writing a "." just before sudo reboot to gain 15 more seconds. We can also try to unmount devices if we decide to do a hard reset.
It needs to be tested.
How can I write a "." to /dev/watchdog from NR?
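
The kick, stop and "kick then soft reboot" steps described above might be wrapped in small shell helpers like these, which NR could then call. The function names and the `WATCHDOG_DEV` variable are made up for illustration; the "." and "V" behaviour is taken from the quoted post, not verified independently here.

```shell
# Device to write to; overridable so the helpers can be tried on a plain file.
WATCHDOG_DEV="${WATCHDOG_DEV:-/dev/watchdog}"

# Arm the watchdog, or restart its countdown if it is already armed.
wd_kick() {
    printf '.' > "$WATCHDOG_DEV"
}

# Disarm the watchdog (the "V" write described in the quoted post).
wd_stop() {
    printf 'V' > "$WATCHDOG_DEV"
}

# Soft reboot guarded by the watchdog: kick once for a fresh countdown,
# then ask for a normal reboot. If the reboot hangs and nothing kicks
# again, the hardware watchdog should reset the board.
guarded_reboot() {
    wd_kick && sudo reboot
}
```

Pointing `WATCHDOG_DEV` at an ordinary file is a harmless way to try the helpers before letting them touch the real device.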

zenofmud · 22 April 2019 15:03

maybe try the exec node... name echo as the program and pass in as parameters the . > /dev/watchdog

GiovanniG · 22 April 2019 15:12

Thank you. It would also be interesting to automatically write a "V" to /dev/watchdog when we stop NR, to avoid reboots while we are working/debugging/etc. How can I do that?

Colin · 22 April 2019 15:16

Write a script to stop node red, consisting of

node-red-stop
echo "V"  > /dev/watchdog

I think it would be better to put quotes round your dot too, just for clarity.

dceejay · 22 April 2019 15:59

and you may want to give slightly more than 1 second grace period before rebooting... if node.js starts its garbage collection routine it can pause other execution, and if other tasks are occurring on the Pi you could accidentally reboot when you don't actually need to. But yes - this is an interesting topic. Will be great to see where you get to with it.

GiovanniG · 22 April 2019 18:17

Good point. Is there a way to know when node.js starts its garbage collection routine? Or anything else during which it would be better not to reboot the system?
My goal is to test something, for example the TCP connections. If all is OK I'll keep kicking the watchdog, maybe every second. If the check fails I'll send an email/Telegram message but keep kicking. If it fails for several consecutive seconds I'll send a last kick plus sudo reboot. If something then goes wrong during the reboot, nothing is kicking any more, so the watchdog will reset the system; and if NR for any reason does not terminate, it will not kick any more either.
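
The decision logic in that plan can be written as a small counter of consecutive failures. This is a sketch, not a tested watchdog loop: `decide`, `MAX_FAILS` and the action names are invented for illustration, and the threshold of 5 is an arbitrary assumption.

```shell
MAX_FAILS=5   # assumed threshold: consecutive failed checks before rebooting
fails=0       # consecutive failures seen so far
action=""     # result of the last decide call

# decide STATUS: STATUS is the exit status of the health check (0 = healthy).
# Sets $action to "kick", "warn" (notify but keep kicking) or "reboot"
# (one last kick, then sudo reboot).
decide() {
    if [ "$1" -eq 0 ]; then
        fails=0
        action="kick"
    else
        fails=$((fails + 1))
        if [ "$fails" -ge "$MAX_FAILS" ]; then
            action="reboot"
        else
            action="warn"
        fi
    fi
}
```

The function sets a variable instead of printing, because calling it inside `$(...)` would run it in a subshell and lose the failure count. Whatever loop drives it should kick well inside the 15 s expiry, leaving margin for the pauses dceejay mentions.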

GiovanniG · 23 April 2019 11:23

Tested, it works perfectly, as he described. Writing a . activates and kicks the watchdog, writing a V stops it from resetting. But it only works as root!
With the exec node, can I write to /dev/watchdog as root?
It would be great to create a module for this feature: on receiving a msg.topic "On" it activates and kicks the watchdog, and on receiving a msg.topic "Off" it deactivates it by echoing a V. It's also important to send a V every time NR sends its nodes a shutdown, even after a deploy, to avoid undesired reboots.
Can someone kindly help? Thank you!

TotallyInformation · 23 April 2019 12:11

You would have to allow the user running Node-RED to use SUDO without a password for that specific command (safest). Better still would be to change the permissions on /dev/watchdog to allow it to be written to by the user running Node-RED.

GiovanniG · 23 April 2019 12:21

Thank you for the kind help! Changing the permissions on the watchdog device sounds really good. I did it with chmod 666 /dev/watchdog as root. Will it now keep this setting forever?
Thank you

TotallyInformation · 23 April 2019 12:28

It will keep the setting until the next reboot; device nodes under /dev are recreated at boot, so a udev rule is needed to make it permanent. However, you might want to lock that down a little more since you've now gone to the opposite extreme and are allowing anyone with access to your server to mess with the watchdog.

I would make sure that the ownership of the device is something like root:admins or root:wheel and then set chmod to 660 (rw-rw----). Then make sure that the user running Node-RED is a member of the chosen group.
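
Since nodes under /dev are normally recreated by udev at boot, a chmod or chown done by hand may not survive a reboot; a udev rule is one way to make the group and mode stick. The sketch below only writes such a rule file. The function name, the file name `60-watchdog.rules` and the group name are all illustrative assumptions.

```shell
# write_watchdog_rule RULES_DIR GROUP: write a udev rule that gives GROUP
# read/write access (mode 0660) to the watchdog device nodes.
# On the Pi, RULES_DIR would be /etc/udev/rules.d (needs root).
write_watchdog_rule() {
    printf 'KERNEL=="watchdog*", GROUP="%s", MODE="0660"\n' "$2" \
        > "$1/60-watchdog.rules"
}

# Example use (the group name "wdusers" is hypothetical):
#   sudo groupadd wdusers
#   write_watchdog_rule /etc/udev/rules.d wdusers
```

After that, the user running Node-RED would be added to the group (for example with `sudo usermod -aG`), and the rule takes effect on the next reboot or after reloading udev rules.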
