I have a similar issue with both of my Pi Zero Ws. Every now and then they drop off my WiFi, each time for about 70-72 minutes. It happens very sporadically, sometimes after days or weeks without any issue. There is nothing unusual in the log files.
As a quick and crude workaround, I was thinking about a small Node-RED flow that restarts the WiFi service after n failed pings.
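A minimal sketch of that watchdog idea, written in Python rather than as a Node-RED flow. It assumes the router at a hypothetical 192.168.1.1 is a sensible ping target and that bouncing wlan0 with "ip link set" is an acceptable stand-in for restarting the WiFi service. HOST, MAX_FAILURES and INTERVAL_S are made-up values, not anything from this thread. The failure counting lives in its own class so it can be checked without a network.

```python
import subprocess
import time

# Hypothetical settings: none of these values come from the original post.
HOST = "192.168.1.1"   # assumed: your router's address
MAX_FAILURES = 5       # the "n" failed pings before acting
INTERVAL_S = 60        # seconds between pings


class PingWatchdog:
    """Counts consecutive ping failures and reports when a restart is due."""

    def __init__(self, max_failures: int) -> None:
        self.max_failures = max_failures
        self.failures = 0

    def record(self, ok: bool) -> bool:
        """Record one ping result; return True when the WiFi should be restarted."""
        if ok:
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.failures = 0  # wait for another n failures before acting again
            return True
        return False


def ping_once(host: str) -> bool:
    """Send one ICMP echo (iputils ping: -c count, -W timeout in seconds)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def restart_wifi(iface: str = "wlan0") -> None:
    """Bounce the link so dhcpcd reconfigures it. Needs root."""
    subprocess.run(["ip", "link", "set", iface, "down"], check=False)
    time.sleep(2)
    subprocess.run(["ip", "link", "set", iface, "up"], check=False)


def run_forever() -> None:
    """Main loop: ping, count failures, restart the WiFi when the count hits n."""
    dog = PingWatchdog(MAX_FAILURES)
    while True:
        if dog.record(ping_once(HOST)):
            restart_wifi()
        time.sleep(INTERVAL_S)
```

Call run_forever() as root (ip needs it), for example from a systemd service. Resetting the counter after a restart means the script waits for another n failures before trying again instead of bouncing the link on every failed ping.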
The curious thing is the consistent duration of ~72 minutes, but I haven't had time to dig deeper because the devices always reconnect after that period.
The LEDs blinking like crazy reminds me of a kernel panic. It might be worth looking into what happens at that moment, but if you can't access the machine at all when it happens, that gets harder.
For a bit more context, this is the block of text I found:
What you want is the modern way of configuring network interfaces, which is event-driven: when an interface is brought up, an event is fired and dhcpcd configures networking (see /etc/dhcpcd.conf). When an interface disappears, an event is fired and dhcpcd deconfigures networking.
In a sense, it is all "hotplug" by default now.
Forget the old commands "ifup wlan0" / "ifdown wlan0".
Instead get in the habit of using "ifconfig wlan0 up" / "ifconfig wlan0 down". Or, to be entirely up-to-date, "ip link set wlan0 up" / "ip link set wlan0 down". These commands will make dhcpcd do something.
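To watch that event-driven flow from a script, here is a sketch that takes the link down and up with "ip link set" and then polls the kernel's operational state in /sys/class/net/<iface>/operstate until it reports "up". It assumes root privileges and that dhcpcd is what reacts to the link events, as the quoted text describes; bounce, wait_for_operstate and read_operstate are hypothetical helper names, and the sysfs path is a parameter only so the polling can be tried against a fake directory.

```python
import subprocess
import time
from pathlib import Path

SYSFS_NET = Path("/sys/class/net")


def read_operstate(iface: str, sysfs: Path = SYSFS_NET) -> str:
    """Return the kernel's operational state for iface, e.g. 'up', 'down', 'dormant'."""
    return (sysfs / iface / "operstate").read_text().strip()


def wait_for_operstate(iface: str, want: str = "up", timeout_s: float = 30.0,
                       poll_s: float = 0.5, sysfs: Path = SYSFS_NET) -> bool:
    """Poll sysfs until iface reaches the wanted state or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if read_operstate(iface, sysfs) == want:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


def bounce(iface: str = "wlan0") -> bool:
    """Take iface down and up; dhcpcd reacts to each event. Needs root."""
    subprocess.run(["ip", "link", "set", iface, "down"], check=True)
    subprocess.run(["ip", "link", "set", iface, "up"], check=True)
    return wait_for_operstate(iface, "up")
```

An operstate of "up" only means the link is associated; it does not prove dhcpcd has finished obtaining an address.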
True. But can't sudo dmesg see back a couple of boots?
I reboot it, let it settle, SSH into it and run the command.
I don't see anything worrying or indicative of a kernel panic.
(Though I have had "fun" with kernel panics not being recorded on another machine. LUCKILY the log had some junk in it, which then led to finding the kernel panic. This one doesn't have anything like that.)
On the other machine, the dmesg output had timestamps going back days, and only at the end did we see the recent events (and the corruption of the file), which led us to work out that it was a kernel panic.
But that's on another machine.
OK. So if it happens, I unplug it, move it somewhere convenient, plug in a monitor and keyboard, and see what dmesg says then?
But that will show the current boot, not the previous session that hung.
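dmesg only shows the kernel ring buffer of the current boot. If the systemd journal is persistent, "journalctl -k -b -1" shows the kernel messages from the previous boot, i.e. the session that hung. Below is a sketch that fetches those and keeps only lines that look relevant. The pattern list (the brcmfmac WiFi driver used by the Pi Zero W, plus the usual "Kernel panic" and "Oops" wording) is an assumption about what is worth flagging, and previous_boot_kernel_log / suspicious_lines are hypothetical names.

```python
import re
import subprocess
from typing import List

# Assumed patterns worth a look: the WiFi driver plus standard kernel crash wording.
PATTERNS = re.compile(r"\b(?:Kernel panic|Oops|brcmfmac)\b")


def previous_boot_kernel_log() -> str:
    """Kernel messages from the boot before this one.

    Only useful if the journal is persistent (e.g. /var/log/journal exists);
    a volatile journal is lost on reboot, just like the dmesg ring buffer.
    """
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def suspicious_lines(log: str) -> List[str]:
    """Return the log lines that match any of PATTERNS."""
    return [line for line in log.splitlines() if PATTERNS.search(line)]
```

If the journal turns out to be volatile, setting Storage=persistent in /etc/systemd/journald.conf (or creating /var/log/journal) should make it survive reboots, so the next hang leaves something to read.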
I'm hoping it is just upset, and that if I reset wlan0 with the command it will be happy again.
Alas, it is a Catch-22.
I can't really prove it and will have to wait for it to happen again.
The thing is: how do I detect it?
The command you mentioned... Yeah, nice, but when the interface is down, what does it give?
I'm guessing I'll have to parse its output, looking for something specific that indicates it is down.
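A sketch of that parsing, assuming the command is "ip link show wlan0" from iproute2. The flags in angle brackets include UP when the interface is administratively enabled and LOWER_UP when it has a carrier (for WiFi, an association); a dropped link typically shows NO-CARRIER and "state DOWN" instead. LinkStatus and parse_ip_link are hypothetical names, and treating "UP + LOWER_UP + state UP" as usable is an assumption, not a definitive test.

```python
import re
import subprocess
from typing import NamedTuple, Set

_FLAGS = re.compile(r"<([^>]*)>")     # e.g. <BROADCAST,MULTICAST,UP,LOWER_UP>
_STATE = re.compile(r"\bstate (\S+)")  # e.g. "state UP" or "state DOWN"


class LinkStatus(NamedTuple):
    flags: Set[str]  # link flags as printed by ip
    state: str       # operational state as printed by ip

    @property
    def usable(self) -> bool:
        # UP = administratively enabled; LOWER_UP = carrier/association present.
        return "UP" in self.flags and "LOWER_UP" in self.flags and self.state == "UP"


def parse_ip_link(output: str) -> LinkStatus:
    """Extract the flag set and state from 'ip link show <iface>' output."""
    flags_m = _FLAGS.search(output)
    state_m = _STATE.search(output)
    flags = set(flags_m.group(1).split(",")) if flags_m and flags_m.group(1) else set()
    state = state_m.group(1) if state_m else "UNKNOWN"
    return LinkStatus(flags, state)


def link_status(iface: str = "wlan0") -> LinkStatus:
    """Run ip and parse its output for one interface."""
    out = subprocess.run(["ip", "link", "show", iface],
                         capture_output=True, text=True, check=True).stdout
    return parse_ip_link(out)
```

Something like "if not link_status().usable:" could then feed the same counter as the ping check, so a single glitch does not trigger a reset.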