How are people monitoring by exception?

Guys,

I'm wondering what people are using for automated management by exception of their systems. At the moment I have 14 Tasmota/Sonoff devices scattered around doing various tasks. Some of them only do something once a day (or once a night, etc.), so it would be nice to have a system that monitors whether their tasks have completed and/or whether the device is alive, and reports any failures or deviations.

Are people using external tools for this (MRTG, Nagios, etc.) or specialised flows?

Craig

Node-RED running a "sentinel" flow is what I use, and it is surely easier than introducing another tool.

My "main rpi" is responsible for monitoring and checking all the other distributed units (RPis, ESPs, etc.). All those distributed units provide various services to my main home automation. Some services are expected to provide data at a regular interval; others work on demand or just execute commands at certain dates and times or in certain situations.

To be sure they are all alive and healthy, I communicate with them using MQTT, either by sending out commands and expecting valid answers back or by just awaiting the expected data at a regular interval. A service may be a Node-RED flow, a Python script, an application, etc.

All services also try to "heal themselves" by restarting if something goes wrong and, if that does not succeed, finally reboot the device itself, hopefully resolving the issue. So far this has worked as I expected.
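As a rough illustration of that restart-then-reboot escalation, here is a minimal Python sketch. It is not the poster's actual code: `recover`, `is_healthy`, `restart_service`, `reboot_device` and `max_restarts` are all hypothetical names, and the three callables stand in for whatever health check, restart and reboot mechanism a given service really uses.

```python
from typing import Callable


def recover(
    is_healthy: Callable[[], bool],
    restart_service: Callable[[], None],
    reboot_device: Callable[[], None],
    max_restarts: int = 3,  # assumed limit, not taken from the thread
) -> str:
    """Try restarting a failed service a few times, then fall back to a reboot."""
    if is_healthy():
        return "healthy"
    for _ in range(max_restarts):
        restart_service()
        if is_healthy():
            return "restarted"
    # Restarts did not help, so escalate to rebooting the whole device.
    reboot_device()
    return "rebooted"
```

On a real device the callables might wrap something like a service manager call and a system reboot; keeping them as parameters lets the escalation logic be tried out without touching the hardware.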

To monitor all this, my "main rpi" has a dedicated flow. Each and every service out there has its own trigger node that is "energized" regularly by the responses coming from that service. If the responses stop, the trigger node fires an event message, which is sent out via Telegram.
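For anyone who would rather see the idea outside Node-RED, the per-service trigger node behaves roughly like a watchdog timer. The Python sketch below is only an approximation under that assumption: `ServiceWatchdog`, the service names and the timeout values are made up for illustration, and the alerting step (Telegram in the post above) is left to the caller.

```python
import time
from typing import Callable, Dict, List


class ServiceWatchdog:
    """Remembers when each service last reported and flags the silent ones."""

    def __init__(
        self,
        timeouts: Dict[str, float],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._timeouts = dict(timeouts)  # seconds of silence allowed per service
        self._clock = clock
        now = clock()
        # Treat start-up as the first response so nothing alarms immediately.
        self._last_seen = {name: now for name in self._timeouts}
        self._alerted = set()

    def feed(self, name: str) -> None:
        """Call whenever a response from `name` arrives, e.g. from an MQTT callback."""
        self._last_seen[name] = self._clock()
        self._alerted.discard(name)

    def check(self) -> List[str]:
        """Return the services that have gone silent since the previous check."""
        now = self._clock()
        newly_silent = []
        for name, timeout in self._timeouts.items():
            overdue = now - self._last_seen[name] > timeout
            if overdue and name not in self._alerted:
                self._alerted.add(name)
                newly_silent.append(name)
        return newly_silent

    def all_healthy(self) -> bool:
        """Overall status: True only if no service is past its timeout."""
        now = self._clock()
        return all(
            now - self._last_seen[name] <= timeout
            for name, timeout in self._timeouts.items()
        )
```

A device that only acts once a day would simply get a timeout a little over 24 hours, while a sensor reporting every minute gets a much shorter one. `check()` reports each failure once, so a periodic caller does not resend the same alert.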

The overall system status is captured using the "status node". I use this in my GUIs to provide a visual indicator of the overall system status.

To check that the "main rpi" Node-RED flow itself is working, it simply sends out a heartbeat that is monitored and reported by one of the other distributed RPis, closing the loop, so to speak.

Nice approach - thanks for taking the time to answer

Craig

Some time ago I posted a flow to create a network map (it does not meet your requirements, but it is an idea). If it is only Sonoff devices you are interested in, you could use the LWT for device availability.

https://flows.nodered.org/flow/4b940430b92142571c670050f4d98f6d

That's a nice starting place! Interesting that no one yet seems to have gone for an external solution. I will wait and see what else comes up as people wake up or come home.

Craig

MQTT already has Last Will and Testament (LWT) baked into the protocol for exactly this purpose. When it connects, the sensor lodges a message with the broker, which the broker then publishes if the sensor goes offline unexpectedly.
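A small Python sketch of how a monitor might consume those LWT messages. It assumes the Tasmota default of publishing to `tele/<device>/LWT` with the payloads `Online` and `Offline`; if your devices use a different topic layout, the parsing would need adjusting. The function names and the device names in the usage note are hypothetical, and wiring this to a real MQTT client is left out.

```python
from typing import Dict, List, Optional, Tuple


def parse_lwt(topic: str, payload: str) -> Optional[Tuple[str, bool]]:
    """Return (device, online) for an LWT message, or None for any other topic."""
    parts = topic.split("/")
    # Assumed layout: tele/<device>/LWT
    if len(parts) != 3 or parts[0] != "tele" or parts[2] != "LWT":
        return None
    return parts[1], payload.strip().lower() == "online"


def apply_lwt(availability: Dict[str, bool], topic: str, payload: str) -> None:
    """Update the availability table in place from one incoming message."""
    parsed = parse_lwt(topic, payload)
    if parsed is not None:
        device, online = parsed
        availability[device] = online


def offline_devices(availability: Dict[str, bool]) -> List[str]:
    """List the devices currently marked offline, sorted by name."""
    return sorted(name for name, online in availability.items() if not online)
```

Feeding it `("tele/sonoff1/LWT", "Offline")` marks `sonoff1` as offline, and `offline_devices()` then gives the list to report on. Note that LWT only says whether the device is connected to the broker, not whether its daily task actually ran.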

I use:

For all my remote (wireless) devices, using all kinds of techniques to observe their status, including MQTT LWT. It should be extendable to Tasmota too.
