I have a pool of 16 devices running Node-RED and connected through MQTT.
There are 15 "slaves", each connected to a machine and wired to I/O and sensors to monitor it.
All of that data is sent to the "master", which displays it on a dashboard. (Each slave also has its own dashboard for deeper analysis or monitoring if needed.)
I have a basic function that checks the status of each machine, to know whether it is running or stopped.
I would also like to indicate when there is a connection issue or the machine is powered off.
The MQTT broker is on the master device.
Data arrives every few seconds from each slave, so I was considering a function that runs every minute and checks the time of the last data received for each machine. If nothing has been received, I know the machine is either off or facing a connection issue.
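A minimal sketch of that last-seen check, written as plain JavaScript so it can be tried outside Node-RED. The names `recordData`, `findStale`, the machine IDs and the 60-second threshold are all assumptions for illustration; in a function node the map would more likely live in flow or global context.

```javascript
// Hypothetical last-seen tracker: all names and the threshold are assumptions.
const STALE_AFTER_MS = 60 * 1000; // treat a machine as silent after 1 minute

// machineId -> timestamp (ms) of the last data message received
const lastSeen = new Map();

// Call this for every data message arriving from a slave.
function recordData(machineId, now = Date.now()) {
  lastSeen.set(machineId, now);
}

// Call this once a minute (e.g. from an inject node) with the full list
// of expected machines. Machines never heard from also count as stale.
function findStale(machineIds, now = Date.now()) {
  return machineIds.filter((id) => {
    const seen = lastSeen.get(id);
    return seen === undefined || now - seen > STALE_AFTER_MS;
  });
}

// Example with fixed timestamps: machine-01 was last heard 70 s ago,
// machine-02 20 s ago, machine-03 never.
recordData("machine-01", 0);
recordData("machine-02", 50000);
console.log(findStale(["machine-01", "machine-02", "machine-03"], 70000));
```

Note that this only tells you a machine has gone quiet, not why. It cannot tell a power-off from a network problem.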
I am open to suggestions for a better, cleaner way to check the status of those slaves.
I have seen the keep-alive feature, the LWT, and so on, but I am really unsure how to apply them in my case, since I am interested in the status of the clients, not the broker.
That is exactly what LWT is for. When a client connects to the broker, anyone subscribed to the client's LWT topic will get the Online message, and if the client disconnects, the subscribers will get the Offline message.
Note, though, that the standard LWT mechanism only applies to "unexpected" offline events, such as a power failure. If Node-RED is stopped by stopping the service, the Offline message is not sent. The Node-RED MQTT nodes add an extra message, though, that allows a message to be sent even for events such as stopping the service.
Look in the Messages tab of the MQTT config node to find them.
@cymplecy, the broker config node has a tab where you can define a "birth message". It is published whenever the connection to the broker is established. There is not much more to say about it.
I think it was the "Message sent before disconnecting (close message)" and "Message sent on an unexpected disconnection (will message)" that @cymplecy was referring to.
I don't know if there is any specific documentation for the close message. The other messages are standard MQTT.
I make great use of it to, for example, show "Offline" where a temperature should be if the remote device is not connected.
All my MQTT connected devices send heartbeat messages on a channel that has an LWT. The heartbeat says "Online", the LWT says "Offline". The topic is DEVICES/<deviceId>
I tend to do this, with the same topic for all three, as I don't care why it is offline. It may well be more efficient to use booleans rather than strings that have to be compared to test them, but I generally think that having unambiguous text is a good idea.
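As a rough sketch of the receiving side, assuming the birth, close and will messages all publish to `DEVICES/<deviceId>` with the text payloads "Online" and "Offline" as described above. The function name `handleStatusMessage`, the `deviceStatus` map and the device IDs are made up for illustration; in Node-RED the same logic would sit in a function node fed by an MQTT-in node subscribed to `DEVICES/+`.

```javascript
// Hypothetical status handler: names are assumptions, only the topic layout
// and the "Online"/"Offline" payloads come from the posts above.
const deviceStatus = new Map(); // deviceId -> true (online) / false (offline)

function handleStatusMessage(topic, payload) {
  // Accept only topics of the form DEVICES/<deviceId>
  const match = /^DEVICES\/([^/]+)$/.exec(topic);
  if (!match) {
    return null; // not a status topic, ignore it
  }
  const deviceId = match[1];
  // Anything other than the exact text "Online" is treated as offline,
  // so the reason for being offline (close or will) does not matter.
  const online = String(payload).trim() === "Online";
  deviceStatus.set(deviceId, online);
  return { deviceId, online };
}

// Example: a device connects, then drops off.
console.log(handleStatusMessage("DEVICES/slave-07", "Online"));
console.log(handleStatusMessage("DEVICES/slave-07", "Offline"));
```

Setting the retain flag on these messages is worth considering, so that a dashboard that starts late still sees the last known state of each device.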
For a deeper check that a service or application is running and functioning as expected on the slaves, it is not enough to rely on the MQTT status messages discussed above. They are of course good to have, but it could very well happen that the MQTT connection is fine while the logic in your service or application has stopped working.
You mention that you have a basic function checking the statuses of your machines. I think this is a must, so it is good that you have it.
In my case I have added logic to all my self-written services and applications running on the slaves. The "master" sends "heartbeat" requests from NR via MQTT; each slave receives and processes the request and sends a response back via MQTT. This way I know that the communication AND the logic are still working as intended.
For services and applications that happen to run on the same machine as the "master", I use the same method with NR and MQTT.
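To illustrate the request/response idea on the master side, here is a small sketch in plain JavaScript. It is not the actual implementation described above: the topic name `heartbeat/request/<slaveId>`, the function names and the 10-second timeout are all assumptions, and the publishing itself is left to whatever MQTT-out node or client is in use.

```javascript
// Hypothetical heartbeat bookkeeping for the master: names, topic layout
// and timeout are assumptions.
const HEARTBEAT_TIMEOUT_MS = 10 * 1000;

// slaveId -> timestamp (ms) at which the still-unanswered request was sent
const pending = new Map();

// Build the request message and remember when it was sent.
function sendHeartbeat(slaveId, now = Date.now()) {
  pending.set(slaveId, now);
  return { topic: `heartbeat/request/${slaveId}`, payload: String(now) };
}

// Call when a slave's response arrives. Returns false if nothing was pending.
function receiveResponse(slaveId) {
  return pending.delete(slaveId);
}

// Slaves whose request has gone unanswered for longer than the timeout.
function findUnresponsive(now = Date.now()) {
  return [...pending]
    .filter(([, sentAt]) => now - sentAt > HEARTBEAT_TIMEOUT_MS)
    .map(([slaveId]) => slaveId);
}

// Example: two requests go out, only slave-01 answers within 15 s.
sendHeartbeat("slave-01", 0);
sendHeartbeat("slave-02", 0);
receiveResponse("slave-01");
console.log(findUnresponsive(15000));
```

Because the slave's own logic has to produce the response, a missing reply points at either the connection or the application, which is the extra information the plain status messages cannot give.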
I guess I had misunderstood the LWT, taking it to mean more or less the opposite, so I did not look into it further.
So thanks @Colin for the explanation.
I think I will do as @TotallyInformation and @krambriw do: combine a heartbeat signal sent to the slaves roughly every minute with birth/close/will messages, to pick up any time something happens.