MQTT, messages from offline devices

I use MQTT to talk to remote machines, and they also have Birth and Death certificates with their names in them so I know who is dead and who is alive.

Of course these are only sent at the start/end of life and I get that.

I'm not sure why I included this, but it is from (like) 2016....

On a machine it is looking for these messages.

If it receives a Birth certificate, it then sends out a Who's here? message.

All machines are listening to this and if they get this message, respond with their details.
Maybe over the top, but it is what it is.

Every now and then something goes awry and I get a BirthCertificate coming in. (WiFi drop outs? - anyway)

I am getting messages from machines that are not even powered on saying "I am dead".
(Sorry I may be wrong, but the machine is OFF and I am seeing messages with its name in them)

I get the Retain part of MQTT, but not how that is applying to what is happening here.

Someone please?

When a client (Node-RED in this case) connects to the broker and subscribes to a topic the broker will send the retained message for that topic to the client.

If you disconnect and reconnect you'll get that same retained message again when you subscribe to the matching topic (or wildcard topic that includes the topic). That's just how MQTT works.
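To make the difference concrete, here is a toy in-memory model of that behaviour. It is only a sketch, not how any real broker is written: the ToyBroker class and its methods are made up for illustration, it matches exact topics only (no wildcards, no QoS), and it does not model clearing a retained message.

```javascript
// Toy model of retained-message delivery. NOT a real broker.
class ToyBroker {
    constructor() {
        this.retained = new Map();     // topic -> last retained payload
        this.subscribers = new Map();  // topic -> array of handler functions
    }

    publish(topic, payload, retain = false) {
        if (retain) {
            // The broker keeps the last retained payload per topic.
            this.retained.set(topic, payload);
        }
        // Live delivery to whoever is subscribed right now.
        for (const handler of this.subscribers.get(topic) || []) {
            handler({ topic, payload, retain: false });
        }
    }

    subscribe(topic, handler) {
        if (!this.subscribers.has(topic)) {
            this.subscribers.set(topic, []);
        }
        this.subscribers.get(topic).push(handler);
        // A new subscription immediately gets the stored retained message, if any.
        if (this.retained.has(topic)) {
            handler({ topic, payload: this.retained.get(topic), retain: true });
        }
    }
}

const broker = new ToyBroker();
// Both messages are published before anyone is subscribed.
broker.publish("STATUS/WIFIDEVICEID", "Alarm_Clock", true);  // retained
broker.publish("IFF", "X", false);                           // not retained

const received = [];
broker.subscribe("STATUS/WIFIDEVICEID", (m) => received.push(m));
broker.subscribe("IFF", (m) => received.push(m));

console.log(received);
```

Only the retained message turns up after subscribing, and it would turn up again on every later subscribe, even if the device that published it has been off for days.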

If you want to remove these retained messages for now "dead" machines you need to publish a message to the same topic with the retained bit set and a NULL payload (that is a zero byte payload).

The mosquitto_pub command has command-line options for this: -r sets the retained flag and -n sends a null (zero-length) message.
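The same thing can be done from Node-RED. The sketch below builds the message to send; the helper name buildClearRetained is made up for illustration, and it assumes the message goes to an mqtt out node whose Retain field is left blank, so that msg.retain is used (a node with Retain set to false would override it).

```javascript
// Build the message that asks the broker to forget a retained message:
// same topic, retained flag set, zero-length payload.
function buildClearRetained(topic) {
    return {
        topic: topic,
        payload: "",   // blank payload
        retain: true   // retained flag set
    };
}

const clearMsg = buildClearRetained("STATUS/WIFIDEVICEID");
console.log(clearMsg);
// In a function node the last line would instead be: return clearMsg;
```

Use whichever topic the stale message actually sits on; STATUS/WIFIDEVICEID is just the one from this thread.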

Ok. Thanks.

So just to be sure....

1 - there could be a problem that the message is retained. I am sure I have gone through and checked with MQTT EXPLORER and I can't see any messages in that topic.
2 - When a Death certificate is published, even future connections to the broker will get that message - way in the future.

I'm not trying to be a smarty pants. I seem to be missing something and after reading what you said then, I want to be sure.

I don't think so, if it is not Retained. I may be wrong though, perhaps this is an exception.
However, I cannot think of any reason that you would not want LWT messages to be retained. I would have thought it vital if you restart node-red to get told immediately whether the devices of interest are online or not.

Yeah, but if the machine is "Dead" how can it broadcast that?

The death certificate is sent if there is a communications time out, or the machine transmits "Hey, I'm about to shut down!".

That's fair enough.

But it isn't that simple.

We've covered the birth and death certificates.

I'll accept that.

As stated, if the broker (as it is on that machine anyway) received a BIRTH certificate, it transmits (different topic) a message asking all connected devices to reply.

This is part of the log I have:

2022-02-25 01:51:25 192.168.0.155 >GPS<
2022-02-25 03:37:50 192.168.0.153 >Alarm_Clock<
2022-02-25 10:03:22 192.168.0.93 >TelePi<
2022-02-25 13:47:25 192.168.0.153 >Alarm_Clock<
----< 2022/02/26 >----

Fantastic. But >Alarm_Clock< isn't connected.
So how can it respond to the "Who's online?" question?

This isn't BS/LWT stuff. These are messages transmitted upon specific request from the broker.

This is the routine (kind of) in question:

(Foreign node, but not needed, in there. Just by pass it)

[{"id":"a2f5ce51.4a3ac","type":"mqtt in","z":"9b7e7466.a4b698","g":"eff2cce4.08f878","name":"IFF","topic":"IFF","qos":"2","datatype":"auto","broker":"1bbfcdd.2d24532","nl":false,"rap":false,"inputs":0,"x":4100,"y":2240,"wires":[["19092b5a.4d260d","fbdd582e.8e13b"]]},{"id":"fbdd582e.8e13b","type":"gate","z":"9b7e7466.a4b698","g":"eff2cce4.08f878","name":"","controlTopic":"CONTROL","defaultState":"open","openCmd":"GO","closeCmd":"STOP","toggleCmd":"toggle","defaultCmd":"default","persist":false,"x":4240,"y":2240,"wires":[["52487feb.6471b"]]},{"id":"52487feb.6471b","type":"switch","z":"9b7e7466.a4b698","g":"eff2cce4.08f878","name":"","property":"payload","propertyType":"msg","rules":[{"t":"eq","v":"X","vt":"str"}],"checkall":"true","repair":false,"outputs":1,"x":4370,"y":2240,"wires":[["a6f3614b.c96338","d8365ae1.9c3818"]]},{"id":"a6f3614b.c96338","type":"function","z":"9b7e7466.a4b698","g":"eff2cce4.08f878","name":"","func":"var IP = global.get(\"MY_IP\");\nvar name = global.get(\"myDeviceName\");\n\nvar a = '\"WIFI_DEVICE\":\"' + name;\nvar b = '\"IP_Address\":\"' + IP;\n\nmsg = {\"topic\":\"STATUS/WIFIDEVICEID\",\"payload\":\"{\" + a + '\",' + b + '\"}'};\n\nmsg.topic = \"STATUS/WIFIDEVICEID\";\n\nreturn msg;","outputs":1,"noerr":0,"initialize":"","finalize":"","libs":[],"x":4500,"y":2240,"wires":[["cf7b95ba.9dc48","d5e03328.bb55d"]]},{"id":"d5e03328.bb55d","type":"mqtt out","z":"9b7e7466.a4b698","g":"eff2cce4.08f878","name":"Device ID","topic":"STATUS/WIFIDEVICEID","qos":"","retain":"false","respTopic":"","contentType":"","userProps":"","correl":"","expiry":"","broker":"1bbfcdd.2d24532","x":4850,"y":2240,"wires":[]},{"id":"1bbfcdd.2d24532","type":"mqtt-broker","name":"TIMEPI MQTT","broker":"192.168.0.99","port":"1883","clientid":"","autoConnect":true,"usetls":false,"compatmode":false,"protocolVersion":"4","keepalive":"60","cleansession":true,"birthTopic":"SOM","birthQos":"2","birthRetain":"false","birthPayload":"TelePi Comms UP","birthMsg":{},"closeTopic":"EOM","closeQos":"0","closeRetain":"false","closePayload":"TelePi shutting DOWN","closeMsg":{},"willTopic":"EOM","willQos":"0","willRetain":"false","willPayload":"TelePi Comms FAILURE","willMsg":{},"sessionExpiry":""}]

Basically it is this:

If you receive a message from the broker on a non-retained topic then a device must just have published to that topic. For a retained topic then that will be sent any time a client subscribes to it.

Yeah, ok.

I get that.

Alas the broker spat the dummy big time this morning so I can't really show you.

If it happens again, I will do a better job (I hope) of capturing things.

The topic has not got that message in it - obviously it doesn't now as the machine has just been rebooted/power reset.

I am 99% sure I've been down this path and checked. It is blank, but I guess I should wait and share next time it happens.

I just don't get it.
Something is going on I am either not seeing or understanding.

Perhaps you have messed up some copy/paste somewhere and another device is responding pretending to be that one.

Could, but there are more replies than there are devices.

And I posted the code that replies to the question.
(node-red level that is)

So how could it happen?
I'm open to ideas.

Sorry if I am not sharing enough, but it is difficult to know what is and not ... applicable in this case.

The only 3 people who should respond are:
TimePi
TelePi
BedPi

They are the three (the triumvirate) who monitor things.

AlarmClock was a visitor and is disconnected and powered down.

Ok, GPS also replies. That is an Arduino. He hasn't got the brains to double reply.
And I shall have to dig into why I only got 1 of the 3 in the reply.

Are all the devices listening on the same topic, "IFF" ? If so then I would have expected them all to reply, or is that what you mean by the last sentence? If so then something is not working as you expect so yes, investigate that.

Thanks.

Yeah, I will have to power on the AlarmClock and check what's going on with its messages.

They could be set to retain by mistake - by me.
I was trying things with the retain flag a while back and there may be some legacy setting where I forgot to untick that flag.

Maybe responses to your IFF message should include the device's timestamp as well as the device name.

I think I would have the response topic as "status/iff/" + IP to make it clear which device sent a particular response.

So the function node would be like this

var IP = global.get("MY_IP");
var name = global.get("myDeviceName");
var now = new Date();

msg.payload = {'Wifi_Device': name, 'TIMESTAMP': now};
msg.topic = "STATUS/IFF/" + IP ;

return msg;

Good idea with the time stamp. But the message is timestamped when received.

So I am not sure that is going to really help.

(Alas just now I am in another hole)
And after looking a bit harder at the log extract, 2 of the 3 machines that should have responded didn't.
(But that could also be because I have blocked them with the gate node while searching for where this message is coming from)

I thought the problem was that you receive a message from a device which can't possibly have just now sent it.
A timestamp in the message would give an indication of when it was sent by the device, not when it was distributed by the broker.
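As a rough sketch of how the receiving side could use that timestamp: compare it with the arrival time and flag anything too old. The isStale helper and the 10 second threshold are assumptions for illustration, and it only works if the devices' clocks are roughly in sync. (A Date placed in an object payload ends up as an ISO string once the payload has been serialised to JSON, which is why strings are used here.)

```javascript
// Flag a reply as stale if the timestamp inside the payload is much older
// than the moment it arrived. maxAgeMs is a hypothetical threshold.
function isStale(payload, receivedAtMs, maxAgeMs = 10000) {
    const sentAtMs = new Date(payload.TIMESTAMP).getTime();
    if (Number.isNaN(sentAtMs)) {
        return true; // no usable timestamp: treat as suspect
    }
    return receivedAtMs - sentAtMs > maxAgeMs;
}

const receivedAt = Date.parse("2022-02-25T13:47:25Z");
const freshReply = { Wifi_Device: "TelePi", TIMESTAMP: "2022-02-25T13:47:24Z" };
const oldReply = { Wifi_Device: "Alarm_Clock", TIMESTAMP: "2022-02-24T09:00:00Z" };

const freshIsStale = isStale(freshReply, receivedAt);
const oldIsStale = isStale(oldReply, receivedAt);
console.log(freshIsStale, oldIsStale);
```

A reply sent a second before it arrived passes; one stamped the previous day is flagged, which is what a retained or replayed message would look like.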

Oh! Ok.. Sorry. Yeah. I didn't see it from that perspective.

Sorry sorry sorry.

Shall apply it and wait to see what happens.

ARGH!

These quotes/ticks/thingies...

How do I do it?

var a = '"WIFI_DEVICE":"' + name;
var b = '"IP_Address":"' + IP;
var c = msg.time;

msg = {"topic":"STATUS/WIFIDEVICEID","payload":"{" + a + '",' + b + '"}'};

Yeah, it is wrong now (or could be). I started but then got immediately lost with the " things.

(and I know I've been told to get the other editor. It's in the pipeline.)

Is this it:
msg = {"topic":"STATUS/WIFIDEVICEID","payload":"{" + a + '",' + b + '",' + c + '"}'};

I think it's this

msg.payload = {'Wifi_Device': name, 'TIMESTAMP': now};
msg.topic = "STATUS/IFF/" + IP ;

Or to be a bit more consistent with my quotes, this:

var IP = global.get('MY_IP');
var name = global.get('myDeviceName');
var now = new Date();

msg.payload = {'Wifi_Device': name, 'TIMESTAMP': now};
msg.topic = 'STATUS/IFF/' + IP ;

return msg;

(Double quotes should be OK too but wrapping double inside single, or vice versa, is bound to cause confusion.)
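If the payload really has to be a JSON string rather than an object, one way to avoid the quote juggling altogether is to build an object and let JSON.stringify do the quoting. This is only a sketch: the buildReply helper and the TIME key name are assumptions, and the values are sample data.

```javascript
// Build the payload as a plain object, then serialise it in one go.
function buildReply(name, ip, time) {
    const payload = { WIFI_DEVICE: name, IP_Address: ip, TIME: time };
    return { topic: "STATUS/WIFIDEVICEID", payload: JSON.stringify(payload) };
}

const reply = buildReply("TelePi", "192.168.0.93", "2022-02-25 10:03:22");
console.log(reply.payload);
// {"WIFI_DEVICE":"TelePi","IP_Address":"192.168.0.93","TIME":"2022-02-25 10:03:22"}
```

In the function node the three values would come from global.get('myDeviceName'), global.get('MY_IP') and msg.time.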


I stuck one of my timestamp subflows upstream that gives me that as msg.time.

But thanks.

Just I have been caught a lot of times where the logs are in Z and though I should know better I keep getting bitten by it.

Seems I have slipped a couple of rungs DOWN the learning ladder today.

A fundamental point of MQTT is that it is the broker that sends the will messages - not the client.

The broker decides if a machine has disconnected and, if it thinks it has, it sends a last will and testament message (if configured to do so).
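A toy version of that decision, to show why a dead machine can still "say" it is dead. This is a simplified sketch, not broker code: the function name is made up, and it only models the keepalive timeout (MQTT allows the broker to give up on a client after one and a half times the keepalive interval with no packets). A dropped network connection without a proper DISCONNECT has the same effect but is not modelled here.

```javascript
// Would the broker publish the client's will? (timeout case only)
function brokerShouldPublishWill(lastPacketMs, nowMs, keepAliveSec, cleanDisconnect) {
    if (cleanDisconnect) {
        return false; // a proper DISCONNECT discards the will
    }
    if (keepAliveSec === 0) {
        return false; // keepalive disabled: no timeout to trigger on
    }
    // 1.5 x keepalive, converted from seconds to milliseconds
    return nowMs - lastPacketMs > keepAliveSec * 1500;
}

// keepalive of 60 s, as in the broker config posted earlier
const stillWaiting = brokerShouldPublishWill(0, 80000, 60, false);
const timedOut = brokerShouldPublishWill(0, 91000, 60, false);
const cleanExit = brokerShouldPublishWill(0, 91000, 60, true);
console.log(stillWaiting, timedOut, cleanExit); // false true false
```

So the "I am dead" message is written by the client in advance, held by the broker, and published by the broker on the client's behalf.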


Ok.

Thanks.

I'm not sure I am fully getting that.

(New day)

Got up and turned on Alarm_Clock.
Looked at the message that is/was somehow retained.
The retained button/flag wasn't set to anything. That COULD be problematic.
Set it to false.
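For anyone else hitting this: as far as I understand it, when the Retain field of an mqtt out node is left blank, the node falls back to msg.retain, and a message that came out of an mqtt in node can carry retain: true with it. A belt-and-braces sketch, assuming that behaviour, is to force the flag off in the function node before publishing. The helper name is made up for illustration.

```javascript
// Return a copy of the message with the retained flag forced off,
// leaving every other property untouched.
function forceNotRetained(msg) {
    return { ...msg, retain: false };
}

const incoming = { topic: "IFF", payload: "X", qos: 2, retain: true };
const outgoing = forceNotRetained(incoming);
console.log(outgoing);
```

Setting Retain to false on the mqtt out node itself, as done above, achieves the same thing without extra code.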

Shall have to now wait and see.

The BS and LWT ARE set to retain - false.
But they aren't the problem at this point in time.

The rogue (new favourite word just now for some reason) message was a handshake response more than something like the BS / LWT.

Anyway small steps. I'll see what happens now.