MQTT message not making sense

Sorry folks, but this is confusing.

Some of the machines have local MQTT hosts set to bridging mode to talk to the main one.

I unplug a machine's CAT-5 cable and this is what I see happen, which doesn't make sense.

I unplug BEEFPI.

The first two messages are locally generated at this part of the flow.

The latter two are the MQTT message and they are for BEDPI, not BEEFPI.

So, let's look at BeefPi's MQTT settings.

All the messages are saying BeefPi......

Fair enough. So for the sake of it, let's look at BedPi's MQTT node.

So you can clearly see that this is the machine from where the message came.
But believe me: I didn't unplug it. (can't reach and it is in another room)

So: what is going on when I unplug BeefPi's CAT-5 cable and why do I get/see BedPi's EOM message?

I think you may have to supply more info for this one. Rather confusing.

Remind us, why are you using multiple brokers again?

Sorry.

That is sent once NR is running.

It is the CPU load of that machine.

So in the scheme of things it goes like this:
(oh at this stage it is conceptual. The way it works now is a lot more complicated possibly because when I did it, I didn't know better.)

At the start the machine is marked as Off line.

Oh, and there are three MQTT messages.
Communications established. (SOM)
Communications unexpectedly gone. (EOM1)
Communications to be shut down. (EOM2)

Machine powers up.
MQTT sends its SOM message. (I'm here)
The machine is marked as Booting.
This starts a timer/trigger. If no message is received in that time the machine is flagged as HUNG.

A while later NR loads and the flow starts to send the CPU load.

If NR gets up and sends the CPU load, the machine is marked as Fully running.

If I tell the machine to shutdown, after all that happens the EOM message is received and the machine is again marked as Off line.

If while being Fully running the link is broken (eventually) the EOM message is received and the machine is marked as Off line.
If NR locks up the CPU load will stop being received but no EOM message.
Between each CPU load message a timer is used to wait for the next.
When it arrives, the timer is reset and the machine is marked as Fully running.
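
To make the scheme above concrete, here is a rough Python sketch of the state handling as I read it. It is only a model, not the actual flow: the class name, the message kinds ("SOM", "CPU", "EOM1", "EOM2") and the 60 second timeout are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

OFFLINE, BOOTING, RUNNING, HUNG = "Off line", "Booting", "Fully running", "HUNG"


@dataclass
class MachineStatus:
    """Tracks one machine's state from SOM, EOM and CPU-load messages."""
    timeout: float = 60.0             # assumed seconds allowed between messages
    state: str = OFFLINE              # every machine starts as Off line
    deadline: Optional[float] = None  # when the watchdog timer expires

    def on_message(self, kind: str, now: float) -> str:
        if kind == "SOM":                 # "I'm here": start the timer
            self.state, self.deadline = BOOTING, now + self.timeout
        elif kind == "CPU":               # CPU load arrived: reset the timer
            self.state, self.deadline = RUNNING, now + self.timeout
        elif kind in ("EOM1", "EOM2"):    # link lost or clean shutdown
            self.state, self.deadline = OFFLINE, None
        return self.state

    def tick(self, now: float) -> str:
        """Call periodically; flags the machine as HUNG once the timer runs out."""
        if self.deadline is not None and now >= self.deadline:
            self.state, self.deadline = HUNG, None
        return self.state


if __name__ == "__main__":
    m = MachineStatus(timeout=60)
    print(m.on_message("SOM", now=0))
    print(m.on_message("CPU", now=30))
    print(m.tick(now=100))            # no CPU load since t=30
    print(m.on_message("EOM1", now=120))
```

Passing `now` in explicitly (instead of reading the clock inside the class) makes it easy to replay a sequence of messages and check which state each one should produce.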

Clear?

I'm confused how BedPi message is received when I unplug BeefPi.

Is it consistent? If so then the solution is the same as always: add debug nodes in the right places at both ends and you will find out what is going on. When you have a problem at a high level in the system, that is always the first thing to do: find out exactly which node is not doing what you expect.

It seems to be.

BedPi is on. (24/7) It has become one of the main machine set.

BeefPi is just a remote machine and it can do as it pleases for up/down.

So I am sitting here with BeefPi and I wrote the code/flow on this machine (a whole other one).

I saw the messages coming in from BeefPi.
Looks good.
Reboots look ok too. Right messages come through. (Should have screen shot them. Anyway...)

So to test the other one I just unplugged the Cat-5 cable.
I saw one part of the flow see that (time outs).
Then a while later: BedPi off line. WTF!!??

Plug the Cat-5 cable in and I get wrong messages again. BedPi back up.

I looked at both machines' MQTT nodes and pasted the settings with the messages they send on MQTT LWT and Birth Certificates.
They have the correct names in their respective places.

So why am I seeing the wrong name?

I've got MQTT Explorer but it doesn't tell me who sent the message. Or: I haven't found that part.

So yes, I get what you are saying Colin, but this is a real doozie (doozy?) as it is difficult to simulate MQTT comms failure.

I am doing it by pulling out the network cable - of the machine I want to fail. Of course. :wink:
(I'm not that silly - I hope.)

So, do you have any ideas as to how I can see what is going on?

I do have debug nodes on MQTT in nodes and see the raw data coming in.
It has the wrong topic (machine sending).
(oh, I'm up to 13 hours today on the machine. I think that's enough for today, so if I don't reply quickly, it is that I am off line.)

Is that an actual LWT message that is wrong? Remember machines don't send LWT messages, the broker does (well, the offline one anyway).

[Edit] Don't bridge LWT topics to a different broker, if that is what you are doing. For LWT you must have an MQTT node that is connected to the central broker.

There is a lot to be said for this approach. Having a local broker allows flows that use MQTT to communicate within that instance of node red to continue, even if the network is down. By bridging to a remote broker you can get all relevant topics onto a central broker without the necessity for node red on the central machine to know, or care, where all the data is being generated.

Wouldn't it be simpler to just have one broker on a local machine, and set up the Mosquitto configuration to use 2 listeners, one for local comms, and a second for external comms?

That's how mine is set up, and it continues to communicate with my local devices when the internet is down, but when it is up, it communicates with my cloud servers as well.

Bridging is almost trivial to set up.

Not sure what you mean by that.

This is my Mosquitto conf from my local machine, with different listeners for local and remote data.
Port 1883 is not open, so only local devices can use it.

# Set logging level
log_type all
# Forces use of modern version of TLS to avoid security issues
tls_version tlsv1.2

# Local MQTT - Uses local IP address for local sensor data
listener 1883
# End Local MQTT

# Secure MQTT
listener 2087
## Self signed certs
cafile   /etc/mosquitto/ca_certificates/ca.crt
keyfile /etc/mosquitto/certs/server.key
certfile  /etc/mosquitto/certs/server.crt
## Force all clients in this listener to provide a valid certificate
require_certificate true
## Stop all unauthorised connections
allow_anonymous false
## Use password file
password_file /etc/mosquitto/passwordfile
# End Secure MQTT

That lets you access that broker from multiple locations, but it doesn't give you the advantage I described with bridging. Without bridging, a central machine serving up a system-wide dashboard, for example, needs to connect to all the peripheral machines individually and know which one to get each topic from. With bridging it is up to the peripheral machines to push the data they know about to the central broker (and pull back data relevant to them), and the central node-red gets (and puts) everything from the central broker.

I'm not seeing what advantage is gained from using 2 brokers over just one, but it's whatever works best for you.

Actually I am using 8, I think, as I have about 7 distributed systems. Without bridging the central machine would have to configure 7 remote brokers and know which broker provides which topics.
As you say, though, there is not one solution that is perfect, it is a matter of choosing one that seems optimum for each setup.

That is what I don't get :hushed:
Why 7 remote brokers? Why not use 7 clients which subscribe to one central broker... I'm missing the point.

I use MQTT in each pi for passing the data about within node-red and to other processes in the pi. If the network fails, or the central machine is down temporarily, then the remote system keeps going, the only loss is that I can't see or control it from the central machine. If I only had the broker on the central machine then the pi would not be able to control the systems it is in charge of.

This is what happens when I turn on BeefPi (the one causing the trouble) while looking at the MQTT for SOM.

And this is the .config file:

pi@BeefPi:~ $ cat /etc/mosquitto/mosquitto.conf 
# Place your local configuration in /etc/mosquitto/conf.d/
#
# A full description of the configuration file is at
# /usr/share/doc/mosquitto/examples/mosquitto.conf.example

pid_file /var/run/mosquitto.pid

persistence true
persistence_location /var/lib/mosquitto/

log_dest file /var/log/mosquitto/mosquitto.log

include_dir /etc/mosquitto/conf.d

# Bridge mode

#connection local
connection bridge-01
address 192.168.0.99:1883

topic # out 0
topic # in 0
pi@BeefPi:~ $ 

I need this because every now and then it isn't connected to my network and needs to be able to do MQTT stuff.

Ah!

/var/log/mosquitto/mosquitto.log

Looking in that!

Here's an interesting extract:

1612680044: Error creating bridge: Network is unreachable.
1612680044: Warning: Unable to connect to bridge bridge-01.
1612680083: Connecting bridge bridge-01 (192.168.0.99:1883)
1612680095: New connection from 127.0.0.1 on port 1883.
1612680095: New client connected from 127.0.0.1 as mqtt_f2e36d1a.8f334 (c1, k60).
1612681853: Saving in-memory database to /var/lib/mosquitto/mosquitto.db.
1612683654: Saving in-memory database to /var/lib/mosquitto/mosquitto.db.
1612685455: Saving in-memory database to /var/lib/mosquitto/mosquitto.db.
1612685974: Client mqtt_f2e36d1a.8f334 disconnected.
1612685974: Error in poll: Interrupted system call.
1612685974: mosquitto version 1.4.10 terminating
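
Side note: those leading numbers are Unix epoch seconds, which makes it hard to line the log up with when the cable was actually pulled. A small Python sketch (the function name `humanise` is just made up here) that rewrites them as readable UTC times:

```python
from datetime import datetime, timezone


def humanise(line: str) -> str:
    """Replace a leading '<epoch>: ' prefix with an ISO-8601 UTC time."""
    stamp, sep, rest = line.partition(": ")
    if not sep or not stamp.isdigit():
        return line  # not a timestamped line, leave it untouched
    when = datetime.fromtimestamp(int(stamp), tz=timezone.utc)
    return f"{when.isoformat()}: {rest}"


if __name__ == "__main__":
    # Two lines copied from the extract above.
    extract = [
        "1612680044: Warning: Unable to connect to bridge bridge-01.",
        "1612680083: Connecting bridge bridge-01 (192.168.0.99:1883)",
    ]
    for entry in extract:
        print(humanise(entry))
```

With the times readable it is easier to see, for example, that the bridge retried 39 seconds after the "Network is unreachable" error (1612680083 minus 1612680044).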

If (and I say IF) BedPi has some messages cached in there, then when this machine connects, it sends its (cached) messages....

Possible?

Though of course it doesn't explain how: when I unplug it I get a BedPi disconnected message.

ARGH!

It shouldn't have been that simple!

I loaded MQTT Explorer and looked in BeefPi's MQTT.
There were the two messages of concern!
BedPi up and BedPi down. (well you get what I mean)

I deleted them and unplugged the cable:

Argh! (number 2)
So close!

I don't want to complain simply for the sake of.....

But that didn't fix the problem. Or: It is still happening.

I got MQTT-explorer and deleted any BedPi messages. I also went to BedPi and reset the messages to not be retained.

Still getting the problem of when I unplug BeefPi I get an MQTT message from (for?) BedPi being disconnected.

Just to help me with the understanding:

Ok, I have some machines with local MQTT brokers which are bridged to the main one.

Where are all these messages stored? Thinking that if I flush the ones I don't want from the cache, it would eliminate the incorrect message being sent.

Just a thought.

Slight update:

Today I am not seeing/getting the BedPi message when I unplug the cable from PiFace.

But in saying that I am also not seeming to get the EOM (death certificate) from PiFace either. That is a few minutes after the cable is unplugged.

Yes, I know I should know, but where do I find the time period before the EOM message is sent?

You say you are running a broker on each device right?
So does the node-red on each device only connect to its local broker? Or do you have each node-red connecting to brokers on other devices?

If node-red is only connecting to its local broker and you are using bridging to get the messages to other pis, then pulling out the cable on any device won't trigger the WILL message - because the connection between node-red and its local broker has not been broken.
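
If it helps to see that point in code, here is a toy Python model. It is not how Mosquitto is implemented, just the bare idea of "a will belongs to one client connection on one broker". The class, the client ids and the topic `status/BeefPi` are all invented for the illustration.

```python
class ToyBroker:
    """Very small stand-in for an MQTT broker: wills and bridging only."""

    def __init__(self, name):
        self.name = name
        self.wills = {}        # client id -> (topic, payload)
        self.received = []     # every (topic, payload) this broker has seen
        self.bridge_to = None  # upstream ToyBroker, or None when the link is down

    def connect(self, client_id, will=None):
        if will is not None:
            self.wills[client_id] = will   # the will is stored on THIS broker

    def publish(self, topic, payload):
        self.received.append((topic, payload))
        if self.bridge_to is not None:     # forward only while the bridge is up
            self.bridge_to.publish(topic, payload)

    def connection_lost(self, client_id):
        """A client vanished without a clean disconnect: publish its will."""
        will = self.wills.pop(client_id, None)
        if will is not None:
            self.publish(*will)


if __name__ == "__main__":
    central = ToyBroker("central")
    local = ToyBroker("BeefPi")
    local.bridge_to = central

    # Case 1: node-red only talks to the broker on its own machine.
    local.connect("node-red", will=("status/BeefPi", "offline"))
    local.bridge_to = None      # cable pulled: only the bridge link breaks
    print(central.received)     # node-red never lost its local connection

    # Case 2: one extra client connected straight to the central broker.
    central.connect("BeefPi-direct", will=("status/BeefPi", "offline"))
    central.connection_lost("BeefPi-direct")   # central notices the dead link
    print(central.received)
```

On the timing question: as far as I know the broker gives up on a silent client after about one and a half times that client's keepalive, and the keepalive is set in the MQTT broker config node. The `k60` in the log line `(c1, k60)` should be that value, so roughly 90 seconds for a 60 second keepalive.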

It may help if you draw a diagram that showed what devices you have, where node-red is running, where the brokers are running and what connections you've defined between them all.

I believe I am running brokers on each.

The nodes on that machine now only talk to the broker on their machine.

Thanks!

That will probably be the reason it wasn't working.

And the other machine's WILL was only being broadcast because it was retained and so the broker was just broadcasting its retained messages.
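
For anyone following along, the retained part can be sketched as a toy model too. Again this is not Mosquitto's code, only the rule as I understand it from the MQTT spec: a retained message is replayed to every new subscription, and publishing an empty retained payload on the topic removes it. The class and the topic `status/BedPi` are invented for the example.

```python
class RetainingBroker:
    """Toy broker that only models retained messages."""

    def __init__(self):
        self.retained = {}      # topic -> last retained payload
        self.subscribers = []   # callables taking (topic, payload)

    def publish(self, topic, payload, retain=False):
        if retain:
            if payload == "":
                self.retained.pop(topic, None)   # empty payload clears the topic
            else:
                self.retained[topic] = payload
        for deliver in self.subscribers:
            deliver(topic, payload)

    def subscribe(self, deliver):
        self.subscribers.append(deliver)
        for topic, payload in self.retained.items():
            deliver(topic, payload)              # replay to the newcomer


if __name__ == "__main__":
    broker = RetainingBroker()
    broker.publish("status/BedPi", "offline", retain=True)  # an old will, retained

    seen = []
    broker.subscribe(lambda t, p: seen.append((t, p)))  # e.g. a reconnecting client
    print(seen)   # the stale BedPi message turns up although BedPi did nothing

    broker.publish("status/BedPi", "", retain=True)     # clear it
    late = []
    broker.subscribe(lambda t, p: late.append((t, p)))
    print(late)   # nothing is replayed any more
```

That would fit what was seen here: anything that (re)subscribes, which may include a bridge coming back up, gets the stale message again until it is cleared on every broker that holds a copy.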

So, to get it working I would need a single node on the remote machine that talks to the main broker.