MQTT Nodes Connection Behaviour

Thanks for that explanation Colin. I didn't realise that there was a keep alive ping from the client node. I'd incorrectly assumed there was a requirement to send messages within this keep alive time to avoid the issue of the LWT message. I will take your advice and look to upgrade Node-RED and Node.js to see if this changes the connection behaviour I am seeing.

My further testing has raised the following question...
Should the Keep Alive time be less than the shortest time between messages?

The results of my further testing...

Upgraded
I upgraded as recommended...
Node-RED 1.3.5
Node.js 10.24.1

Unfortunately this didn't change the MQTT nodes connection behaviour.

Different Broker
I connected using a different broker...
Mosquitto using TLS on port 8883 (Existing Broker)
HiveMQ on port 1883 (Different Broker)

They both had a Keep Alive Time of 30s and didn't use a clean session. The message type and frequency being sent to each of these brokers was different. The internet connection was interrupted for a couple of minutes.
The MQTT node connected to the Mosquitto broker continued to show connected for approximately 20 minutes without sending data, before finally reconnecting and resuming sending data.
The HiveMQ broker connected MQTT node almost immediately reconnected and resumed sending data.

This difference prompted me to look closer at the Mosquitto broker.

PINGREQ & PINGRESP
The Mosquitto broker logs showed "Received PINGREQ" from a different client and the broker "Sending PINGRESP". The broker logs showed no "Received PINGREQ" from my client!

The Keep Alive timer, measured in seconds, defines the maximum time interval between
messages received from a client. It enables the server to detect that the network
connection to a client has dropped, without having to wait for the long TCP/IP timeout.
The client has a responsibility to send a message within each Keep Alive time period. In
the absence of a data-related message during the time period, the client sends a
PINGREQ message, which the server acknowledges with a PINGRESP message.

My client is sending a message (watchdog pulse, QoS 2) every 10 seconds and I am also set up to receive the message being sent so I can display it in a debug node. As a result I do not send a PINGREQ message. So I was curious whether the behaviour would change if I selected a Keep Alive time that would force my client to send a PINGREQ. Selecting a Keep Alive time of 9 seconds and monitoring the broker logs showed "Received PINGREQ" from my client. Interrupting the internet connection for a couple of minutes now gives me mixed results. A number of times it has disconnected quickly and then reconnected once the connection has been re-established. However, I've also experienced the ~20 minute disconnection delay again, but I'm wondering if this is now the exception and not the rule.

Disconnection Delay
I suspect now that the ~20 minute disconnection delay is due to the TCP/IP timeout, which is dictated by the following file on Linux.

/proc/sys/net/ipv4/tcp_retries2

The default value is 15, which corresponds to a duration of approximately 13 to 30 minutes.
A hypothetical timeout of 924.6 seconds is a lower bound for the effective timeout.
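As a rough check on that 924.6 second figure, the retransmission timeout can be modelled as starting at 200 ms, doubling after each unanswered transmission, and being capped at 120 s. Those two constants are assumed here from the usual Linux defaults (TCP_RTO_MIN and TCP_RTO_MAX), and the function name is only illustrative. On a real link the starting timeout is derived from the measured round trip time, which is why the figure is a lower bound.

```python
def hypothetical_tcp_timeout(retries=15, rto_min=0.2, rto_max=120.0):
    """Sum the waits for the first transmission plus `retries` retransmissions.

    rto_min and rto_max are assumed Linux defaults (200 ms and 120 s).
    """
    total = 0.0
    rto = rto_min
    # One wait after the original send, then one after each retransmission.
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, rto_max)  # exponential backoff with a ceiling
    return total


print(round(hypothetical_tcp_timeout(15), 1))  # 924.6
```

Lowering tcp_retries2 shortens this considerably under the same assumptions, for example 8 retries gives roughly 100 seconds.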

As I understand it - I thought that any received message (at the broker) would act the same as a ping message and indicate that the client was still alive - and likewise at the client end there was no need to send a keepalive message until the set time after the last transmitted real message. I.E. you only needed to send keepalives if you weren't sending data for a period of time greater than the keepalive timeout.

At least that is how I thought it was supposed to work...

Based on my testing I think this is how it does work. The problem is when the internet connection is interrupted AND the MQTT node continues receiving messages more frequently than the Keep Alive time to transmit. It therefore has no need to send the PINGREQ. If it did send the PINGREQ it would time out waiting for the PINGRESP and disconnect.
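A toy model of that hypothesis, assuming the client's keep alive timer is reset by every outbound packet. This is a simplification for illustration only, not the actual code of the Node-RED MQTT node or its underlying client library, and the function name is made up.

```python
def pingreq_times(keep_alive, publish_interval, duration):
    """Return the seconds at which a client would send PINGREQ.

    Assumes (hypothetically) that any outbound packet resets the keep
    alive timer, so a ping is only due after `keep_alive` idle seconds.
    """
    pings = []
    last_sent = 0
    for t in range(1, duration + 1):
        if t % publish_interval == 0:
            last_sent = t          # a data packet counts as activity
        elif t - last_sent >= keep_alive:
            pings.append(t)        # idle for too long, so ping
            last_sent = t
    return pings


print(pingreq_times(keep_alive=30, publish_interval=10, duration=120))  # []
print(pingreq_times(keep_alive=9, publish_interval=10, duration=30))    # [9, 19, 29]
```

Under this assumption a 10 second watchdog with a 30 second Keep Alive never produces a PINGREQ, while a 9 second Keep Alive produces one before every watchdog message, which matches what the broker logs showed.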

What I can't reconcile now though is that I get this behaviour from the Mosquitto broker connection, but not from the HiveMQ broker connection.

How can it continue receiving messages if the connection is interrupted?

You only quoted half of my sentence! :slight_smile:
The problem is when the internet connection is interrupted AND the MQTT node continues receiving messages more frequently than the Keep Alive time to transmit.

I may not be using the best terminology here and I'm happy to be corrected, but worded differently...
The MQTT out node is not sending messages, but it is receiving messages from the flow to transmit to the broker.
The MQTT in node is not receiving messages.

@colin here is a case: you have a broker on the same device as NR. NR sends/receives msgs to other devices on the network AND sends/receives msgs to other tabs/locations within the flow.

So you could lose network connection but the mqtt msgs between the tabs/locations in the flow would (should) still work (I think :thinking:)

I don't think that is the situation here. @Siothrun can you confirm that node red is running on one device and the broker is across the internet?

Messages passed to the mqtt node from the flow are of no consequence in whether ping requests will be sent. It is purely whether data is received from the broker.

What QoS have you specified in the MQTT nodes? If you are using zero does it make a difference if you change it to 1?

I can confirm that Node-RED and the MQTT broker are on different devices.

MTX-GTW II AUS is running Node-RED.
Windows VM is running the Mosquitto MQTT broker.

The MQTT in node is also receiving messages from the broker every 10 seconds (when the internet connection is active). So if this node stops receiving messages for longer than the Keep Alive time when the internet connection is interrupted it should send a PINGREQ, but then fail to receive a PINGRESP. When it fails to receive the PINGRESP within 1.5 x Keep Alive time it should disconnect? I’m certainly not seeing this behaviour.

MQTT in nodes have QoS 0.
MQTT out nodes have QoS 2.
I can experiment with using a different QoS, but ultimately my application requires these QoS settings.

Why does it require QoS 0?

Have you upgraded node red yet?

Node-RED in my application is being used in a control application to interface a cloud HMI to a PLC.

When reading values from the PLC to display on the HMI QoS 2 is desired. The values read are time stamped and the history of what is happening is important. This is why the MQTT backfill is relevant to me.

When writing values from the HMI to the PLC QoS 0 is required. If the internet connection is interrupted I don’t want a value/command being buffered and written to the PLC at a later date.

I suspect that using QoS 0 does not absolutely guarantee that the value will not be buffered in the mqtt driver and sent later. If it is essential that you do not use old data then I suggest adding a timestamp and check at the receiving end that it is not stale.
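A minimal sketch of that timestamp check, written in Python for illustration (in Node-RED the same logic would sit in a function node). The JSON field names, the 5 second window, and both function names are assumptions, and the approach relies on the sender and receiver clocks being reasonably in sync.

```python
import json
import time

MAX_AGE_S = 5.0  # hypothetical freshness window, tune to the application


def stamp(value, now=None):
    """Sender side: wrap the value with the time it was produced."""
    sent_at = time.time() if now is None else now
    return json.dumps({"value": value, "sent_at": sent_at})


def fresh_value(payload, max_age_s=MAX_AGE_S, now=None):
    """Receiver side: return the value, or None if the message is stale."""
    now = time.time() if now is None else now
    msg = json.loads(payload)
    if now - msg["sent_at"] > max_age_s:
        return None  # too old, do not write it to the PLC
    return msg["value"]


payload = stamp(42, now=1000.0)
print(fresh_value(payload, now=1002.0))  # 42
print(fresh_value(payload, now=1010.0))  # None
```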

I suggest writing the receiving end so that it ignores repeated messages. That will make your life much simpler. Then, if what you are after is guaranteeing that you do not lose any messages you can use something like this flow to send the data. The flow can even buffer the data over a node-red restart if that is required.

Off-topic but

There's no absolutes in life :slight_smile: but I'd wager a whole £5 that any proper mqtt client/broker would only push a non-retained QOS0 message out once.

That isn't the issue here, as I understand it. The issue is that if the connection is broken for a short time (but not long enough for the mqtt to disconnect) then the mqtt node buffers messages sent to it (or at least may do, I haven't tested it with QoS 0) and sends the data when the network reconnects, so the message arrives late.

...........

@Siothrun I have just re-read the thread, I had missed the bit about hivemq timing out immediately, but mosquitto not timing out for 20 minutes. That is very strange. Can you explain a bit more about how the device is connected to the internet, and how it is connected to the machine running mosquitto?

You may well be onto something there. What happens if (on the device) you run a ping to the mosquitto server and leave it running while you take the sim card out? Also does the device have a network connected display of some sort? If so then how does it behave?

The Node-RED device is the MTX-GTW II-S AUS, which has a Debian GNU/Linux OS. It is connected to the internet via a SIM card.
The Mosquitto broker is running on a VM in Azure.

Are you particular about where the ping is originating from? Node-RED? OS?
I have had a ping running to Google's DNS server (8.8.8.8) from both Node-RED and the OS when removing the SIM card during these tests.
I haven't tested this while pinging the Mosquitto broker, but given I can't reach Google's DNS server I wouldn't expect different results. Unless you suspect a mechanism that escapes me?
