MQTT reconnection failure to Flespi.io

Hi
I've been using node-red with mosquitto for more than a year without any problems.
Recently, for a site where I can't have a local server, I started using the Flespi.io cloud-based MQTT broker.

Everything works fine, except that from time to time the Internet connection suffers short breaks (less than 1 minute).
After the Internet connection recovers, Node-RED (initially 1.3.7-10, but the same after upgrading to 2.1.4-14) fails to reconnect properly.
More precisely, it loops through:

  • connection
  • disconnection

This repeats over and over until I restart the MQTT global node (by changing part of its configuration, such as the credentials, to a wrong value, deploying, changing it back to the correct value, and re-deploying). This ensures that the connection is restored.

The Flespi logs refer to an "unexpected disconnect", but Node-RED doesn't report any error (see the filtered logs below).

Can anyone suggest a way to investigate this further?
I will also ask on the Flespi.io forum.

Thanks

Example Flespi logs

26/12/2021 19:32:14 | mqtt session connection was accepted | node-red_sonnaz | clean: true; expiry_interval: 0; origin_id: 0; origin_type: 14; peer: "#.##.##.##"; rejected: 0; session_present: false; token_id: 32288721; version: 4;
26/12/2021 19:32:14 | mqtt session connection was closed (unexpected disconnect) | node-red_sonnaz | clean: true; expiry_interval: 0; origin_id: 0; origin_type: 14; rejected: 0; session_present: false; token_id: 32288721; version: 4;

Node-red logs:

>> Restart of container
26 Dec 11:47:30 - [info]

Welcome to Node-RED
===================

26 Dec 11:47:30 - [info] Node-RED version: v2.1.4
26 Dec 11:47:30 - [info] Node.js  version: v14.18.2
26 Dec 11:47:30 - [info] Linux 5.4.0-91-generic ia32 LE
26 Dec 11:47:32 - [info] Loading palette nodes
26 Dec 11:47:37 - [info] Dashboard version 3.1.3 started at /ui
26 Dec 11:47:38 - [info] Settings file  : /data/settings.js
26 Dec 11:47:38 - [info] Context store  : 'default' [module=memory]
26 Dec 11:47:38 - [info] User directory : /data
26 Dec 11:47:38 - [warn] Projects disabled : editorTheme.projects.enabled=false
26 Dec 11:47:38 - [info] Flows file     : /data/flows.json
26 Dec 11:47:38 - [info] Server now running at http://127.0.0.1:1880/
...
26 Dec 11:47:40 - [info] [mqtt-broker:Flespi] Connected to broker: node-red_sonnaz@mqtts://mqtt.flespi.io:8883
....
>> No problems
....
26 Dec 19:24:09 - [info] [mqtt-broker:Flespi] Disconnected from broker: node-red_sonnaz@mqtts://mqtt.flespi.io:8883
26 Dec 19:25:05 - [info] [mqtt-broker:Flespi] Connected to broker: node-red_sonnaz@mqtts://mqtt.flespi.io:8883
26 Dec 19:25:05 - [info] [mqtt-broker:Flespi] Disconnected from broker: node-red_sonnaz@mqtts://mqtt.flespi.io:8883
26 Dec 19:25:21 - [info] [mqtt-broker:Flespi] Connected to broker: node-red_sonnaz@mqtts://mqtt.flespi.io:8883
26 Dec 19:25:21 - [info] [mqtt-broker:Flespi] Disconnected from broker: node-red_sonnaz@mqtts://mqtt.flespi.io:8883
....
>> continuous disconnection / reconnection 
....
26 Dec 20:13:31 - [info] [mqtt-broker:Flespi] Connected to broker: node-red_sonnaz@mqtts://mqtt.flespi.io:8883
26 Dec 20:13:31 - [info] [mqtt-broker:Flespi] Disconnected from broker: node-red_sonnaz@mqtts://mqtt.flespi.io:8883
26 Dec 20:13:46 - [info] [mqtt-broker:Flespi] Connected to broker: node-red_sonnaz@mqtts://mqtt.flespi.io:8883
26 Dec 20:13:46 - [info] [mqtt-broker:Flespi] Disconnected from broker: node-red_sonnaz@mqtts://mqtt.flespi.io:8883
>> changing one of the settings in the MQTT-broker's Messages tab
26 Dec 20:13:59 - [info] Stopping modified nodes
26 Dec 20:13:59 - [info] Stopped modified nodes
26 Dec 20:13:59 - [info] Starting modified nodes
26 Dec 20:13:59 - [info] Started modified nodes
26 Dec 20:13:59 - [info] [mqtt-broker:Flespi] Connected to broker: node-red_sonnaz@mqtts://mqtt.flespi.io:8883
>> changing back the settings in the MQTT-broker's Messages tab
26 Dec 20:14:13 - [info] Stopping modified nodes
26 Dec 20:14:13 - [info] Stopped modified nodes
26 Dec 20:14:13 - [info] Starting modified nodes
26 Dec 20:14:14 - [info] Started modified nodes
26 Dec 20:14:14 - [info] [mqtt-broker:Flespi] Connected to broker: node-red_sonnaz@mqtts://mqtt.flespi.io:8883
>> Now it's stable again

Welcome to the forum @Barbudor

If you connect using an alternative client, such as MQTT Explorer (which is an excellent tool) does that recover after a network failure?

Hi @Colin

Other devices on the same site, such as Tasmotas (ESP32/ESP8266 using the PubSubClient MQTT library), reconnect seamlessly after an Internet break.
Only Node-RED seems to have issues.
This site is connected by DSL and located in the countryside.

I have another site with Node-RED 1.3.7 and 2.1.4 which doesn't show the problem with Flespi, but that site is connected through fiber in an urban area and is less prone to Internet breaks.

Does it reconnect correctly if you use Restart Flows from the Deploy button dropdown?

Can you show a screenshot of the Server node connection tab please. I don't know what might be causing the problem, but I might see something there.

Hi @Colin

For now, what I'm doing to force a reconnect is editing the global mqtt broker node, changing a setting such as the LWT message, and then using "Deploy modified nodes".
I haven't specifically tried "Restart flows". Do you want me to try it next time?
As a limited change is enough to force a proper reconnection, I would assume that restarting all flows would also work.

My settings are normally these:



(next screenshot in next post as I am limited to 2 per post)

I've just made a small change, following advice from Flespi support, to go through a TCP proxy that they provide to record all communications (without TLS). Hopefully Schrödinger was wrong and I won't kill the cat by looking into the box.
I'm also now running a circular tcpdump capture on the link.

Is there any advanced debug logging on the MQTT node that could be enabled?

Thanks a lot

@Colin
It just occurred again, for the 2nd time today.
I can confirm that "Restart Flows" also worked.
Regards

I don't think that Use TLS should be selected.

It doesn't connect if TLS is not checked.
Did you notice that I'm using port 8883 for MQTTS?

I thought that if you select TLS then you had to configure a TLS Config, which you don't appear to have done. Perhaps I am wrong.

I have to select TLS for my Hive MQTT broker but I don't have to configure anything

OK, it must be optional I suppose.

I am out of ideas on this problem then. Perhaps we need an MQTT expert to chip in on why it does not reconnect after a network failure. @dceejay?

@Barbudor, as an experiment, try emptying the client id in the mqtt config. Unless that is needed for authorisation by flespi.
Also, separately, try deselecting Use Clean Session.

Thanks for the suggestion @Colin
I just faced the problem again (2nd time today), so I took the opportunity to deselect Use Clean Session.
Fingers crossed.

I don't have any great hopes really. Just clutching at straws.

I finally had some feedback from Flespi, who pointed out that their broker has a limit of 16 topics per subscription request (see "flespi MQTT broker - MQTT 5.0 compliant, secure, fast, and free").
Indeed, I have 18 topics in my subscription request.
That doesn't explain why the request works when I restart Node-RED but not on subsequent reconnections.

Unfortunately, as of today I don't see any way in the MQTT broker node to limit the number of topics per subscription request.

I will let the test with no Clean Session run for a couple more days, and then split my access to Flespi across 2 different MQTT Broker nodes as a workaround.
At the same time, I will post a feature suggestion to add a limit on the number of topics per subscription request (done at "Add option to limit number of topics per subscription requests").

Analysis update:

  • The problem has been narrowed down to a Flespi limitation: no more than 16 topic patterns may be specified in a single subscribe request.
  • When the global MQTT node restarts, it creates one request per existing MQTT Input node, which results in as many individual requests, each with a single topic pattern, as there are MQTT Input nodes. This explains why the connection is accepted at the start of the flow/node.
  • After a connection outage, it is the underlying MQTT.js library that sends one aggregated subscription request containing all topic patterns, instead of individual requests. This is not under Node-RED's control, as it is a decision of MQTT.js (which can be seen as compliant with the standard, since the standard does not anticipate a limit on the number of topic patterns in a subscription request).
  • This happens only when "Clean Session" is checked, because in that case a new session is created at reconnection and the broker is expected to have no memory of the past, so all subscriptions need to be renewed. If "Clean Session" is not checked, the MQTT.js library correctly assumes that the broker has kept all the subscriptions in the session context and does not send this aggregated subscription request.
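To make the analysis above concrete, here is a minimal sketch that models the two subscription patterns and checks them against the 16-topic limit. It is only an illustration of the behaviour described in this thread, not code from Node-RED or MQTT.js; the function names and the topic names are hypothetical.

```python
from typing import List

MAX_TOPICS_PER_REQUEST = 16  # Flespi limit as reported in this thread


def initial_subscribe_requests(topics: List[str]) -> List[List[str]]:
    """At node start: one subscribe request per MQTT Input node (one topic each)."""
    return [[topic] for topic in topics]


def resubscribe_requests(topics: List[str], clean_session: bool) -> List[List[str]]:
    """After a reconnect: one aggregated request, but only with a clean session."""
    if not clean_session or not topics:
        return []  # the broker is assumed to still hold the subscriptions
    return [list(topics)]


def accepted(requests: List[List[str]]) -> bool:
    """True when every request stays within the per-request topic limit."""
    return all(len(request) <= MAX_TOPICS_PER_REQUEST for request in requests)


# 18 hypothetical topics, matching the count mentioned in the thread
topics = [f"site/sensor{i}/state" for i in range(18)]

print(accepted(initial_subscribe_requests(topics)))                 # True
print(accepted(resubscribe_requests(topics, clean_session=True)))   # False
print(accepted(resubscribe_requests(topics, clean_session=False)))  # True
```

With 18 topics, only the aggregated request sent after a clean-session reconnect goes over the limit, which matches the connect/disconnect loop seen in the logs.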

Solution:
There are 2 solutions to overcome this Flespi limitation:

  1. Do not check "Clean Session". This has been tested with a connection outage of 1 minute. However, there may be a timeout after which the Flespi broker erases an outstanding session, in which case a clean session would be required. This hasn't been tested, as I don't yet know the timeout duration and I can't disturb the existing system any further.
    Side note: not checking "Clean Session" has the advantage that messages for Node-RED are queued and delivered upon reconnection.
  2. Alternative workaround: create multiple global MQTT Broker configuration nodes, all pointing to Flespi, and distribute your MQTT Input nodes across them so that you do not go over 16 subscriptions per Broker node. Each Broker node can use the same token but should use a different Client_ID (this should be the case if you leave the Client_ID field empty, or you can manually enter distinct Client_IDs).
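For the second workaround, the following is a rough sketch of how one might plan the split of topics across Broker nodes. It only computes an assignment on paper; the Broker nodes still have to be created by hand in the editor. The helper name, the `_1`, `_2` Client_ID suffix scheme, and the topic names are assumptions made for the example.

```python
import math
from typing import Dict, List

MAX_SUBSCRIPTIONS_PER_BROKER_NODE = 16  # Flespi limit as reported in this thread


def split_across_broker_nodes(topics: List[str], base_client_id: str) -> Dict[str, List[str]]:
    """Assign topics round-robin to the fewest Broker nodes that keep each within the limit."""
    if not topics:
        return {}
    node_count = math.ceil(len(topics) / MAX_SUBSCRIPTIONS_PER_BROKER_NODE)
    # One distinct (hypothetical) Client_ID per Broker node
    plan: Dict[str, List[str]] = {f"{base_client_id}_{n + 1}": [] for n in range(node_count)}
    client_ids = list(plan)
    for index, topic in enumerate(topics):
        plan[client_ids[index % node_count]].append(topic)
    return plan


# 18 hypothetical topics, matching the count mentioned in the thread
topics = [f"site/sensor{i}/state" for i in range(18)]
plan = split_across_broker_nodes(topics, "node-red_sonnaz")

for client_id, assigned in plan.items():
    print(client_id, len(assigned))
```

For 18 topics this yields two Broker nodes with 9 topics each, so even an aggregated resubscribe after a reconnect stays below 16 topics per request.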

As of today, Node-RED can't provide a way to limit the number of subscriptions aggregated into a single request, as this depends on MQTT.js. I believe the options above are enough for a small system.

I hope this will help others that may face the same issue.

Special thanks to @Colin
Even if your suggestion was out of the blue, it was indeed a good one :+1: and we now know why.

@Barbudor, if you get this issue again, could you try sending a disconnect/connect action to any of the mqtt in or out nodes to force a reconnect (at runtime, without restarting flows)?

See the info in the MQTT nodes' built-in help.
