My use case is for a mobile device that can go in and out of network coverage. I have a constant data feed from sensors and need the data to backfill/send when the network reconnects.
So my MQTT node goes from connected to disconnected to connecting, etc., and eventually back to connected. My QoS is 1.
My only solution to this is to change the MQTT core node to always try to publish when using a QoS greater than 0.
Change:
```js
if (node.connected) {
```
to:
```js
if (node.connected || msg.qos > 0) {
```
In the client code there is a connection test; if the client is not connected, the message is stored in a map (max size ~65,000 entries). The map is flushed and its messages sent once the client is connected again. I've tested going past 65,000 entries in the map and the node does not crash.
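For illustration, here is a minimal sketch of that behaviour using the MQTT.js client directly; the broker URL and topic are placeholders, not taken from the node's code:

```js
const mqtt = require('mqtt');

const client = mqtt.connect('mqtt://broker.example.com');

client.on('connect', () => console.log('connected - queued messages get flushed'));
client.on('offline', () => console.log('offline - QoS 1 publishes get queued'));

// With QoS 1, calling publish() while disconnected does not drop the message:
// the client keeps it in its outgoing store and resends it once reconnected.
setInterval(() => {
    client.publish('sensors/reading', JSON.stringify({ ts: Date.now() }), { qos: 1 });
}, 1000);
```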
I've tried this and it works as a solution. I'm putting it out there to ask whether there is an alternative, though, rather than maintaining my own MQTT node with this small code change.
Whilst the code is small, the logic change is not small.
Personally, I would think this would have to be an option that can be turned on, defaulting to off, since this is a significant change. I think you would also need sizing options, since it is impossible to second-guess whether a 65k-element map of objects would break any individual deployment of Node-RED.
I also wonder about unintended consequences, but I don't know the internals of the Node-RED codebase well enough to comment sensibly.
A fair point. Though I would counter that trying to handle such a caching process in Node-RED is very fragile: there are too many variables involved that could lead to consequences such as running out of RAM.
While the code change works in your environment and with your data size/throughput, you couldn't say that of other people's configurations.
So the Node-RED MQTT node has made QoS 1 and 2 redundant by putting a connection check around publish. The underlying MQTT client library already manages messages correctly based on QoS. In fact, the client can be configured with a storage policy; the default policy is an in-memory store backed by an ES6 Map.
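For reference, this is roughly how those hooks look when using MQTT.js directly; the option names come from the MQTT.js API, while the URL and client id are made-up placeholders:

```js
const mqtt = require('mqtt');

const client = mqtt.connect('mqtt://broker.example.com', {
    clean: false,                // persistent session, so the broker also resends to us
    clientId: 'mobile-sensor-1',
    // Both stores default to MQTT.js's in-memory Store (backed by an ES6 Map).
    // Any object implementing the same interface can be passed in to change
    // the storage policy.
    incomingStore: new mqtt.Store(),
    outgoingStore: new mqtt.Store()
});
```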
If Node-RED is running in a constrained environment where QoS > 0 cannot be supported, then I would expect QoS to be set to 0 in that environment.
This discussion comes up on occasion and is something we need to improve in the MQTT nodes. The connected/disconnected events you can get from the Status node do help to create a flow that redirects messages when the connection is down (sketched below). But it is very far from perfect: there is a window when the client doesn't yet know it is disconnected, during which it will buffer messages internally; those aren't persisted, so they will be lost if Node-RED restarts. It is also actually quite hard to build a flow that does the right thing in all cases.
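For what it's worth, here is a rough sketch of that pattern as a single function node, assuming sensor messages and the output of a Status node (watching the MQTT out node) are both wired into it; the status-text matching is an assumption about how the state is reported:

```js
// Queue messages while the broker is down; flush them on reconnect.
// Messages carrying msg.status come from a Status node watching the MQTT out node.
const buffer = context.get('buffer') || [];

if (msg.status) {
    // Matches "connected" or "...status.connected", but not "disconnected"
    // or "connecting".
    const connected = /(^|\.)connected$/.test(msg.status.text || '');
    context.set('connected', connected);
    if (connected && buffer.length > 0) {
        context.set('buffer', []);
        return [buffer];        // an array of messages: each is sent on output 1
    }
    return null;
}

if (context.get('connected')) {
    return msg;                 // online: pass straight through to the MQTT node
}

buffer.push(msg);               // offline: queue it (unbounded here - cap it in real use)
context.set('buffer', buffer);
return null;
```

Note it has exactly the window described above, and the buffer lives in node context, so unless a persistent context store is configured, anything queued is lost when Node-RED restarts.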
When the nodes were first written, the mqtt client library we used didn't provide much in this area. But it has moved on a long way and we haven't taken advantage of what it can provide.
I definitely think there is a place to have some options exposed on the broker node as to how it should handle offline messages. We need to distill it down to the core set of configuration properties so we can provide enough flexibility without overwhelming the user with choice. We need to be conscious of the different environments we run in - for example, running in IBM Cloud you don't have a persistent file system, so you cannot rely on that for storage. Quite how we handle those sorts of cases I don't know - but maybe we don't try to solve it for every case.
If someone wants to take a stab at listing what options it needs to expose - and doing so with awareness of what the underlying client is capable of - that would help to move this forward.
That makes sense as to why the connection is checked before the underlying client's publish function is called.
The underlying client has moved on: it allows a storage mechanism to be set and it manages messages based on QoS. QoS 0 messages are never stored; QoS 1 and 2 messages are stored.
If the MQTT node removed the connection check on publish and allowed the storage mechanism to be configured, then most use cases could be met.
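To sketch what that could look like, here is a hedged example using the community mqtt-level-store package, which implements the MQTT.js store interface on top of LevelDB; the package choice, path, and names are illustrative assumptions, not something the nodes support today:

```js
const mqtt = require('mqtt');
const levelStore = require('mqtt-level-store');

// LevelDB-backed stores persist in-flight messages across process restarts.
const storage = levelStore('/var/lib/my-app/mqtt-store');   // example path

const client = mqtt.connect('mqtt://broker.example.com', {
    clean: false,
    clientId: 'mobile-sensor-1',
    incomingStore: storage.incoming,
    outgoingStore: storage.outgoing
});

// QoS 1 publishes stay on disk until acknowledged; QoS 0 is never stored.
client.publish('sensors/reading', 'payload', { qos: 1 });
```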
Thanks Nick for that clear explanation. Makes perfect sense.
I'm going to go down the road of creating my own node for our use-case/hardware in the meantime.