How to get status on publish to an MQTT Out node?

I'm using the mqtt out node to send messages to an MQTT broker. I would like to verify the status of each publish operation. Apologies for the Python code, but this is roughly what I would do outside of Node-RED:

# paho-mqtt: block until the publish completes, then check the result code
message_info = client.publish(topic, message, qos, retain=retain)
message_info.wait_for_publish(timeout=60)
if message_info.rc != 0:
    return 'error', 500    # e.g. a web-framework error response
else:
    return 'success', 200

Is there a way to do the equivalent with the mqtt out node?

Essentially I would like to know if each call to doPublish succeeded or failed. I can use the status node to observe the connection status of the MQTT connection and I can log successful messages with the complete node, but the catch node does not report failures from the mqtt out node. Is there another node I can use? I see in the mqtt out code that the doPublish function is wrapped in a try-catch that will call done(error); on an exception, but I do not know how to track when that event happens.
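For reference, the behaviour being described can be sketched roughly like this. This is a hedged mock, not the actual Node-RED source: `makeMqttOutHandler` and the throwing client are invented for illustration, but the shape (publish wrapped in try/catch, `done(err)` only on a thrown exception) matches what the mqtt out code shows.

```javascript
// Rough sketch (NOT the actual Node-RED source): the mqtt out node wraps its
// publish call in a try/catch and calls done(err) only on a thrown exception,
// so broker-level failures that never throw won't reach a Catch node.
function makeMqttOutHandler(client) {
  return function onInput(msg, done) {
    try {
      client.publish(msg.topic, msg.payload);
      done(); // success path -> the Complete node fires
    } catch (err) {
      done(err); // exception path -> the Catch node fires
    }
  };
}

// Mock client whose publish throws, to force the Catch path
const throwingClient = {
  publish() { throw new Error("forced failure"); },
};

let captured = null;
makeMqttOutHandler(throwingClient)(
  { topic: "t", payload: "p" },
  (err) => { captured = err || null; },
);
console.log(captured ? "catch path: " + captured.message : "complete path");
```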

Maybe it is a dumb idea: subscribe to the same topic with an MQTT-in node. If you receive the same message you dropped into the MQTT-out node, then the broker has successfully done its job.

The Complete node will be triggered for a successful transmission.
The Catch node IS fired if an actual error occurs...

Question. What QoS are you using to publish?

@Urs-Eppenberger You can have situations where you have permission to publish to a topic on an MQTT broker, but not have permission to subscribe to the same topic. So that kind of scheme does not work.

@Steve-Mcl What do you define as an actual error? What is your pass/fail case?

In my case, I tested two failure scenarios that I've used in the past with other MQTT clients:

  1. Stop a Mosquitto Broker used by mqtt out. In this case I would expect a failure because the client is unable to communicate with the broker.
  2. When using an AWS IoT Core Broker, send a message to a topic where the client does not have permission to publish. In this case AWS will disconnect the client and you might see something like an "Out of Memory" error if you use a Paho SDK.

I've tested with both QoS=0 and QoS=1.

Where an err is included in the callback response from client.publish(xxx, function(err) { ...}) and done(err) is called

That is not how it is done in node-red. If the execution of client.publish is successful (according to the broker), done() is called. This means that if the status does NOT show "connected", then publish won't be called & the complete node will not be triggered. Additionally, you can collect the status using a status node.

It is not typically considered an error (in MQTT) when a publish does not occur - often the client will store messages, then deliver them to the broker upon reconnection. That said, this is not the way node-red does it (at this time).

Lastly, as @Urs-Eppenberger says, the only way to guarantee delivery is to subscribe to the same topic & if you get a message with the same payload, you can consider it successful. This is an "end to end" check.

Okay, if node-red does not currently store messages when the MQTT broker is disconnected and it does not call publish if the broker is disconnected, then that means the user would have to add something to the flow prior to the mqtt out node to handle those types of errors.
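One way to "add something to the flow" could look like the guard below, e.g. in a function node ahead of the mqtt out node. This is a sketch under assumptions: the `mqttConnected` flag and `retryQueue` key are invented names, and something else in the flow (say, a status node branch) would have to keep `mqttConnected` up to date.

```javascript
// Sketch of a guard placed before the mqtt out node. Assumes a status node
// elsewhere writes the broker state into flow context under "mqttConnected"
// (both context keys here are made up for this example).
function guard(msg, flow) {
  const queue = flow.get("retryQueue") || [];
  if (flow.get("mqttConnected")) {
    const release = queue.splice(0);  // flush anything queued earlier
    flow.set("retryQueue", []);
    return [...release, msg];         // messages to forward to mqtt out
  }
  queue.push(msg);                    // buffer until reconnected
  flow.set("retryQueue", queue);
  return [];                          // hold back while disconnected
}

// Minimal stand-in for Node-RED's flow context
const store = new Map();
const flow = { get: (k) => store.get(k), set: (k, v) => store.set(k, v) };

flow.set("mqttConnected", false);
console.log(guard({ payload: 1 }, flow).length); // 0: buffered
flow.set("mqttConnected", true);
console.log(guard({ payload: 2 }, flow).length); // 2: flushed + current
```

This only narrows the window, of course: the connection can still drop between the check and the actual publish.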

@Steve-Mcl How did you create your example that generated an error.message? My scenario where I send a message to an AWS IoT Core Broker topic without proper permissions should generate an error. The broker is connected in that scenario, but it will disconnect prior to replying with the ACK at QoS=1. What are you doing to get done(err) called? I would like to try it.

Check the Last Will and Testament settings for the broker, by default the topic is set to connections to which you can subscribe (connections/#), this will keep track of clients connecting/disconnecting.

@Steve-Mcl While I appreciate your first example and I understand the behavior you are describing, I still do not know how you created the scenario to generate the example. Could you please describe what you did to generate an error from the MQTT Broker that Node-RED would catch?

I would also caution against any assumptions about subscribing to a topic on the MQTT broker as a way to validate message delivery. Depending on permissions, the client might have the ability to publish to a topic, but not have the ability to subscribe to the same topic.

Finally, I think it would be helpful to provide some clarity as to what the Node-RED MQTT client is trying to do behind the scenes. During my error handling experiments, I've observed the client repeatedly connecting and disconnecting from a Mosquitto broker. Many brokers will not allow a client to make multiple connections if that is what it is trying to do. I've also observed in the Node-RED docker logs that the client was trying to connect to multiple brokers, using saved Server settings that were not actively in use. I tried restarting the flow, but that did not stop the behavior. Nor did disconnecting the node.

I forced the error in the MQTT client library by throwing an error - literally throw new Error("xxx"). I did that to prove the mechanics of node-red & the MQTT node-red node were working - to back up the statement "The Catch node IS fired if an actual error occurs"

That is not normal, not something I see. Is the broker on your network or remote?
Is your connection to the broker flaky / restarting / dropping?
Are you changing connections & deploying often?
Are you programmatically changing/stopping/starting broker connections (via the dynamic actions)?

Can you share actual logs, the full flow, and details of versions (e.g. node-red, nodejs) and details of your architecture (os/hardware of node-red, locality of broker, reliability of connection)?


NOTE: there has recently been some work on the MQTT nodes regarding difficulty disconnecting (both programmatically and by flow modification) depending on the state of the connection at the time of the disconnection action. In short, V3.0.0-beta.1 is far more reliable at disconnecting.

@Steve-Mcl I see, you modified the code. Unfortunately I don't know how to do that yet (new to Node-RED), so I was relying on my usual bag of tricks to mess with MQTT clients. Thank you for clearing that up!

I'm working with two brokers, a local Mosquitto broker and a remote AWS IoT Core broker. I have full control over both of them. To answer your questions:

Is your connection to the broker flaky / restarting / dropping?

The connections to both are solid, but at times I was stopping the Mosquitto broker to cause disconnects in an effort to observe error behavior.

Are you changing connections & deploying often?

Yes!

Are you programmatically changing/stopping/starting broker connections (via the dynamic actions)?

No, I'm doing everything through the flow.

There isn't much that I think makes sense to share, but I do see the following in the docker container output, even after I deleted the mqtt out node:

5 May 19:09:46 - [info] [mqtt-broker:Mosquitto] Connected to broker: client@mqtt://10.0.1.2:1883
5 May 19:09:46 - [info] [mqtt-broker:Mosquitto] Disconnected from broker: client@mqtt://10.0.1.2:1883
5 May 19:09:46 - [info] [mqtt-broker:Mosquitto] Connected to broker: client@mqtt://10.0.1.2:1883
5 May 19:09:46 - [info] [mqtt-broker:Mosquitto] Disconnected from broker: client@mqtt://10.0.1.2:1883
5 May 19:09:47 - [info] [mqtt-broker:Amazon] Connected to broker: client@mqtts://<REMOVED>.amazonaws.com:8883
5 May 19:09:47 - [info] [mqtt-broker:Amazon] Disconnected from broker: client@mqtts://<REMOVED>.amazonaws.com:8883
5 May 19:09:47 - [info] [mqtt-broker:Amazon] Connected to broker: client@mqtts://<REMOVED>.amazonaws.com:8883
5 May 19:09:47 - [info] [mqtt-broker:Amazon] Disconnected from broker: client@mqtts://<REMOVED>.amazonaws.com:8883

During my attempts at understanding the different error handling, I created several "Servers" from the Properties page of the mqtt out node. One for the Mosquitto broker and one for AWS IoT Core Broker. I was switching back and forth between the two and at times I would change the Connection or Security properties for a given Server. This appears to have had the side effect of creating multiple servers, all of which were attempting to connect with their respective brokers.

I tried restarting the flow, but even then I was not able to stop the behavior. Eventually I restarted the container, recreated the flow with a single Server configuration, and have not observed the previous behavior. I can attempt to recreate the issue and, if successful, provide more information. Especially since this is probably a separate problem from my original question.

That is correct. It is because that kind of caching can easily result in a catastrophic error if an in-memory cache gets too large. Generic monitoring for that in node.js apps isn't easy and there are far too many edge cases.

If you want guaranteed delivery, you may need to look to an enterprise message broker rather than MQTT, which is specifically designed to be lightweight. Though MQTT v5 has more options and may be suitable for what you need.

What you almost certainly should do with MQTT is to have secured topic paths that your sending systems cannot access and separate paths used for verification and debugging that they can subscribe to.

After all, even with MQTT v5, the messages output on various different errors are still messages and have to be subscribed to.

That nearly always seems to be caused by reusing a client ID. Make certain they are all unique - you can do that in Node-RED by not specifying one.

That appears to have a fixed client name? Which is not a good idea.

Not sure what is happening there, certainly a lot of us have >1 broker we work with. I was using 3 at one point.

For example, I have 2 connections to the same broker in my dev setup:


@TotallyInformation Thank you for your reply and please allow me to address some of your questions.

First, to be very clear, I'm not trying to guarantee delivery to the broker. I'm just trying to obtain some kind of pass/fail status on the publish operation so that I can either assume the message will make it to the broker or retry it myself later. I know caching and retrying failed messages can be difficult, so I do not expect the mqtt out node to handle that for me.

For example, the AWS IoT Python SDK returns the following on publish:

Tuple containing a Future and the ID of the PUBLISH packet. The QoS determines when the Future completes:

  • For QoS 0, completes as soon as the packet is sent.
  • For QoS 1, completes when PUBACK is received.

This is perfectly fine. The "success" scales appropriately with QoS. Failure can be tracked against reported errors or the future timing out in the case of the connection failing.

My concern with the described Node-RED behavior is that there are cases where doPublish() is not called. What happens if I check the connection and it's good, but the connection goes down before doPublish() is called? Please correct me if I am wrong, but it seems to me that there are windows of time where that scenario might happen. My best bet might be to check the connection, but also to treat any message that does not trigger a complete node as a failure so that I can retry it.
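That "no Complete event within a deadline == failure" bookkeeping could be sketched as below. The names are all hypothetical: in a real flow, a branch from the Complete node would call something like `markComplete(id)`, and the per-message ids would have to be attached before the mqtt out node.

```javascript
// Sketch: track each outgoing message by id; anything not marked complete
// before the deadline is handed to a retry callback. Timings are shortened
// for the demo; real flows would use seconds, not milliseconds.
const pendingMsgs = new Map(); // id -> { msg, timer }

function track(msg, id, retry) {
  pendingMsgs.set(id, {
    msg,
    timer: setTimeout(() => {
      pendingMsgs.delete(id);
      retry(msg); // deadline passed with no Complete event: treat as failed
    }, 50),
  });
}

function markComplete(id) {
  // In a real flow this would be driven by the Complete node's output
  const entry = pendingMsgs.get(id);
  if (entry) {
    clearTimeout(entry.timer);
    pendingMsgs.delete(id);
  }
}

// Demo: one message completes, one times out and is "retried"
const retried = [];
track({ payload: "a" }, "m1", (m) => retried.push(m.payload));
track({ payload: "b" }, "m2", (m) => retried.push(m.payload));
markComplete("m1");
setTimeout(() => console.log(retried.join(",")), 100); // only "b" is retried
```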

That appears to have a fixed client name?

Yes, it was a fixed client name. Why do you think that is not a good idea? In many cases it's required. For example, AWS IoT Core brokers will often require you to use the "ThingName" as the client ID when connecting as part of their policy.

{
    "Version":"2012-10-17",
    "Statement":[
    {
        "Effect":"Allow",
        "Action":"iot:Connect",
        "Resource":[
            "arn:aws:iot:us-east-1:123456789012:client/${iot:Connection.Thing.ThingName}"
        ]
    }
    ]
}

I've also worked with IT departments that require it for their own monitoring purposes. I would honestly be more surprised by being told to use an automatically generated client ID than a fixed ID at this point.

Yes, I am sure you are right and I guess this hasn't been a priority for anyone before now. I'm actually not even sure whether the underlying node.js library supports PUBACK - I'm pretty sure the nodes don't support it (unless it appeared in the update to support mqtt v5 and I missed it).

I suppose that, because it is generally so easy to check via a subscription, nobody has really bothered with anything else. For the majority of my use, knowing whether a client is online or not (via LWT) and sending without needing an ack (because another message will come along in a minute anyway and I can check if I didn't receive anything for 2 minutes by subscribing) is sufficient. Not like I'm doing mission critical HL7 :slight_smile: (and yes, I have been involved with that in the past for one of the largest single directories - the UK's NHS SPINE).

When I say fixed - having a name of "client" would be very easily copied to another connection with the same name. If two different connections with the same id try to connect, at least with Mosquitto, you will get the constant connect/disconnect that you were seeing before.

Personally, I always make sure that my client IDs are very unlikely to be duplicated. So my MQTT Explorer client might have "dev-pc-explorer" and my dev Node-RED might have "dev-pc-nr", etc. Indeed, for Node-RED where I'm quite likely on a dev machine to have >1 connection, I would be even more careful. Maybe something like "dev-pc-nr-v5test", etc. On my live Node-RED, I typically keep to a single client connection called something like "nrmain" which is the live instance name I typically use.

Yes, that is, of course, fine. Because it will always be unique. Though it does limit you to having a single connection from that "thing". Not usually a problem.

Something that I too would require from a design.

I think that I badly described my intent. It isn't a problem that it is fixed - it is a problem if you end up with something non-unique.

@TotallyInformation Ha ha, I understand now. You were warning me against non-unique client IDs, not against using fixed client IDs. For a second there I thought there might actually be problems with fixed client IDs. I'm not actually using "client". In the logs I just replaced the completely ridiculous client IDs I use in development (along with any PII like my AWS endpoint) with "client" in an attempt to spare everyone the nonsense. It appears I caused further confusion. Sorry about that. :smiley:

For the majority of my use, knowing whether a client is online or not (via LWT) and sending without needing an ack (because another message will come along in a minute anyway and I can check if I didn't receive anything for 2 minutes by subscribing) is sufficient.

That is entirely reasonable behavior and I would do the same thing in most cases. Unfortunately sometimes I receive requirements where I'm asked to retry "failed" messages. That's the reason for my questions. Thank you again for your help!

No worries, I should have been clearer.

Yes, understood. It would certainly be nice to be able to process acks in the mqtt nodes. May need to bend @Steve-Mcl 's ear :wink: Just not sure how complex it gets though nor, as I say, whether the underlying library actually supports them.

Yeah, the underlying library definitely makes a big difference. I've worked with Paho, AWS, Azure, and IBM's MQTT libraries. They are all different. How they handle messages of different data types, re-subscribing to topics after a disconnect, birth messages, etc. It's definitely a challenge.