Redeploy an existing flow from another flow using the flows API

Last summer, pretty much the same topic came up and the response was that even though the payload contains info for a specific flow (tab), all flows will reload as long as the reload parameter is set in the header. That answer explained the behaviour, but didn't offer a solution (except to suggest modifying nodes first as a workaround).

I have a flow that monitors system performance and would benefit from being able to start selected flows. I think/hope this is mostly a documentation issue and will be solved by choosing one of the other deployment-type options.

I don't want to modify any nodes in any substantial way, just restart the flow(s) passed in the body of the POST.

This scenario is a v2 POST, I am using inline node credentials, and have not included the rev property. According to the docs, that should force a reload without comparison.

My observation is that the HTTP POST command executes successfully, reloading ALL flows. Because that includes my monitoring flow, the system can end up in an infinite loop/race condition.

Is there a header option other than reload for the Node-RED-Deployment-Type that will honor a request to only load the flows (list?) in the body (regardless of node change)? If not, can you explain the logic for passing these details? Or suggest an API call to trivially change a node on the flow I want to reload?
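To make that concrete, here is roughly the function node that feeds my HTTP request node. This is only a sketch: the localhost URL and the commented-out bearer token reflect a default local install, and the flow id is just the tab I want restarted.

// Function node ahead of the http request node (method and URL could also be set on the node itself).
msg.method = "POST";
msg.url = "http://localhost:1880/flows";
msg.headers = {
    "Content-Type": "application/json",
    "Node-RED-API-Version": "v2",           // ask for the v2 {rev, flows} format
    "Node-RED-Deployment-Type": "reload"    // other documented values: full, nodes, flows
    // "Authorization": "Bearer <token>"    // only needed when adminAuth is enabled
};
// With "reload" the runtime re-reads the stored flow file and restarts every
// flow, which matches the behaviour described above - the body does not limit
// which flows restart.
msg.payload = { flows: [ { id: "4fd7ef6b3f578893", type: "tab", label: "Flow 2" } ] };
return msg;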

Thanks for any help you can provide!

What exactly do you mean by 'start selected flows'? Normally the action in a flow is started by sending it a message.

I'll give more details here, perhaps more than you want...

Background: For now, I'm using NodeRed to move data from the public HiveMQ broker (broker.hivemq.com) to a local instance of InfluxDB. NodeRed runs in a Docker container on the same local server as InfluxDB. The flow that imports the data from HiveMQ is all in one flow (tab). Occasionally that flow hangs. Usually the MQTT In nodes report "connected", but they are clearly not responding to new or retained messages. I have built another flow designed to monitor the progress. Data demands are not high, but I do expect a block of data every hour. The monitoring flow checks for recent InfluxDB data and sends an email if no recent data is found. I reliably get the emails.
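The staleness check in the monitoring flow is roughly the function below. A sketch only: the one-hour threshold is real, but the exact shape of the query result and the "time" field name will depend on how the InfluxDB query node is configured.

// Function node placed after the InfluxDB query node.
// msg.payload is assumed to be an array of rows, each with a "time" field.
// A message is passed on (to the email node) only when nothing recent is found.
const ONE_HOUR = 60 * 60 * 1000;
const rows = Array.isArray(msg.payload) ? msg.payload : [];
const newest = rows.reduce((max, row) => {
    const t = new Date(row.time).getTime();
    return isNaN(t) ? max : Math.max(max, t);
}, 0);
if (Date.now() - newest > ONE_HOUR) {
    msg.topic = "No recent HiveMQ data in InfluxDB";
    msg.payload = newest ? "Last point seen at " + new Date(newest).toISOString() : "No points returned";
    return msg;   // triggers the email
}
return null;      // data is fresh, nothing to do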

Analysis
When looking at the NodeRed logs for the time when this happens, I typically see one or more messages about a disconnect/reconnect from HiveMQ. Another flow that gets data from HiveMQ also becomes unresponsive. There could be more details on the HiveMQ side that I don't have visibility into. The monitor flow is healthy, as it can query InfluxDB and send emails. I think one cause could be maintenance windows on the HiveMQ server. I cannot control that, but I should be able to recover from it automatically.

Remediation
This happens often enough that I would like the same flow that detects the problem to be able to call a REST API to restart whatever is stalled. Empirically, I can tell that a trivial change to any node on the HiveMQ flow, followed by a deploy of modified flows, will bring the flows related to HiveMQ back to life. I have tried adding an HTTP POST inline with the email process to reload the hung flow, passing the id, name, type, and credentials in the body of the API call.

{
	"flows": [
		{
			"id": "4fd7ef6b3f578893",
			"type": "tab",
			"label": "Flow 2",
			"credentials": {
				"user": "yy",
				"pass": "xxxxxxxxx"
			}
		}
	]
}

In the properties for the HTTP POST node, I have added:

I don't want to reload all flows if only the HiveMQ MQTT process is hung. In the future, that might be disruptive to other processes managed in NodeRed.

I've invested a bunch of time to get this to work and need reliability. If the API will only load all flows, or those that have actually been modified, how would you recommend programmatically making a trivial change (in a flow) so that I can use "flows" instead of "reload"?

I still need to figure out how MQTT QoS settings can be used to process messages only once. I will need to look at other options if I can't achieve resiliency with NodeRed and HiveMQ, so QoS really doesn't matter yet.

Have you tried using dynamic subscriptions with the MQTT nodes?
You can use control messages to disconnect & connect, subscribe and unsubscribe.

Are you getting duplicated messages or messages lost when the hanging/disconnecting/whatever problem occurs?

For now, QoS is not really on my radar, but since you ask...
I am not familiar with dynamic subscriptions, but have looked at them a bit just now. I'm not sure that being able to dynamically subscribe/unsubscribe and connect to various brokers would bring much to this party. Staying connected to a handful of topics, with automatic reconnection, is the desired behavior. Maybe you are suggesting that I could have better control over cycling the connection using dynamic subscriptions? Having that flexibility might be better than redeploying an entire flow or environment.

The messages are published from a microcontroller over cellular with a QoS of 1 and retained=true. Ideally, I would like NodeRed to adhere to a "process only once" QoS. When I redeploy flows, it appears that all retained messages are reprocessed even if they have been processed before. That leads me to think I don't have the secret sauce for QoS figured out. Based on the HiveMQ blog, if the same client ID connects and clean session is false, then only never-before-seen (retained) messages will be delivered. The QoS setting determines how rigorous that delivery is. In the NodeRed MQTT In nodes, I think I currently have that set to 2, which is probably too high. If all else fails, I can avoid duplicate processing by querying my InfluxDB data (by logical key) and only inserting if it doesn't already exist. That seems like a hack.
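If I do fall back to that, I'd expect it to look something like the function below, feeding an InfluxDB query node. A sketch only: it assumes the node-red-contrib-influxdb query node will take the query from msg.query, and the measurement, tag, and payload field names here are made up.

// Function node ahead of an influxdb query node, to check whether this point
// was already written. A switch node after the query would drop the message
// when any rows come back, otherwise pass it on to the insert.
// "telemetry", "device", msg.payload.deviceId and msg.payload.ts are hypothetical names.
const key = msg.payload.deviceId;
const ts  = msg.payload.ts;   // publish timestamp in epoch milliseconds
msg.query = "SELECT * FROM \"telemetry\" WHERE \"device\" = '" + key + "' AND time = " + ts + "ms";
return msg;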

If I can't get better stability with the connection, I'll probably need to change the architecture, moving to different technologies/platforms. Hope not.

If it turns out that you can't prevent dropped connections but you can detect them, maybe an explicit disconnect then connect message will reestablish contact. Less disruptive than redeploying.
Certainly it's much better to find out the root problem.

I think that is what I'd expect, but like you I'm not yet sure of the right combination of settings to ensure every message gets delivered just once.
I don't think that retained messages have a role to play in this because only a single message is retained per topic.

@Colin has demonstrated a flow to guarantee message delivery which was recently discussed in the forum.

MQTT reconnects automatically. You potentially have some other issue.

Describe what you mean by "hung"

It would permit you to request the MQTT config to disconnect then reconnect. This essentially closes the client and reinitialises it (just as would happen with a reload or restart)

Your current approach of reloading flows is not necessary and is equivalent to cracking a nut with a sledgehammer.

Here is a demo you can try out: Dynamic MQTT LWT birth, Close & Will Property - #2 by Steve-Mcl

You only really need the connect and disconnect actions
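Something like the following, wired from whatever detects the stall into any one of the mqtt nodes that shares the broker config. A sketch, assuming the nodes are set up for dynamic control as in the linked demo; the 5 second pause before reconnecting is arbitrary.

// Function node: bounce the shared broker connection.
node.send({ action: "disconnect" });
setTimeout(() => {
    node.send({ action: "connect" });
    node.done();
}, 5000);
return null;   // messages are emitted asynchronously above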

I absolutely agree that fixing the root cause is better than building a bigger sledgehammer.

I use the word "hung" because I can observe new messages arriving in MQTT (e.g. through the HiveMQ websocket client) that should be processed by NodeRed, but aren't, until I make a modification and redeploy the flow. Debug nodes don't show any activity from MQTT In. I may or may not see anything in the NodeRed logs, maybe a number of disconnect/connect pairs with a minute or less elapsed time. I have no insight into what might be the reason for the disconnect from the public HiveMQ broker. Are there other places to look to understand why the connection says "connected" but isn't responding to new messages? I suppose this could be related to my QoS settings (e.g. NodeRed is waiting on a confirmation from HiveMQ).

Since this happens maybe once or twice a day (and typically overnight), I don't have many opportunities to test solutions, and lack the information to be more tactical. Is there a setting to make the logs more verbose?

I will look at the example. I understand that dynamic config would allow me to disconnect/reconnect without a redeploy. At the same time, it appears to add more complexity than should be necessary if the simpler MQTT In node just worked (i.e. did the same disconnect/reconnect reliably). Under the covers, I would expect the reconnect JS to do basically the same two steps. The issue is that the MQTT In node appears not to detect the issue I'm facing.

Do you have multiple config nodes when you select a config?

Do you specify a client ID or leave it blank?

This discussion might be relevant to nodes disconnecting but still saying connected.

I confess I could not really grasp the implications of the replies.

Reading up on dynamic subscriptions, I see a reference to dynamic connections. Since all MQTT In nodes to HiveMQ fail at the same time, and all go through a common broker connection, is it worthwhile making that connection "dynamic"?

Let's try this another way.

All of my MQTT connections in node-red are rock solid.

Once, I had two clients connecting with the same client ID and this was causing a fight.

To rule out your side, you could try connecting to a local MQTT or a different online broker (there are a lot).

It may be hive is the issue. It may be you have unreliable internet. It may be your router is faulty.

The dynamic part of the MQTT nodes can be used to disconnect and reconnect the single config. In other words, you might have 66 separate in/out nodes, BUT sending msg.action disconnect to any one of those nodes WILL kill the config connection for all that use it. Sending msg.action connect will reconnect the config and all 66 will reconnect. (Actually, it's only 1 connection shared for all instances using that config.)

Hope that helps.


There is a single broker config node that is referenced by five mqtt in nodes distributed across two flows.

Here are the properties for the broker connection:

I specify a static client ID (not the one shown) and not a clean session, because those are requirements if QoS is to have any chance of surviving disconnects as designed. And... yes, I realize that I should be using secure connections. That won't happen until everything else works.

Here is the reference I'm looking at that describes dynamic connections: Of specific note, unchecking "connect automatically" enables/requires specific actions to connect and disconnect. I have downloaded the example flow, and note that it is using MQTT v5, which I might try. I believe that the PubSubClient library I'm using with the ESP32 doesn't support that, but that shouldn't be a constraint on NodeRed subscriptions to HiveMQ.

Good to hear... I hope I can get there, too.

As mentioned above, there is only one broker connection defined and it uses a static client ID. There are only five subscriptions. I am considering letting the client ID go random and/or enabling clean session to see if that impacts anything. I suspect that most users use those defaults, which takes all the "confirmation" network traffic out of scope. I can roll my own "QoS-like" support using random client IDs if I need to abandon QoS as defined in/promised by MQTT. I would try those changes before switching brokers, only because my ESP32 publisher is 150 miles away and not set up for OTA updates.

The public MQTT broker is subject to going down/running test code. I typically leave their websocket client running, and usually it disconnects at the same time as NodeRed logs similar messages on the server (on my home network). So I think the disconnects happen at the broker, but I won't discount the possibility that something between my router and the datacenter is to blame. Regardless, the entire system needs to become robust enough to recover.

Like any sporadic issue, testing is the real problem.

This seems to be similar but different to your issue. I have the broker set to use MQTT v5:

  • Connection to HiveMQ goes down (by rebooting the router); mqtt node statuses stay at "connected".
  • Connection back up - (retained) LWT and birth messages are not shown. This demonstrates that the node does not know about the disconnect.
  • Messages published now do not show up, neither in the mqtt-in node nor the HiveMQ web client.
  • It stays in this state indefinitely; Node-red is publishing messages every minute and they disappear into the ether.
  • Then I go back to the Node-red browser tab and suddenly it shows "connecting", then "connected", and published messages start showing up in the web client.

Perhaps messages disappearing into the ether is related to my mobile rather than landline internet?

The sudden realisation and reconnection when I open that browser tab just seems weird.

I think the mqtt broker node misses the disconnection because the router reboot takes slightly less than the session expiry and/or keep-alive time.
If I reduce these settings to 10 seconds it catches the disconnection.
That's not a fix though: what if the internet drops out for 5 seconds?


Trying to change only one thing at a time here... I decided to try switching to v5 from 3.1.1. I increased the session expiry time from the default 60 seconds to 6000 seconds. Overnight the HiveMQ web client did disconnect. On the other end, my ESP32 publisher decided to reboot, which is consistent with HiveMQ being offline for a bit. The NodeRed logs show connection issues at about the same time.

Without any intervention on my part, NodeRed began processing incoming messages again. :grinning: This has not happened with MQTT 3.1.1. Looking at the cellular activity from the ESP32, I believe that HiveMQ was accepting new publishes by 4 minutes after midnight (after an 8-minute outage?). The published data was processed by NodeRed and inserted into InfluxDB. What seems odd is that InfluxDB put a timestamp of 2:06 AM on the related records, yet the debug node didn't report them until 3:45 AM. This may be similar to what @jbudd was seeing?

Another flow uses the same HiveMQ connection. It routinely gets new publishes every ten minutes. There is no gap in that InfluxDB data. This suggests that the issue is associated with a flow and not the connection? I don't know.

I added a status node/debug node combination associated with the mqtt in nodes last night. That combination produced a bunch of "undefined" payloads, so I have updated it with a function node that prepares a message to load into InfluxDB, recording connects and disconnects. That might prove valuable.
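The function node just reshapes the status events into something the influxdb out node will accept. A sketch, with the measurement name and the "connect" text filter being my own choices:

// Function node between the status node and the influxdb out node.
// msg.status carries { fill, shape, text } plus details of the node that emitted it.
const text = (msg.status && msg.status.text) || "";
if (!/connect/i.test(text)) return null;          // only record connection state changes
msg.payload = {
    state: text,                                  // e.g. "connected" / "disconnected"
    source: msg.status.source ? msg.status.source.id : "unknown"
};
msg.measurement = "mqtt_status";                  // hypothetical measurement name
return msg;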

For those who have contributed and/or those who might stumble across this thread with a similar issue...

The preferred option is to resolve the root cause. Make the MQTT connection solid. Toward that end, I have seen NO disconnect/reconnect cycles after changing my configuration for the MQTT broker connection to version 5 (from 3.1.1) and also changing the keepalive and expiry values from 6000 to the default of 60.
As a result, the status monitoring I put in place to save disconnect/reconnect timestamps to InfluxDb has not resulted in any saved data points.

The NodeRed logs showed 25 disconnects from the broker in the five days prior to these changes and none since. That's encouraging.

This doesn't really answer the original question but does resolve my issue (for now). For those who really do want an answer to the original post, I think the only way to programmatically redeploy a single flow is to first make a modification to something in that flow, then trigger a redeploy of modified flows. Perhaps something not too invasive would be to programmatically add a comment node to the flow (via the API), delete it, and then redeploy modified flows by specifying the HTTP header Node-RED-Deployment-Type as "flows" (not "reload").
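A rough sketch of that idea, using two function nodes that each feed an HTTP request node (the first request node set to return a parsed JSON object). It assumes a default install on localhost:1880 with no adminAuth, and only shows adding the comment node; removing it again on the next pass is left out.

// Function node 1: fetch the current configuration in v2 format.
msg.method  = "GET";
msg.url     = "http://localhost:1880/flows";
msg.headers = { "Node-RED-API-Version": "v2" };
return msg;

// Function node 2: add a throw-away comment node to the target tab, then push
// the whole set back so only flows containing modified nodes restart.
const targetTab = "4fd7ef6b3f578893";              // the flow (tab) to bounce
const cfg = msg.payload;                           // { rev, flows } returned by the GET
cfg.flows.push({
    id: "poke" + Date.now().toString(16),          // any unique id will do
    type: "comment",
    z: targetTab,                                  // places the node on that tab
    name: "poke",
    info: "",
    x: 40,
    y: 40
});
msg.method  = "POST";
msg.url     = "http://localhost:1880/flows";
msg.headers = {
    "Content-Type": "application/json",
    "Node-RED-API-Version": "v2",
    "Node-RED-Deployment-Type": "flows"            // restart only flows with modified nodes
};
msg.payload = cfg;
return msg;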

Hope this helps!