Trouble with unreliable modbus-flex-getter

Roll back more than two years' worth of updates? Sounds risky! It would be a long day going through all the updates to see what I'd be trading away.

I have loads of checks on msg content to determine if there's an error. For example the msg.error check, as you point out, but I prefer:

const errors = [];

if (msg?.payload == null || msg?.payload?.length === 0) {
  // payload should be an array or string; an empty string has length 0, so it is covered
  errors.push("Empty payload.");
}

if (msg?.error != null) {
  // an error property exists in the output
  if (msg?.error?.message) {
    errors.push(msg.error.message);
  } else {
    errors.push(JSON.stringify(msg.error));
  }
}

if (errors.length > 0) {
  throw new Error(errors.join(" "));
}

// the original message was stashed in msg.topic; restore it as the output
const output = msg.topic;

output.payload = msg.payload;
output.topic = msg.retryConfig.topic;

// clean up
delete output.retryConfig;
delete output.logMessage;
return output;

A bit overkill here, but it's the same way I do it for other I/O nodes, which sometimes don't produce any error at all and instead just spit out an empty payload.

How do I retry without complex logic and without throwing the exception to a catch node? I did find the retry node from the library, so that could simplify things. But since I effectively built the same thing myself, I get more fine-grained control.

Before developing the guaranteed-delivery node I attempted to do the same thing with a subflow, and found it impossible to avoid race conditions that could mess things up. The sort of situation that is difficult to handle with a number of separate nodes is when messages arrive at the input at unexpected times.

I haven't looked in detail at your flow, but imagine a message is passed in and some sort of error occurs. This is passed to a node that decides the message must be retried, but just before it is sent back to the beginning, another message arrives at the input. Confusion can reign. Perhaps you have handled that condition, but there are numerous other timing issues, and I was unable to find a flow that coped with them all, other than by using semaphore nodes. It just got more and more complex to handle.

Putting all the retry handling in one node (either a function node or a contrib node) is, I believe, the only way to guarantee reliability. Hence I developed the guaranteed-delivery node.

I hope not to encounter any async or race condition problems. The two mechanics are kept separate. It starts with guaranteed output.

This is what I do before sending a msg to modbus (an unsafe I/O node):

  1. Get an object from context (or create it the first time, if it doesn't exist).
  2. Generate a unique id and store it in the msg, and as a key in the context object. The value is the current timestamp.
  3. Send 2 copies of the msg: one to modbus, one to a delay node.

At this point, modbus can handle the request with its own timeout. Meanwhile, the shadow copy waits in the delay node and eventually goes into a function node that adds the error message "timeout - msg lost".

Finally, both paths meet:

  1. Get the object from context (it must already exist at this point).
  2. Look up the unique id from the msg in the context object.
  3. If it exists, this message is the first to arrive, so delete the key from the context object, then pass the msg along.
  4. If it doesn't exist, the other copy has already arrived, so return null.

Do you think this use of keys on a context object could run into race conditions?

The retry mechanic is much simpler and doesn't use context at all. The incoming message is cloned into a backup variable on the msg itself, and an attempt/retry counter is initialized. After the request, if there is any indication that it failed, an exception is thrown. The catch node then sends the msg to a function node, which tests whether the attempt/retry counter has reached the max. If it has, the msg is routed to the error path. If not, the backup of the original msg is reapplied, the counter is incremented, and the msg is sent back along the retry route, with a delay for good measure.
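
For illustration, here is a minimal sketch of that catch-side function node, with two outputs (retry, error). The names msg.retryConfig.backup, msg.retryConfig.attempts and MAX_ATTEMPTS are assumptions for the example, not the exact names in my flow:

// Catch-side retry logic (sketch; illustrative property names).
// Assumes msg.retryConfig.backup holds a clone of the original msg
// and msg.retryConfig.attempts counts the attempts so far.
const MAX_ATTEMPTS = 3;

const attempts = (msg.retryConfig && msg.retryConfig.attempts) || 0;

if (attempts >= MAX_ATTEMPTS) {
  // max reached: route to the error output (second wire)
  return [null, msg];
}

// reapply the backup of the original msg and increment the counter
const retry = RED.util.cloneMessage(msg.retryConfig.backup);
retry.retryConfig = {
  backup: msg.retryConfig.backup, // keep the pristine original for further retries
  attempts: attempts + 1
};

// route to the retry output (first wire), which goes through a delay node
return [retry, null];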

  1. Register incoming messages:
function getTrackedMessages() {
  let trackedMessages = flow.get("trackedMessages", "memory");
  if (!trackedMessages) {
    trackedMessages = {}; // initialize (if doesn't exist)
    flow.set("trackedMessages", trackedMessages, "memory");
  }
  return trackedMessages;
}

// retryConfig lives on the original message stashed in msg.topic
msg.topic.retryConfig.trackingId = RED.util.generateId();
const trackedMessages = getTrackedMessages();

// value is the current timestamp
trackedMessages[msg.topic.retryConfig.trackingId] = new Date().getTime();
return msg;
  2. Set error (on shadow copy, after delay):
// prepare failed msg
delete msg.payload;
msg.error = {
  message: "ERROR: Request timeout - msg lost in modbus node."
};
return msg;
  3. Register outgoing messages:
const trackingId = msg.topic.retryConfig.trackingId;
if (!trackingId) {
  node.error("ERROR: Message without tracking id!", msg);
  return;
}

const trackedMessages = flow.get("trackedMessages", "memory") || {}; // guard in case the object is missing
if (trackedMessages[trackingId]) {
  // this copy is the first to arrive
  delete trackedMessages[trackingId];
  delete msg.topic.retryConfig.trackingId; // clean up
  return msg;
}

// the other copy already passed
return null;

What happens if another modbus request arrives at the start before the first one has been completed?

If 2 msgs are handled at the same time by the modbus flex getter? They have unique IDs. When one is completed, it goes to "register outgoing messages". Here we check if it exists in the trackedMessages object:

if (trackedMessages[trackingId]) {
  // this copy is the first to arrive
  delete trackedMessages[trackingId];
  delete msg.topic.retryConfig.trackingId; // clean up
  return msg;
}

The next message has a different trackingId. I'm not 100% sure there won't be a race condition, but at least it shouldn't happen across different trackingIds?

Last week, thanks to community feedback in this discussion, extensive debugging, attempts at understanding the documentation and a deep dive into the source code, it became apparent that the modbus flex getter by design doesn't always produce an empty message on failure. Further, it doesn't always throw an error. So there is no way to detect or catch failures without adding extra logic around it.

Today, I noticed something else baffling about the modbus flex getter. Steps to reproduce:

  1. Add config node (as normal).
  2. Add flex-getter (as normal) using the config node.
  3. All works ok.
  4. Add a 2nd flex-getter, also using the same config node.
  5. Disable it.
    Result: Now the 1st modbus flex-getter is dead!

It turns out that when you disable/remove one flex-getter, it also disables the config node, even if it is used by another flex-getter! The shared config node is dead, and this is not visible anywhere in the editor; you don't even receive output. Normally this won't happen, as we only use one flex-getter per config node, but it happened during debugging/development/testing.

So now we have to be super careful when adding, removing, disabling or configuring flex-getters?!?

Are you using Serial or TCP modbus?

TCP modbus, yeah. Or rather, TCP to a dongle converting to serial RTU. We have another device with modbus TCP, and that runs flawlessly.

I still don't understand the Optionals configurations:


Are the Show/Log options a matter of the NR debug side panel versus stdout/stderr? And which takes precedence, the config node or the flex getter?

  • Show Activities: Displays node status
  • Show Warning: ?
  • Show Errors: ?
  • Log failures: ?
  • Log states changes: ?

In the past I've been caught out by message collisions and contention for the serial port on serial modbus, largely due to the relatively slow speed and having logic spread across multiple threads/flows. The same thing would have worked flawlessly over modbus TCP.

The solution was to carefully sequence requests so I didn't ever send a request until the previous response was either received or timed out.
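
A minimal sketch of that idea as a function-node gate, assuming completed or timed-out responses are wired back into the gate with msg.release = true (an illustrative flag, not something the modbus nodes set themselves):

// Gate function node (sketch): lets one request through at a time.
const queue = context.get("queue") || [];
const busy = context.get("busy") || false;

if (msg.release) {
  // previous request finished (response or timeout): send the next one, if any
  if (queue.length > 0) {
    const next = queue.shift();
    context.set("queue", queue);
    return next; // busy stays true: the next request is now in flight
  }
  context.set("busy", false);
  return null;
}

if (busy) {
  // a request is already in flight: hold this one back
  queue.push(msg);
  context.set("queue", queue);
  return null;
}

// idle: let this request through
context.set("busy", true);
return msg;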

Just saying, it's something worth checking. I understand your gear is old and maybe less reliable, and you do need it to handle errors gracefully, but generally RS485 is pretty solid, especially at lower baud rates.