How do you handle disappearing messages? Or do you? Dead letter queue

Found an interesting article:

And it turns out this is exactly the problem I've faced in so many I/O places (read/write, doesn't matter where from or to: HTTP, MQTT, Modbus, M-Bus, BACnet, and more). The article describes it elegantly, and I didn't know this pattern was called a dead letter queue.

For example, I made subflow wrappers for all I/O tasks, with built-in retry attempts, delays, etc.
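Roughly this pattern, as a plain-JavaScript sketch (the helper name and defaults are made up for illustration; in the actual flows this lives inside a subflow, not a standalone function):

```javascript
// Sketch of the retry-with-delay pattern the subflow wrappers implement.
// Hypothetical helper, not a Node-RED API.
async function withRetry(task, attempts = 3, delayMs = 500) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await task();              // success: return the result immediately
    } catch (err) {
      lastErr = err;                    // failure: remember the error, wait, retry
      if (i < attempts - 1) await new Promise(r => setTimeout(r, delayMs));
    }
  }
  throw lastErr;                        // all attempts exhausted
}
```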

Some third-party nodes can simply swallow messages on failure! So I also use simple logic to guarantee an output (after a set delay). The Modbus node, for instance, can consume the entire incoming message and replace it with an entirely new one, which makes it impossible to track. Then there's all the hassle of how errors are handled: some nodes output errors as a normal msg (ideally with an error attribute), others throw an exception that can be caught, and some have a second error output. I've landed on all my subflows having two outputs: one for success (complete/partial) and another for failure (complete).
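The "guarantee an output after a set delay" part is essentially a watchdog race. A minimal sketch (names are illustrative, not Node-RED built-ins), where the swallowed message still surfaces on the failure side:

```javascript
// Race the I/O task against a watchdog timer so a node that silently swallows
// the message still produces *something* on the failure output.
function withWatchdog(task, timeoutMs) {
  return Promise.race([
    task().then(result => ({ ok: true, result })),       // success output (1)
    new Promise(resolve =>                               // failure output (2)
      setTimeout(() => resolve({ ok: false, error: "timeout" }), timeoutMs)),
  ]);
}
```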

I also made subflows for more specialized tasks, like HTTP requests that write telemetry to ThingsBoard, with built-in token/credential handling and an option to forward to a backup on failure (what I called a dead-letter queue). I didn't even need retries here, as the dead-letter queue is effectively the same thing: retrying at set intervals for a limited time (~24 hours):
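The queue behaviour described above, as a minimal in-memory sketch (illustrative only; a real flow would persist the queue in a file or context store, and the interval/age values are placeholders):

```javascript
// Failed messages are parked and retried on a fixed interval until a
// maximum age (~24 h here) is exceeded, after which they are dropped.
class DeadLetterQueue {
  constructor(send, retryMs = 60_000, maxAgeMs = 24 * 60 * 60 * 1000) {
    this.queue = [];
    this.send = send;        // delivery attempt; throws on failure
    this.retryMs = retryMs;
    this.maxAgeMs = maxAgeMs;
  }
  park(msg) { this.queue.push({ msg, firstFailed: Date.now() }); }
  async retryOnce(now = Date.now()) {
    const remaining = [];
    for (const entry of this.queue) {
      if (now - entry.firstFailed > this.maxAgeMs) continue;  // too old: drop
      try { await this.send(entry.msg); }                     // delivered
      catch { remaining.push(entry); }                        // keep for later
    }
    this.queue = remaining;
  }
}
```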

One issue I currently have is that each of these subflows must store a clone of the incoming message before going through the retry logic, and many of these subflows are nested in multiple layers, causing clones within clones. So msg/JSON size grows exponentially. The alternative is to know exactly which properties are consumed and added to msg in all situations by all external nodes, which is hard if not impossible. Sometimes the format is needed too, which makes it even harder.
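One way around the clone-within-clone growth might be to stash the original once, out-of-band, and carry only a small key through the wrapped nodes instead of the full clone. A sketch (the property and helper names are made up; in Node-RED the stash would likely be flow/global context rather than a Map):

```javascript
// Stash a single deep copy per layer and pass only a reference key through,
// so nesting adds one small property per layer instead of a full nested clone.
const stash = new Map();
let nextId = 0;

function checkpoint(msg) {
  const id = `ckpt-${nextId++}`;
  stash.set(id, structuredClone(msg));  // one deep copy, stored out-of-band
  return { ...msg, _ckpt: id };         // shallow copy carries only the key
}

function restore(msg) {
  const original = stash.get(msg._ckpt); // recover the pristine msg for retry
  stash.delete(msg._ckpt);
  return original;
}
```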

Do you handle failures, retries and dropped messages? And if so, how?

In a similar way. I use the catch node to retrieve errors and, depending on the case:

  • I save them to a local file
  • I attempt a resend
  • I send a message (notification) via Telegram, etc.
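That fan-out could be sketched roughly like this (all handler names and the retry threshold are invented for the example; in the actual flow these would be separate branches after a catch node):

```javascript
// One handler routes a caught error to a resend attempt, a local log file,
// or a notification, depending on the case.
function routeError(msg, { logToFile, resend, notify }) {
  if ((msg.retryCount || 0) < 3) return resend(msg);  // transient: try again
  logToFile(msg);                                     // persistent: record it
  return notify(`delivery failed: ${msg.error && msg.error.message}`);
}
```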

There is nothing really special about a dead letter queue. The phrase, I think, stems from message-bus architectures where you have a central "bus", normally some kind of message-queue service. These standardise your message flows and responses, and so make it easier for the queue service to post things to the DL queue as needed.

Node-RED, as an orchestration service, can play a similar role. However, as you say, your sources and targets are rarely aligned, so each one will need a different form of handling.

Since a DL queue is most useful as a standardised service, the idea would be to have sets of flows that normalise outputs and forward to a single point for display, logging, etc.
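In other words, whatever shape the failing source produced, reduce it to one standard envelope before the single dead-letter endpoint. A sketch (the field names are an assumption for illustration, not a standard):

```javascript
// Normalise heterogeneous failure outputs into one dead-letter envelope.
function toDeadLetter(msg, source) {
  return {
    source,                                          // which flow/device failed
    error: (msg.error && msg.error.message) || msg.statusCode || "unknown",
    payload: msg.payload,                            // original data, for replay
    timestamp: new Date().toISOString(),
  };
}
```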

Sub-flows, as I think you are discovering, are not always the best tool for this since each subflow instance creates COPIES of the contained nodes which can be very inefficient. In such cases, link nodes are often a better choice.

Unfortunately, link nodes are inferior in all other aspects compared to subflows. Having used subflows extensively, here are my findings on link nodes' shortcomings:

  1. max (and min?) 1 output
  2. must set timeout, even when not needed
  3. occupies flow real estate
  4. adds additional routing prop which must be preserved
  5. No quick doc from caller (?)
  6. No easy way to distribute updates
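On point 4: I believe the link call node routes replies via a property on the message (`msg._linkSource` in current Node-RED), so any node that builds a brand-new msg has to copy it across or the reply never finds its way back to the caller. A sketch of working around that:

```javascript
// Rebuild a message from scratch while preserving the link-call routing
// property, so the reply can still find its way back to the link call node.
function rebuildMsg(original, newPayload) {
  const msg = { payload: newPayload };          // entirely new message
  if (original._linkSource) {
    msg._linkSource = original._linkSource;     // preserve link-call routing
  }
  return msg;
}
```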

NR really, really needs to move away from link nodes and instead have an option to allow subflows to run as something equivalent to a "singleton".

The problem with subflows is, as you point out, that they create copies of their contained nodes. I have multi-layered subflows within subflows, but still haven't had any issues with that. The cloning of the message on each input across multiple layers has, however, caused my NR to crash :rofl: