Guaranteed Delivery flow with fileStorage gets stuck often

Refer to this thread

I am using the flow below for guaranteed delivery with FileStore backing. Over time it gets stuck in the state "waitingForOKFail" and never recovers. I could not pin down any reason for this; there is no MQTT server failure.

Has anyone faced this issue before? Is there any likely way out of it? Could it be caused by a file lock issue or something like that?

image

It should not be able to get stuck. What is the mqtt complete node?
You could install the flogger node and log the messages and see how it gets stuck.
In the meantime you can manually release it by injecting a Fail message with an inject node.

Of course, it is a Complete node attached to the mqtt node.

I have added a flogger node to track the incoming messages, but I am still not sure how it gets stuck. After (re)deploying a few times it got stuck again.

Not sure how to inject a Fail message - could you please elaborate? msg.control="FAIL" or something like that?

Use the flogger node to show what is happening at multiple places in the flow, into mqtt, out of the complete node, out of the trigger node. You don't need it at the front where you have it.

The same message that comes out of the FAIL change node. In fact I think you can just inject anything into the front of the fail node.
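For illustration, here is a minimal sketch of what that change node effectively does to whatever you inject. It assumes the control property is `msg.control` with the value `"FAIL"`, as guessed in the question above; check the actual change node in your flow for the real property and value. `FlowMessage` and `toFailMessage` are made-up names.

```typescript
// Hypothetical message shape: `control` is taken from the question above,
// everything else here is an assumption for illustration only.
interface FlowMessage {
  payload?: unknown;
  control?: string;
}

// Mimics a change node that marks whatever arrives as a failure report.
// The incoming payload is kept but is irrelevant to the release.
function toFailMessage(msg: FlowMessage): FlowMessage {
  return { ...msg, control: "FAIL" };
}

// An inject node sending a timestamp would end up looking like this.
const released = toFailMessage({ payload: Date.now() });
console.log(released.control); // FAIL
```

This is why the content of the injected message does not matter: only the control value set on the way through is looked at.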

Thanks Colin. It did release all the messages when I injected a timestamp, but I have no clue how that happened.
Is it because it was waiting for either OK or Fail, and in this case the Fail (control=fail) helped move it along? At least that is how I see it :slight_smile:
Awesome.
I used the flogger to back up the messages. I will try adding another logger purely for debugging to find out why it is getting stuck.
Thanks a bunch.

Send them into one flogger so you can see the sequence.

Do you actually understand how the flow is supposed to work?

I wanted the data backup in one log file and all the debug output in another, so that they can be handled separately and easily with different policies.

Yes. I looked at it a while back but had forgotten some of the details. I looked at the subflow template (state engine) again yesterday to understand how triggering a fail message would process the pending ones.

Triggering a fail tells it that the previous one failed, so it tries it again after the timeout you specified.
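A rough sketch of that retry behaviour as a small state machine may help show why a manual Fail unsticks the queue. Only the `waitingForOKFail` state name comes from this thread; the class, the other state names, and the method names are made up for illustration and are not the subflow's actual code.

```typescript
// Hypothetical model of the retry queue, not the real subflow implementation.
type State = "idle" | "waitingForOKFail" | "waitingForRetry";

class DeliveryQueue {
  state: State = "idle";
  queue: string[] = [];
  sent: string[] = []; // every send attempt, in order

  // A new message arrives: queue it, and send it only if nothing is in flight.
  enqueue(payload: string): void {
    this.queue.push(payload);
    if (this.state === "idle") this.sendHead();
  }

  // Success was reported: drop the head and move on to the next message.
  ok(): void {
    if (this.state !== "waitingForOKFail") return;
    this.queue.shift();
    if (this.queue.length > 0) this.sendHead();
    else this.state = "idle";
  }

  // Failure was reported: keep the head and wait for the retry timer.
  fail(): void {
    if (this.state !== "waitingForOKFail") return;
    this.state = "waitingForRetry";
  }

  // The retry timer fired: send the same head message again.
  retryTimeout(): void {
    if (this.state !== "waitingForRetry") return;
    this.sendHead();
  }

  private sendHead(): void {
    this.sent.push(this.queue[0]);
    this.state = "waitingForOKFail";
  }
}

const q = new DeliveryQueue();
q.enqueue("a");
q.enqueue("b");    // held back, because "a" is still awaiting OK or Fail
q.fail();          // manual Fail: "a" is treated as failed
q.retryTimeout();  // timer fires, "a" goes out again
q.ok();            // this time the OK arrives, so "b" is sent
q.ok();
console.log(q.sent.join(",")); // a,a,b
```

In this model, if neither OK nor Fail ever arrives for the message in flight, nothing moves the queue out of `waitingForOKFail`, which matches the stuck symptom. Whether the real subflow has its own timeout for that case is something to check in the flow itself.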

I am not sure what you are saying about flogger. From my point of view the purpose is to record what is happening so that you can look back and see why it got stuck. Can you export the flow and post it here please, at least the nodes around the subflow? I want to make sure it is correct.
