Node-red-contrib-actionflows and lost messages

I have an actionflow with one action in a node red docker container - I throw in 600 http requests as fast as possible and that shows 600 actions being processed.

Those requests then get sent to a second node red docker container, but this has an actionflow with two actions.

The logging from node.log/debug/warn messages is being redirected to stdout and visible to the docker host. I have a guid in each request so can correlated all log entries to that guid. I have changed the actionflows.js to log that guid in it's sequencing code.

In the second container I am not seeing all messages go to all nodes of each of the two actions. If I remove the second action, all is OK, but when I add back in the second action, I don't get all log messages.

I have added a ton of extra logging to the actionflows to see if I can track down the issue. From what I can see the msg._af[config.id].index seems to go backwards from 2 to 1 at some point and then the code will execute the 'Move on' code

        }else{
          // Move on
          if (msg._af["stack"].length == 0) {
            delete msg._af;
          }
          node.status({});
          node.send(msg);
        }

It's a global, non looping action - basic - and the nodes are not doing much - just doing some validation of a JSON object.

I know there's a lot of stuff going on here, but it is fine if I don't stress it, so not sure if it's a node red, actionflows or a docker redirection issue with lost log data.

Anyone any ideas??

Hmm. Is there are simple flow to reproduce this issue that you can share and I can run locally? This would be the best way for me to reproduce and possibly nail down the problem or find a workaround. Much has changed since ActionFlows' initial release now that Node-RED does async. vs sync.; though this should not be an issue.

1 Like

@Steveorevo
I hope I have managed to simplify this sufficiently and still recreate the problem. I have managed to reproduce it with two actionflows_in followed by a simple time wasting function node in between, that does something meaningless with the payload, before the _out node. It's all fake data. It slows the total execution time of each action to approx 250ms per action on this machine). (8 core i7 mac)

I can't upload attachments here, so I have uploaded them to drive and will PM you the link. Attachments are:

The StressTest.jmx is a JMeter test, which simply uses 10 threads to fire off 60 requests each to the inbound endpoint - note that it is for a docker container on port 1890. There should therefore be 600 of everything in the output. The test generates a guid for the data which is conveyed in the header and used in the logging of the test.

FlowStats image shows the stats for the processing run I did showing the

  • execution cycle time message (600)
  • Log message from function node of first action_in (600)
  • Log message from actionflows.js showing execution sequence message for first action (600)
  • Log message from function node of second action_in (243)
  • Log message from actionflows.js showing execution sequence message for second action (243)

flows-stdout.log is the redirected log data - I am confident that the redirection is not 'losing' logs, as I have also tested by modifying the payload itself to reflect actions done and that matches the logs.

ActionFlow.json is the simplified exported NodeRed flow, which I hope will be enough for you to import and run locally. It contains the http in (/common_test) and then the action node and the response. It also includes the two actionflows_in flows.

The actionflows.js is the original code with a couple of modified lines to log the correlation id and timestamp of the log messages for execution time and sequencing (I am ingesting these logs into Splunk for analysis, so want the time when Date.now() is called in ms, rather than the Node Red 1 second granular log time).

If you are familiar with jmeter then you can see the payload in the definition.

Through my tests, I found that the else block at line 86 of actionflows.js was triggered and the stack length was 0. What I also noticed was that there was a green 'Running' image under the action node at the end of the test - not sure if that happened 100% of the time though.

Hope this gives you sufficient to go on.

@Steveorevo

I added a small extra piece of code so that in the first function node of the first action_in, I set msg.count = 1. In the second function node, I set msg.count++; I then also logged msg.count in the actionflows.js in those two log messages.

if the first action is executed only, then count=1
if both actions are executed then count=2

In this pic, for the execution cycle logs, there are 434 count=1, 161 count=2 and 5 count=3!! - 600 in total and all unique correlation ids.

In the sequencing, called before the _in flow is called, count is undefined 600 times before the first _in, and then 1 before 166 of the second _in and those strange 5 called again where count=2.

and another obsevation, if I add this setting to the settings

runtimeSyncDelivery: true,

then I see fewer messages completing both actions, but I have not seen the count value as 3. In all the async cases, I saw over 150 messages doing the second action, whereas in two tests I ran, the sync runs were 89 and 90 runs of action 2.