Deploy takes very long - sometimes as much as 50 mins

Hi,

We are facing a very strange issue. In our case any deploy (modified / node level deploy) takes a long time (20 to 50 mins). The deploy itself finishes relatively quickly, within a few minutes, but it takes much longer to get control of the editor back. We enabled the trace logging level, but we are unable to understand where the time is being spent. It seems to be some core process and not something we implemented in our flows (not sure if my assessment is correct).

As can be seen in the pic below, the process took 30 mins before control returned to the editor.

Until 20:00:04 the various flows were still loading. Then it reached some runtime event at 20:29.

If anyone can suggest where else we need to check, it will help a lot.

What device are you running NR on?
How much memory does it have?
What else is running on it?
How big is your flow?
What version of NR and node.js?
What does your flows do?

Thanks Zen for responding. Here are the details :

What device are you running NR on? --> Windows Server 2016
How much memory does it have? --> 64 GB
What else is running on it? --> Other in-house applications are running. I can stop some of them if we want to run some tests.
How big is your flow? --> Flow file is around 3 MB
What version of NR and node.js? --> NR is 2.1.4, Node.js is v14.17.3

What does your flows do? The flows are quite complicated and do multiple things, but none of the complicated flows are "inject at system start". They are all triggered manually and separately. Even in the picture, I clicked a simple flow (inject + debug) 10 mins after clicking deploy. Still, the result was only visible a good 30 mins later.

Are you accessing databases, getting data from sensors or accessing the internet to get information?

How many connections to outside (of NR) do you have.

So far you still haven’t said what the flows are doing so it’s hard to give any advice.

How many tabs do you have in the flows?

This is not normal at all. Deploy of flows should take seconds not multiple minutes.

What kind of deploy do you generally do? Full deploy/flow deploy/node deploy?

Is the server maxed out (i.e. what is the CPU like before, during & after deploy)?

Are you hitting virtual memory?

Node.js 14 is end of life & there have been numerous improvements to node-red since v2.1.4 - you might consider an upgrade to v3.0.2

Also check in the system monitor whether the processor is being heavily utilised, either by node red or something else.

Are you using the dashboard? If so is there anything that is updated faster than every few seconds?

If you restart node red, immediately make a small change, and redeploy, does it still take a long time?

Have you got the deploy button set to Full or Modified Nodes only?

You could also connect a debugger (e.g. Chrome developer tools) to your Node.js instance, and then do a CPU profile. Then you will see exactly where it is spending all its (CPU) time.
Bart

Hi All ,

Please see below -

Yes. We have local DB on the same machine which we access. No access to internet.

Not clear on query.

  • The flows do multiple things: connect to the DB, move files, update the DB, etc.
  • We have around 30 tabs and 30 separate subflows. But the point here is that none of them starts on deploy.

Mostly modified flows , modified nodes - in this example - it was modified nodes

Let me check this.

Will do.
Checked this. The system is at around 50% CPU and 50% memory, so I doubt it's resources.

Not that I am aware of. Will recheck.

The test for this post was - move 1 node. So yes.

But does anyone know what happens between flushing the trace of loading and the "runtime event"?
I am trying to understand if it's something at NR core level or something that is part of auto-scheduled flows.
Cos the log is not showing anything in these 30 mins. All my flows have some debugs or other.

For a multi core processor that may mean that one or more cores are flat out, so my guess is that is the problem. How much cpu is node red using?

Agree with Colin.

Node.js runs your flows on a single thread, so effectively on a single core. If that single core is maxed out, this is your issue.

If I were to guess, then something in your flows is maxing out the CPU core it's running on. Probably a loop of some kind, or perhaps you have something running upon deploy (e.g. do any of your function nodes have code in "on start" or "on close"?)

Without having access to your flows it is mostly guesswork. However, you can make strides yourself. I would start by disabling various flow tabs until you get something sensible. And to save you waiting half an hour between deploys, I would do this in reverse. What I mean by that is start Node-RED in safe mode, disable all but one tab, then deploy. If the deploy is snappy, enable another tab, deploy, and so on until it becomes unresponsive.

This is what i have started to do. Please note the 50% CPU figure is from Windows Task Manager.
(Other non Node-RED processes continue to run on this server.)

Btw - does anyone know how subflows impact flow loading / deployment? Cos we do have quite a few subflow calls.

Another thing you could try is disabling all but one of your tabs and see if that speeds things up. If it deploys quickly then enable ten more tabs. If the problem shows up you know it is somewhere in those ten, so you can eliminate them bit by bit till you find the issue.

Divide and Conquer!
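
For anyone who wants to see the divide-and-conquer idea spelled out, here is a rough sketch in Python. It is only an illustration: `is_slow` is a hypothetical stand-in for "enable exactly these tabs, deploy, and time it by hand", and the tab names are made up. It also assumes that once the offending tab is enabled, the deploy stays slow no matter what else is enabled.

```python
def first_slow_tab(tabs, is_slow):
    """Return the tab whose enabling first makes deploy slow, or None.

    tabs    -- tab names in the order you would enable them
    is_slow -- hypothetical callback: takes the list of enabled tabs and
               returns True if a deploy with only those tabs is slow
    """
    if not is_slow(tabs):
        return None  # even with everything enabled the deploy is fine

    # Invariant: deploying tabs[:lo] is fast, deploying tabs[:hi] is slow.
    lo, hi = 0, len(tabs)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_slow(tabs[:mid]):
            hi = mid
        else:
            lo = mid
    return tabs[hi - 1]


# Made-up example: 30 tabs, and the deploy turns slow once "tab23" is enabled.
tabs = [f"tab{i}" for i in range(1, 31)]
culprit = first_slow_tab(tabs, lambda enabled: "tab23" in enabled)
print(culprit)
```

With 30 tabs this needs about five or six deploys instead of up to 30. If the slowdown builds up gradually across many tabs rather than coming from one, it will only point at the tab where the deploy crosses whatever you count as "slow".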

Doing that.
Started with all tabs disabled.
Life looked good. Deploy was almost instantaneous.
Started enabling them 1 by 1.

I finally started seeing a pattern. If the tab being enabled had a subflow call, that started adding to the deploy time. The initial tabs had just 1 subflow call or a few (less than 5).

On the last tab enabled, the deploy time jumped from a few seconds to over 5 mins. That tab has around 60 injects, each calling a subflow. (Many of these injects call the same subflow, but with different params.)
(Each inject is connected to its own subflow instance to avoid instances of the subflow interfering with each other.)

It seems that the more subflow calls we have, the more it impacts the deploy time. I am not yet clear why it should. Is this a known behaviour? Does anyone know?

Some background: If you have a subflow that contains 100 nodes, then you add 100 instances, this expands to 100 * 100 (10000) additional nodes to setup, destroy, re-create when a deploy happens. If each of those subflows have initialisation code - that has to run for every one of them.

If your subflows have no state (i.e. they are PURE, dont use context, have no side-effects), then you will save a LOT of resources using link-calls instead of subflows.

It almost certainly isn't the fact that they are subflows, it is what you are doing in the subflow or what is coming out of the subflow.

This is what i think is happening. I have simple flows and complicated flows. Many of the flows were created before the link call feature was available. I also need to check if there is indeed no state. Mostly i dont have state, as i prefer not to work with NR flow/node level contexts. I use the DB or global context instead, and try to avoid overlap there. But i need to check.

The challenge with link calls / link in etc. is that some of the subflows have parameters. And unfortunately link calls dont have such a mechanism today. So significant portions of "re-write" may come into play. Lets see.

(Btw not related to the above issue, but connected to link call, and only because you answered this Steve: do you know if we will ever move ahead with the Link Call enhancements (optional timeout, pass through mechanism etc.)?)

Yes - but i would not expect it to impact "deploy". Execution, for sure. Deploy should not care what goes in and what comes out. At that stage there is no message in the system.

and this gets worse if your subflows nest subflows.

each instance creates all its internal nodes and if any of those are subflow instances, they are created too. It really doesnt take much to get to huge numbers

Example:

  • Subflow1: 10 regular nodes, 5 Subflow2 nodes
  • Subflow2: 20 regular nodes
  • Main flows: use 100 Subflow1s and 100 regular nodes

this means 100 + (100 * (10 + (5 * 20))) == 11100 nodes to be created, setup, initialised etc etc.
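
To make that arithmetic reusable for your own flows, here is a small Python sketch of the same count. The dictionary layout is just an illustrative model I made up, not how Node-RED stores flows. Like the formula above, it counts only the nodes created inside each subflow instance.

```python
# Hypothetical model: each subflow template lists its regular nodes and
# how many instances of other subflows it contains.
SUBFLOWS = {
    "Subflow1": {"regular": 10, "instances": {"Subflow2": 5}},
    "Subflow2": {"regular": 20, "instances": {}},
}


def expanded_nodes(name, subflows):
    """Nodes created for ONE instance of subflow `name`, nested ones included."""
    template = subflows[name]
    total = template["regular"]
    for child, count in template["instances"].items():
        total += count * expanded_nodes(child, subflows)
    return total


def deploy_node_count(regular, instances, subflows):
    """Total nodes for the main flows: regular nodes plus expanded instances."""
    return regular + sum(
        count * expanded_nodes(name, subflows)
        for name, count in instances.items()
    )


total = deploy_node_count(100, {"Subflow1": 100}, SUBFLOWS)
print(total)  # 11100, matching the sum above
```

Every extra instance of Subflow1 adds another 110 nodes here, which is why the total climbs so fast.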


Contrast that with link-calls (aka pure functions):

  • Subroutine1: 1 link-in, 10 regular nodes, 5 link calls to Subroutine2, 1 link-return
  • Subroutine2: 1 link-in, 20 regular nodes, 1 link-return
  • Main flows: use 100 link calls to Subroutine1 and 100 regular nodes

this means 100 + 100 + 2 + 20 + 2 + 10 + 5 == 239 nodes
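
The same count as a Python sketch, again using a made-up dictionary layout purely for illustration. The point it shows is that each subroutine exists once, no matter how many link calls point at it.

```python
# Hypothetical model: each subroutine is built exactly once.
SUBROUTINES = {
    "Subroutine1": {"link_in": 1, "regular": 10, "link_calls": 5, "link_return": 1},
    "Subroutine2": {"link_in": 1, "regular": 20, "link_calls": 0, "link_return": 1},
}
MAIN = {"regular": 100, "link_calls": 100}


def link_call_node_count(main, subroutines):
    """Total nodes: main flow nodes plus one copy of each subroutine."""
    total = main["regular"] + main["link_calls"]
    for parts in subroutines.values():
        total += sum(parts.values())
    return total


total_link = link_call_node_count(MAIN, SUBROUTINES)
print(total_link)  # 239, matching the sum above
```

Going from 100 to 200 link calls in the main flows adds only 100 nodes in this model, because the subroutines themselves are not duplicated.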

its on the TODO - thats all I can say right now. Watch out for NR v4 is probably realistic.

This was clear once you wrote the earlier note. As i said i need to check if indeed i can move entire subflow to link calls. I have some subflows which take parameters.

Noted. Thanks for replying. No worries.