No-Response / Disconnect of DB2

Hi,
I'm currently migrating my flows from DB1 to DB2, mostly using ui-template nodes.

Situation:

  • Node-RED is running in a Docker environment

  • I primarily use Zigbee to control my lights

  • the automation flows work seamlessly

  • without any Dashboard UI open, the average CPU load is less than 2%

  • when opening the dashboard UI, switching between pages, or reloading (F5), there are long delays in displaying the UI (the page is unresponsive). The CPU load spikes to 150%-220% for about 5-20 sec
    image

  • occasionally I get the log messages below while the page is not responsive
    [info] [ui-base:DB2] Disconnected it7ep0xJqIUNtSGrAAAZ due to ping timeout
    [info] [ui-base:DB2] Disconnected SWU8wfVszeD-VyrDAAAP due to transport error

  • once the new page is loaded, all the UI information is displayed and updated in real time without any UI errors (I checked the debug console)

  • I had similar pages in DB1 but never any delays or unresponsive pages
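For anyone who wants to see which disconnect reason dominates, the two log lines quoted above can be tallied with a few lines of Python. This is only a sketch: the regular expression is inferred from those two sample lines, and the function name `tally_disconnect_reasons` is made up for illustration. As far as I know, "ping timeout" is the Socket.IO reason for a client that did not answer the heartbeat in time, which would fit a browser tab that is frozen while rendering.

```python
import re
from collections import Counter

# Pattern inferred from the two log lines quoted in this post (an assumption,
# not an official log format): "[ui-base:<name>] Disconnected <id> due to <reason>"
DISCONNECT = re.compile(
    r"\[ui-base:(?P<base>[^\]]+)\] Disconnected (?P<sid>\S+) due to (?P<reason>.+)$"
)


def tally_disconnect_reasons(lines):
    """Count how often each disconnect reason appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = DISCONNECT.search(line.strip())
        if match:
            counts[match.group("reason")] += 1
    return counts


sample = [
    "[info] [ui-base:DB2] Disconnected it7ep0xJqIUNtSGrAAAZ due to ping timeout",
    "[info] [ui-base:DB2] Disconnected SWU8wfVszeD-VyrDAAAP due to transport error",
]
print(tally_disconnect_reasons(sample))
```

Feeding it the full container log instead of `sample` would show whether the timeouts cluster around page loads.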

Does anybody have similar issues with page loads and unresponsive pages in DB2? Any idea what the issue could be, or how it can be solved?

Thank you for your input and ideas

Are you running both dashboards together, or is this a new system?

What hardware/OS are you running?

If you run top or similar what processes are consuming the processor?

Are you running the browser on the system running node-red or on a separate PC?

Hello Colin,

I'm running a

  • Ubuntu Linux system, 8 GB RAM, 1.8 GHz 8-core, as server only
  • docker with portainer
  • NR 4.0.2 with DB2 in one docker container
  • the overall CPU load is never reported over 20%
  • the shown CPU load (150%-220%) was for the NR container (taken from Portainer for NR container)
  • the key processes are

  • I checked the browser from 2 other systems (Windows and Apple) and they have the same behaviour

As said, the issue occurs only when loading/reloading a page: then the entire page is frozen, while other browser tabs keep working

That doesn't make sense to me, do you understand what it means?

Also I asked

It is reporting an aggregated value. It means that >1 processor core is in use.
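As a rough illustration of the aggregated value: if the container figure counts each fully used core as 100% (which is how Docker-style per-container stats usually work, but treat it as an assumption for Portainer), the numbers from this thread can be converted to a share of the whole 8-core machine. The function name is made up for the example.

```python
def share_of_machine(container_cpu_percent: float, cores: int) -> float:
    """Convert an aggregated per-container CPU figure (100% = one full core)
    into a percentage of the whole machine's capacity."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return container_cpu_percent / cores


# Figures reported in this thread: 150%-220% on an 8-core host.
for reported in (150.0, 220.0):
    share = share_of_machine(reported, 8)
    print(f"{reported:.0f}% container load = {share:.2f}% of the machine")
```

So a 150%-220% spike in the container is about 19%-28% of the machine, which is in the same range as the ~20% overall load reported by top.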

What tool are you using for showing that?

How long is long?

If you view the dashboard from two different PCs and refresh the page on one of them, does the other lock up too?

If you create a new page with just, for example, a text node on it, do you still see a delay when switching to that page?

Edit: Another thought, are you using ui-control or ui-event nodes to trigger actions?

Hi Colin,

  • The overall CPU load of 20% was checked with the top command and verified with the Webmin tools

  • Regarding the unresponsive UI, I have to be more specific

  • --> the new page is shown in its initial state (as designed, with no data; the example below is a ui-chart)
    image

  • --> sometimes the Chrome "page unresponsive" message pops up
    image

  • --> even the main button (top-left) is not accessible
    image

  • --> after some time (usually 5-30sec) the UI-data is updated
    image

  • any other automation/flow without a UI component does not show any impact/delay

  • --> motion sensor is switching on light

  • --> log-files are sent to Telegram

  • I'm not using ui-event or ui-control

  • when the page is fully loaded, the information is updated/displayed in real time and all controls work fine

  • It looks worse with pages having more ui-templates (with vue elements)

  • --> this one mostly loads OK

  • --> this one usually has issues (not always)


  • as mentioned earlier, there are no error messages in the console log for any of these pages or UI elements

  • simple pages are loading fine
    image

Some additional info:

  • I'm running another NR 4.0.2 instance with DB1 in a container on the same machine (same flows, similar UI based on DB1) without any disconnects or delay issues; DB2 is the only difference
  • I deactivated this NR/DB1 instance for testing purposes, with no impact (I even did a full server restart, just in case)
  • all my servers are running on a 1 Gbit/s UniFi backbone network
  • I tried with wired and Wi-Fi connected browsers and the behaviour is the same

It looks like there is a bottleneck somewhere when a new DB2 page is loaded .. especially with multiple ui-templates on it ... but I will further check

Update:

  • I have the DB2 dashboard loaded in 2 browser tabs on the same machine; reloading/changing the page in the 1st tab does not impact the dashboard in the 2nd tab
  • same for browsers on 2 different machines: no impact on the others when a DB2 page is reloaded

Attached is my ui-template to control my lights
Ui-template_for_lights.json (109.0 KB)

With that chart, at what rate are new points being added to the chart? There are known issues with the performance of the chart node. It is going to be reworked, I think.

Hi Colin,

The issue occurs only on initial load or on reload. It is not limited to ui-chart; it affects other DB2 elements (especially ui-template) as well.

For ui-chart I'm gathering 1 record every 2 seconds, kept for 2 days max. I used that example to show the difference between the initial state and the state after a full load (5-20 sec).
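To put that rate in perspective, here is the arithmetic for how many points one series holds once the 2-day window is full. It is only a back-of-the-envelope sketch: the helper name is made up, and whether the whole history is sent to the browser on every page load is an assumption, not something confirmed in this thread.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def retained_points(interval_seconds: float, retention_days: float) -> int:
    """Number of points one series holds when the retention window is full."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return int(retention_days * SECONDS_PER_DAY / interval_seconds)


# 1 record every 2 seconds, kept for up to 2 days.
print(retained_points(interval_seconds=2, retention_days=2))  # 86400
```

That is 86,400 points per series, which is a lot for a chart to draw in one go if it all arrives at load time.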

I use ui-template the most, and that's where I see the issue/delay/unresponsiveness.

Are you able to share the code?

Also, can you share a screenshot of your browser's network tab after the page loads? Would be interesting to see if it is a specific resource request that is causing this.

This is the code of my ui-template for managing the lights. I have at least 10 of them on one page.
Ui-template_for_lights.json (109.0 KB)

Network tab as asked


I tried to make sense of your flow but I'm afraid it escapes me. Partly because there are several node types I don't have and partly because the 1 D2 output I can see does not actually seem to make it onto my D2 output page so I can't really help understand where the issue might be.

Clearly I don't know D2 well enough.

This has me confused:

Why are the buttons to the right of the Group disabled?

From what I've seen with others, I suspect the memory leak in ChartJS for time series data is causing your problems.

We are aware of it and are hoping to switch chart providers in the near future. A massive CPU spike any time data is injected into the chart is what we have seen consistently.

The buttons are disabled by default in initial state .. they will be enabled/disabled based on the dynamic data (e.g. from the init data in the flow) provided into the ui-template

Hi Joe,

yes I'm aware of the memory leak issue for charts, which is a secondary issue for me here as I used the chart as a sample to show the behaviour.

My primary issue is with the use of multiple (8+) ui-templates with vue elements on one page, as these pages often become unresponsive for 5-30 sec until all the data is loaded. I already provided a flow sample earlier (see details above).

Thanks, I can take a proper look on Tuesday. I have pages with dozens of templates on them that render fine, so I suspect there is a specific problem here rather than a generalised template problem.

Hi Joe,

based on info from Julian, I monitored the network log in the console more closely and saw that the socket.io websocket request stays pending for a long time


console log

Could there be an issue with my websocket config or network setup for Node-RED? As said, my system is

  • Ubuntu Linux system, 8 GB RAM, 1.8 GHz 8-core, as server only
  • docker with portainer
  • NR 4.0.2 with DB2 in one docker container
  • UniFi backend network with 1 Gbit/s
  • Servers are wired, desktop with browser is wired

Other observation:
over time (~2 days) the UI gets worse and I need to restart the Docker container

Hi @xx_Nexus_xx,

You can also do some CPU profiling of the Node-RED server-side instance in your Docker container, to find the root cause.

There might be better ways to do it, but you could do it like this:

  1. Follow the Docker specific steps for my inspector node.
    BTW I don't use Docker myself...
  2. Import the example flow from that same readme page, so you can enable debug mode of the Node.js application server running in your Docker container.
  3. Connect with your Chrome developer tools to your server-side Node.js instance, and start a CPU profiling (see wiki page).
  4. Have a look at e.g. the CPU flame charts; hopefully that shows you which piece of code is taking so long.

Good luck!

thx .. I will give it a try

Does this even make sense?

data:image/jpg;base64, https:..... pointing to a png

base64 indicates that a base64 string should follow.
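A quick way to check this kind of value is to split the data URI at the first comma and see whether the part after it really decodes as base64. The sketch below uses only the Python standard library; the function name and the example URL are hypothetical placeholders (the real URL from the template is not reproduced here).

```python
import base64
import binascii


def check_data_uri(uri: str) -> str:
    """Return a short verdict on whether a data URI carries a valid base64 payload."""
    if not uri.startswith("data:"):
        return "not a data URI"
    header, sep, payload = uri[len("data:"):].partition(",")
    if not sep:
        return "missing comma between header and payload"
    if not header.endswith(";base64"):
        return "not declared as base64"
    try:
        # validate=True rejects characters outside the base64 alphabet,
        # such as the ':' and '.' found in a URL.
        base64.b64decode(payload.strip(), validate=True)
    except (binascii.Error, ValueError):
        return "payload is not valid base64"
    return "ok"


# Hypothetical example values, not taken from the attached flow.
bad = "data:image/jpg;base64, https://example.invalid/light.png"
good = "data:image/png;base64," + base64.b64encode(b"\x89PNG").decode("ascii")

print(check_data_uri(bad))
print(check_data_uri(good))
```

A URL after `;base64,` fails the check. If the image lives at a URL, the URL itself would normally be used as the image source instead of being wrapped in a data URI.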