How to make Dashboard webserver more robust?

My users keep getting a "Connection lost" message in the top right corner of the dashboard, and shortly thereafter the site refreshes on them.

Node-RED Dashboard is here: https://acars.adsbexchange.com/

I have moved my site between data centers. The current one is the most robust that has hosted the site, and I am still seeing the error.

Seeing around 720 Gb of upload traffic every 24 hours.

Is there any way I can find out how many people are connected at any one time?
What's the webserver that the dashboard uses?
Are there any settings I can try and change to get it to hang in a bit longer rather than throwing up the error?

EDIT: Running on a VM. Ubuntu 20.04 server (headless)
Intel Xeon E5-2697 @ 2.6GHz quad core.
32 GB RAM
500 GB SSD.

PM2 stats: [screenshot]

Thanks.

Ben.

Ben,

Slick site! I am actually having 0 issues with two browsers pointed at your site and running a search in each. It's working fine as far as I can tell. Maybe proximity? What's your host's physical location?

  • Sean

Thanks for checking it out Sean.
The problem is not regular, i.e. it does not happen every 30 seconds or so. It's about once an hour on average: sometimes 4 times an hour, sometimes none, sometimes 8 or more times an hour.
Just hang out and watch the live data on any given page and you will see it.
Does not matter what page you are on as all pages refresh when it happens.

I will keep an eye on it for a while.

Ben,

Can you run a netdata agent on your machine? A Docker or bare-metal install takes a couple of seconds, and you get super fine-grained status on your machine, services and network.

I have used netdata in the past on my own machines so am somewhat familiar with it and its features.

My site is in a data center and I am not excited about asking the guys about opening another port to see the netdata webserver.

I'd love to know more about what's happening with the Node-RED dashboard webserver, more so than the VM it's running on.

Edit. Using btop to monitor the VM. I get enough broad info from it to see the machine is not stressed.
(Profile two screen shot, but I use Profile 0 to see the processes as well as everything in this screenshot).

I have now tried the following:
btop, nethogs, iftop and iptraf.
They all give the same information as btop network.

Still not finding any answer to what webserver Node-RED is using for its dashboard nodes.

Still not finding a way to see how many connections there are to the website.

It uses the same express instance as the core of node-red. It just adds a route to it.

Ben,

I understand. I have been on it all day, and I have not seen any issues or disconnections at all.

It sort of seems like a CORS issue, possibly.

Does your browser console show anything when you see the error?

I just visited the site (21:00 GMT, Firefox on Android).
Every option I picked from the main menu gave me Connection lost, followed straight away by the page reloading ok

Ah! Ok, I had, in all this time using Node-RED, never heard of express.
Thanks dceejay!

Did some Google-fu, and I know it's not accurate, but it will give me a ballpark figure as to how many 'people' are connected.
The nodejs code for finding out was beyond me to implement, so I'm doing a quick and dirty exec with the command lsof -i tcp:1880
Put that into a split node, then join node, then get the length of the payload and plot that on a graph.
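For anyone wanting to skip the split/join step, the counting could also be done in a single function node. This is only a rough sketch: it assumes the exec node is run as lsof -nP -i tcp:1880 (so addresses and ports stay numeric) and that lsof prints its usual "local->remote (STATE)" column. The countConnections name is just made up for illustration.

```javascript
// Count established connections in the text output of `lsof -nP -i tcp:1880`.
// Assumption: lsof's default layout, where the end of each line looks like
//   TCP 10.0.0.5:1880->203.0.113.7:54321 (ESTABLISHED)
function countConnections(lsofText, port = 1880) {
    const clients = new Set();
    let established = 0;
    for (const line of lsofText.split("\n")) {
        // Skip the header, the LISTEN socket and half-closed states.
        if (!line.includes("(ESTABLISHED)")) continue;
        const match = line.match(/(\S+)->(\S+)/);
        if (!match) continue;
        const [, local, remote] = match;
        // Only count sockets whose local end is the dashboard port.
        if (!local.endsWith(":" + port)) continue;
        established += 1;
        // Drop the remote port so several sockets from one address count once.
        clients.add(remote.slice(0, remote.lastIndexOf(":")));
    }
    return { established, uniqueClients: clients.size };
}
```

In a function node fed by the exec node's stdout, something like msg.payload = countConnections(msg.payload); return msg; should give two numbers to plot. It is still a ballpark: one browser can hold several sockets, and many users behind one NAT address count as a single client.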

Note to self: I think the dashboard uses websockets after the initial connect and load, so how do I track that data?

Right now it's building up data on the top graph of the 'System' page (bottom of the menu).
I hope to see if there is a correlation between the number of connections and the Connection lost message. (Honestly, the error popup is not a problem, it's the full site reload that follows it that's killing the fun).

There seems to be some sort of 'race' condition that can really upset things.

I just got 3-4 connection lost popups and site reloads.
You can see the server CPU, while not at 100%, sure gets busy when that happens. Very different to the usual pattern that has been going most of the day.

I can not see anything different show up in the processes at the time, just node-red uses more horse power.

The Node-RED graph also shows the CPU (Blue trace) bump (Using the 'os' nodes).

I wonder if there is a way to have Node-RED log the Connection error?

Node-RED uses two ExpressJS web servers. One for the Admin interfaces (Editor and various APIs) and one for the user-facing ones (Dashboard, http-in, etc). Other custom nodes can attach to these as well and piggy-back off them. For example, uibuilder uses both but lets you define a 3rd ExpressJS instance if you prefer using your own custom settings.

I believe, like uibuilder, it uses Socket.IO, which is a lot more than just websockets. But in principle, yes, most of the communication is done over websockets rather than http.

Realistically, the only way to monitor websocket traffic from your users to the website is to use a proxy. However, you can monitor it for your own connection simply by looking in the network tab of your browser's dev tools. Find the line with the status code 101 and click on it. You will see loads of details and can monitor the ongoing packet transfers.

That tab will also show you if the websocket connection is failing or falling back to long-polling. With uibuilder, you can turn on client debugging to see all the gory details of what is happening but I don't think that Dashboard has that.


My first thought on your issues is that it is an intermittent timing issue - e.g. a network issue. Which may be why it only shows up for some people and not everyone.

One other thing to watch out for is large data transfers. When Socket.IO moved from v2 to v3, the default maximum message size (maxHttpBufferSize) dropped from 100 MB to 1 MB, and v4 keeps that lower limit. Messages that are too large typically show up as constant disconnects. If Node-RED allows you to change the options for Socket.IO, there are a couple of settings you can play with that change the default timeout and/or the default max message size.
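To get a feel for whether message size could be a factor, a rough check like the one below could be dropped into a function node ahead of a dashboard node. It is only a sketch: the payloadBytes and isTooLarge names are made up, the 1e6 figure is the Socket.IO v4 documented default for maxHttpBufferSize, and the JSON size is an approximation of what actually goes over the wire (Socket.IO adds a little framing of its own). The relevant Socket.IO server options are maxHttpBufferSize, pingTimeout and pingInterval; whether Dashboard lets you set them is something I have not confirmed.

```javascript
// Socket.IO v4 documents a default maxHttpBufferSize of 1e6 bytes (1 MB).
const DEFAULT_MAX_BYTES = 1e6;

// Approximate size in bytes of a payload once serialised to JSON.
function payloadBytes(payload) {
    return Buffer.byteLength(JSON.stringify(payload), "utf8");
}

// True when the serialised payload would exceed the given limit.
function isTooLarge(payload, maxBytes = DEFAULT_MAX_BYTES) {
    return payloadBytes(payload) > maxBytes;
}
```

For example, a function node could call node.warn() whenever isTooLarge(msg.payload) is true, which would at least show whether any single message gets near the limit around the time of a disconnect.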

Thank-you @TotallyInformation for that very solid and helpful reply.
I am 98% sure that not everyone sees the same error at the same time which is partly why I feel I am chasing ghosts in the machine. But, that said, enough people have mentioned it that I want to dig a little deeper.

@dceejay Could you please nudge me in the direction of finding the Node-RED Socket IO file so I could take a look at its default settings?

I don't think we do anything special at all. One end is set up in ui.js - line 341, and the other in src/services/events.js - lines 10,13
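If someone wanted to log disconnects on the server side, the general Socket.IO pattern looks roughly like the sketch below. To be clear about the assumptions: Dashboard does not, as far as I know, expose its Socket.IO server object through a setting, so getting hold of io would mean patching near the code in ui.js mentioned above. The trackConnections name and the stats object are purely illustrative. The only library behaviour relied on is that a Socket.IO server emits "connection" with a socket, and that the socket later emits "disconnect" with a reason string such as "ping timeout" or "transport close".

```javascript
// Attach simple connection bookkeeping to a Socket.IO server-like object.
// `io` is assumed to emit "connection" with a socket that has an `id` and
// later emits "disconnect" with a reason string.
function trackConnections(io, log = console.log) {
    const stats = { current: 0, disconnectReasons: {} };
    io.on("connection", (socket) => {
        stats.current += 1;
        log(`connect ${socket.id} (now ${stats.current})`);
        socket.on("disconnect", (reason) => {
            stats.current -= 1;
            // Tally reasons so timeouts can be told apart from closed tabs.
            stats.disconnectReasons[reason] = (stats.disconnectReasons[reason] || 0) + 1;
            log(`disconnect ${socket.id}: ${reason} (now ${stats.current})`);
        });
    });
    return stats;
}
```

A burst of "ping timeout" reasons lining up with the CPU bumps would point at the server being too busy to answer heartbeats, whereas "transport close" would point more towards the network or the client.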

This is from the uibuilder v5 settings.js uibuilder property:

        /** Optional: Socket.IO Server options
         * See https://socket.io/docs/v4/server-options/
         * Note that the `path` property will be ignored, it is set by uibuilder itself.
         * You can set anything else though you might break uibuilder unless you know what you are doing.
         * @type {Object}
         */
        socketOptions: {
            // Make the default buffer larger (default=1MB)
            maxHttpBufferSize: 1e8 // 100 MB
        },

That only covers the buffer size, and of course I don't know if this IS your issue. The comment also links to the Socket.IO docs.

Looking at developer tools trying to get a feel for the size of my site, it looks as if the entire site is loaded each time? Not just the page you are looking at?

Yes, that's how Dashboard works. It creates a single-page web app. The tabs are not pages.

I'm only doing 1/1000th of what you have, but I have the same issue with my home automation, where roughly 30% of the time I get a lost connection and within 5-10 seconds it resolves itself. I have not touched the RPi server in over 3 years but did get new Android phones, and it seems the problem started there.

Again nothing of the scope of your system.

If you are talking about access from a mobile device, that is very different to a desktop.

Mobile devices have very aggressive power management and are much more likely to shut down things that they think might not be needed for a bit. I would certainly expect to see more disconnections on mobile than on desktop. Of course, modern desktops also use power management, as do modern browsers, so it isn't unexpected to see a browser tab losing connection and then waking back up.