Socket.IO not freeing up TCP sockets correctly? (UIBUILDER context)

I am trying to isolate/resolve socket resource consumption. The problem: the daily reboot of the Android panels results in gradual socket consumption until the server (RPi) exhausts its socket pool at around 500 sockets (the default max sockets count for the OS).

The environment: the NodeRED/ExpressJS servers are Raspberry Pis. User-facing front ends are a number of Android panels with kiosk-ed browsers running a Vue2 app in UIBUILDER. For operational reasons the Android panels are configured to reboot daily at 4am, and have an approx 50 sec reboot time. I'm running several of these setups across multiple sites, resulting in some variance in versions, but I'm seeing the same issue on all instances. All are running NodeRED v3.0.2 with Node 16. UIBUILDER and Socket.IO versions in use (according to npm ls) include:

UIBUILDER  |  Socket.IO
v6.4.0     |  v4.6.1
v6.4.1     |  v4.6.1
v6.5.0     |  v4.7.2
v6.6.0     |  v4.7.2  (This unit not yet in "Production" or part of these tests, but will be by the end of this week.)

Android panels are running a mix of Android 8/10/12 with no correlation between Android version and socket consumption. I'm primarily using one brand of Android panel, but I'm seeing the same issue with another brand. I'm gradually updating all panels of my primary brand to Android 12.

Restarting the NodeRED service is required to free up system resources. Prior to restarting, I use netstat to observe current socket resource consumption. There are multiple sockets per Android panel lingering in the ESTABLISHED state. Socket.IO appears to create a surplus socket (or possibly more) at initial connection, and fails to clean up the socket when each panel goes through its 4am reboot. (Comments on the interwebs/elsewhere regarding Vue "use strict" make me a bit unsure whether to expect one vs two sockets.)

I can accept that perhaps two sockets per panel are needed but the growth gets well beyond this, in some cases up to 50 or more sockets for a single Android panel. The distribution of this issue is somewhat inconsistent - a panel will behave fine for a time, while at other times it consumes multiple sockets. I put the random distribution/freq down to things beyond my control - power/network instabilities. I can recreate the issue on my workbench by power-cycling a test Android panel, running a bare bones UIBUILDER instance templated from the "VueJS v2 & bootstrap-vue" template (to rule out any of my Vue coding).

My first thought was: If I can shorten the socket timeout to something much quicker (< 20 sec) than the panel reboot time (50 sec), old sockets should get cleaned up prior to new ones being established. That would possibly avoid any socket recycling/reuse issues which might be at play. I dug through the following docs to find some settings to control timeout durations.
Server options | Socket.IO
From that, I modified my NodeRED settings.js file adding the following.

    uibuilder: {
        socketOptions: {
            pingInterval: 5000, // 5 sec
            pingTimeout: 4000   // 4 sec
        }
    },
AFAIK, this should theoretically give me a max timeout of 9 seconds before the socket is considered to be timed out. I restarted the NodeRED service to apply the changed settings, and used netstat to observe socket utilisation.
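As a sanity check on that arithmetic, the worst-case detection time is simply the sum of the two settings:

```javascript
// Worst-case time for the Socket.IO server to declare a client dead:
// it sends a ping every pingInterval ms, then waits pingTimeout ms
// for the pong before considering the connection closed.
const pingInterval = 5000 // ms, per the settings.js override above
const pingTimeout = 4000  // ms
const maxDetectionMs = pingInterval + pingTimeout // 9000 ms = 9 sec
```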

Test 0 - Establish a "normal" state. After powering up the test panel it initially establishes five sockets to the server, then three of the five quickly enter the FIN_WAIT2 state and ultimately time out/disappear. The remaining two sockets stay in the ESTABLISHED state. All sockets were/are accessing the server on port 1880. As you read on, I suspect one of these two remaining sockets might already have an underlying issue, since I believe there should only be one socket between the UIBUILDER instance and the NodeRED engine.

Test 1 - Simulate a long power outage. I disconnect the (PoE-powered) panel. It takes approx 30 sec for one of the two sockets to change from ESTABLISHED to the FIN_WAIT1 state, with final teardown of this socket occurring another 120 sec after this. I find it curious that the timeout happened after 30 sec, not 9 sec per my config - there's something for me to learn there... The other socket of the Test #0 pair remains in the ESTABLISHED state indefinitely (12+ min beyond the first socket's teardown), even though the panel is not powered up. I think this demonstrates the core issue I'm having - I expect this second socket to time out at some point but it never does, and it can only be freed up via a restart of the NodeRED service.

Test 2 - Simulate power restore. After panel bootup it re-establishes the five sockets again, three quickly time out, stabilising to two ESTABLISHED sockets. Looking at the unique/semi-random client port# of the sockets, one of these two is the original socket from Test #0 which never timed out in Test #1.

Test 3 - Simulate 4am reboot. For this test I did a quick disconnect/reconnect of power to the panel, looking for possible race conditions between socket timeout and panel boot time. Socket timeout behaviour is per Test #1 above. I think I might be teetering around a timeout threshold value, though this observation could be unfounded correlation. If the Android panel attempts to connect to the RPi while the previous socket is in the FIN_WAIT1 state on the RPi side, the initial cluster of five sockets drops back to three instead of two. (Spawning another lingering socket?) These three sockets include the original ESTABLISHED socket from Test #0 and two new sockets from the "cluster of five" in Test #3. The FIN_WAIT1 socket established in Test #2 times out.

Test 4 - Repeat of Test #3 above. While the Android panel was rebooting, one RPi socket entered FIN_WAIT1 state while the remaining sockets stayed in ESTABLISHED state. Post-reboot we settle to five ESTABLISHED sockets, so a growth of two extra sockets going from Test #3 to Test #4.

As a side-test, I was a bit frustrated that my settings.js => uibuilder => socketOptions timeouts seemed to be ignored, so I tried to find/modify the default behaviour of UIBUILDER's Socket.IO library. All modifications were temporary, and have since been restored to default settings.
~/.node-red/node_modules/node-red-contrib-uibuilder/nodes/libs/socket.js line 216, maxDisconnectionDuration. Changed this from 120 seconds to 12 seconds (12000 ms).
This did not appear to have any impact on the test results.

Other general observations:

  • I don't think the presence of a socket in FIN_WAIT1 state is influencing subsequent socket creations, since Test #0 had no such FIN_WAIT1 sockets when it created a "faulty" socket.
  • I don't see any obvious performance degradation on the panel when the RPi is reporting multiple socket connections to it - I suspect the issue is contained to the RPi side and there are probably not 50+ sockets connected end-to-end between server and client.
  • I'm not seeing message duplication between my Vue app and the NodeRED flow, so I think such "duplicate" sockets are not replicating/passing data. They are simply consuming system resources (due to my daily 4am Android reboot) and remaining in the ESTABLISHED state on the RPi side until we reach a critical OS resource limit.

Where to from here? I'm currently mitigating this by monitoring the total socket count via netstat -n | grep tcp | wc -l, flagging it when it gets above 350 sockets and arranging an out-of-hours restart of the NodeRED service on an as-needed basis. I'd prefer to resolve the root cause rather than set up an automated service restart and ignore it.
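For what it's worth, the threshold check can be scripted rather than eyeballed. A minimal Node sketch that parses `netstat -n` output (fed in via child_process or a cron wrapper; the function names and 350 threshold are just my mitigation values):

```javascript
// Count TCP sockets in raw `netstat -n` output and flag when a
// threshold is exceeded. Equivalent to `netstat -n | grep tcp | wc -l`.
function countTcpSockets(netstatOutput) {
    return netstatOutput
        .split('\n')
        .filter((line) => line.trim().startsWith('tcp')) // tcp and tcp6
        .length
}

function socketAlarm(netstatOutput, threshold = 350) {
    const count = countTcpSockets(netstatOutput)
    return { count, overThreshold: count > threshold }
}
```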


Interesting and you have done some great, detailed analysis so far. Also a little worrying.

To be honest, I've kind of assumed that Socket.IO does the "right thing" with its connections and I've not really done anything on the server side to tune anything. I was more focused on ensuring that each uibuilder node gets a separate channel to prevent any cross-talk. On the client side, I don't really do any tuning either, though I do make use of some of Socket.IO's built-in security metadata to exchange some additional data. The only other thing on the client is some auto-reconnect processing which is really necessary on modern devices - both mobile and desktop/laptop - to cater for devices going to sleep and waking up, or for other transient network issues.

Most of the other testing I've done is around trying to make sure that there are no memory leaks - which I've never found with uibuilder so far.

Of course, as you've already seen, I do expose Socket.IO's configuration and middleware so that you can tune things yourself (so I thought anyway) and layer in additional processing or debugging as needed.

Doing a quick test on my own home automation server (running Debian on an old laptop), I think I'm seeing something similar. Before opening any uibuilder web pages, I can only see 2 port-1880 connections, which is what I'd expect for having the Editor open. But on opening a single page, I immediately see 12 connections, though only 3 are actually established, which is understandable. After a short while, this drops back to 2 (with the open web page in the background). Reloading the page results in 5 connections again, with only 3 actually established. This drops back to 2 again after perhaps a minute or so. The un-established connections are either FIN_WAIT2 or TIME_WAIT.

To be honest, I'm no great expert at what all these things mean. However, what I can see in the browser is that the web page is making around 6 related calls. Of which, 5 are polling calls and the other is the ws connection upgrade (which remains open while the socket is open). Sending messages to the web page after the connections have settled back to 2 results in a single established connection which is to be expected.

This is all on a page not running VueJS.

I think the next thing to do would be to test using a non-VueJS instance, just the bare bones "Blank" template, to see if that exhibits the same issues. And then possibly compare this against an open Dashboard page (which also uses Socket.IO).

Again, on my server, making sure no other pages were open and waiting for the connections to drop back down to 2 (1 established and 1 time_wait), I opened up the dashboard page and immediately saw 8 connections. All but 1 established. After a few seconds, there were 7 connections, 2 established and the others all time_wait. Then a short while later, it dropped back to 2 established sessions.

So it looks like it may be "normal" operation to create multiple connections for a client on initial connection, most of which then drop out after a minute or so. And this would seem to coincide with the way that Socket.IO does the handshake over http(s) before "upgrading" to ws(s) on the same port.

The thing to test after this would be with non-Android browsers and then with alternate Android browsers (maybe Edge, Vivaldi or Brave). I seem to be getting the same results on my Android device with Edge as on the desktop.

Also, this might be relevant: which version of Raspbian are you running? And is it up to date?

Thanks for the tips. I'm working from home today but will try your suggested tests tomorrow when I am back on site, and will follow up with my findings. (Non-VueJS page via uibuilder.) I don't have Dashboard installed on any of my units, since UIBUILDER is a much better fit for what I'm doing.

In the mean time:

  • Raspbian versions are a mix of Buster and Bullseye, but mostly the latter.
  • Most would have been patched to latest within the last two months. Certainly seeing the issue on some of my newer units.

To make my netstat observations easier to read/understand I ran them through grep: netstat | grep <android.panel.ip>. All my socket counts in the Test # comments relate exclusively to the Android sockets and don't include the NodeRED Editor or other devices. Each RPi is controlling a decent amount of Audio Visual hardware in multiple classrooms, so the Android panels are not the only sockets to/from the RPi.

Testing non-Android or alternate browsers may be a bit tricky. I understand the purpose from a bug-finding perspective, but I ultimately need kiosk-ed Android browsers to work. Happy to keep digging and if it turns out to be an Android browser issue I can raise a bug report with the manufacturer of my panels. I'm no Android expert but I believe they may wrap an OS service (WebView?) for their kiosk-ed browser.

Curiosity got the better of me while working from home.... I have a vendor-specific management suite which can issue a "reboot" command to the Android touch panels. Soft rebooting the panel is not an identical scenario to what I am testing for, but it does provide a bit more insight.

Test 5 - Remote soft-reboot with a Blank (non-Vue2) template.
Initial connection of the non-Vue panel resulted in one socket connection. Issuing a soft reboot to the panel caused the socket to immediately disappear on the RPi side. I.e. A graceful/controlled teardown of socket resources. (Okay, that's a different symptom to my problem. We'll look into that in the next test...) Once the panel had rebooted, three sockets were established and one of the three sockets quickly entered FIN_WAIT2 state (within 6 sec), then timed out (after 120sec) leaving two ESTABLISHED sockets.

Test 6 - Time based reboot.
Seeing the soft-reboot in Test #5 resulted in a clean teardown of socket, I modified the Android daily reboot time to "a few minutes from now" and observed sockets while the time-based reboot occurred. At the reboot time, both ESTABLISHED sockets from the end of Test #5 were immediately torn down cleanly and disappeared as part of the panel reboot.

Test #7 - More soft reboots with Blank and Vue2 apps.
While the number of "stabilised" sockets varied in both client code scenarios, there is consistent and clean socket teardown when issuing a soft reboot.

So I'm starting to think the root cause is probably not the "planned" 4am daily reboots but more likely "unplanned" events (power outages, intermittent network performance/routing). It appears that the Socket.IO heartbeat is not adequately detecting lack of end-to-end connectivity, where the expected behaviour would be to close the socket on the server side once pingInterval + pingTimeout is exceeded. The docs that I linked to in my first post suggest there are default values for these two heartbeat options, but my observations suggest that Socket.IO is ignoring any such values, regardless of whether or not UIBUILDER is correctly passing customised values to Socket.IO. If customised values were not being passed correctly, I'd expect the default values to be followed. (This specific test was on a Socket.IO v4.6.1 instance, though it likely applies to v4.7.2 too.)


Having slept on it, this is exactly what I was going to suggest to you today. The symptoms being related to the tablet losing connectivity and trying to re-establish.

Something you might consider. Do the tablets need to stay powered on when not in use? Some automation on the tablets might do a controlled shutdown and startup on a timer?

I will try to do some analysis on the Socket.IO server if I can, though the innards of it are pretty arcane. I actually would have replaced it but I can't find anything else that can provide a similar level of control. I do want to be able to offer things like shared rooms, and things like specific user-rooms that would make it easier to communicate with a specific browser or user.

The other thing might be to look more closely at how the client is behaving. I do have some niggling doubts as to whether the reconnection logic is quite correct. It attempts to do a decaying reconnection - tries rapidly to begin with and then with decreasing frequency. At least that is what it is supposed to do.

Here are a couple more thoughts.

The client does not currently cancel all listeners on disconnect. I wonder if this might be making things worse? I don't think it is but it isn't impossible. In the testing I've done, I can't see that we get any duplicate listeners that might hold onto more connections but if it were to happen, it would most likely be on an unclean disconnection followed by a reconnection.

You may also want to note whether you've created any of your own listeners? I can't think you would but as I don't know exactly what you are doing, it isn't impossible.

If you wanted to try forcing closure of all listeners on disconnect, you could add this to your own code - but it may break stuff. :slight_smile:

uibuilder._socket.on('disconnect', () => {
    uibuilder._socket.removeAllListeners()
})

Obviously, you shouldn't normally use anything that starts with _ but it is there if you need to try drastic things.

Oh, and as I've been thinking things through, there is a possible brute-force approach for working around the problem. At the moment, the client sets the available transports to transports: ['polling', 'websocket']. You could try reducing that to transports: ['websocket']. That will make connectivity less reliable on a poor network but it should remove a lot of the extra connections.
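At the raw socket.io-client level (outside the uibuilder wrapper), that change is just part of the standard connection options object. A sketch, with the URL and path omitted since uibuilder normally manages those:

```javascript
// Socket.IO client options limited to the websocket transport only,
// skipping the initial HTTP long-polling phase. Would be passed as the
// second argument to io(url, ioOptions) from socket.io-client.
const ioOptions = {
    transports: ['websocket'], // default is ['polling', 'websocket']
}
```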

If that worked, I could get you a custom version of the client to use while I work up being able to make the settings more flexible.

Thought - Brute force custom client with non-polling websocket
At present the issue is not apparent to my customers, only to me as back-end dev/support. I'd rather not increase the likelihood of users experiencing socket reliability issues, nor put you through the hassle of rolling a custom client. There is also a minor hassle on my side with managing the rollout of a temporary custom client. Thank you for the offer though - I appreciate it.

I'm back on-site and doing more testing with "unplanned" disconnections.

Test #8 - Blank template with "unplanned" disconnect.
The post-boot socket count peaks at three new sockets, stabilising to either one or two ongoing sockets. During a subsequent hard reboot of the panel, one of these sockets enters the FIN_WAIT1 state and eventually disappears. When there was more than one "stable" socket, the remaining socket(s) stay in the ESTABLISHED state.

Thought - on Listeners.
Given my previous observations with the Blank and Vue2 templates, I think we can rule out any silly coding I may have done. In my Prod HTML I attach listeners to UI elements, e.g. @onclick='myFunction()'. In my .js I don't do any complex uibuilder._socket.on() event listening - all listening is via the UIBUILDER object itself, i.e. uibuilder.on(). I have some code for uibuilder.onChange('ioConnected', ...) which I use to show a "Comms to server offline" popup. This popup prevents customers from interacting when there is no end-to-end comms, which I find is less frustrating/confusing for them than interacting with a non-responsive UI. We may be able to leverage this such that on disconnect we could log/queue up a few specific parameter values on the client side, then pump them back to the NodeRED flow at re-connect. I don't know what we would want to log/queue though.
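The log/queue-on-reconnect idea could hang off the same ioConnected handler. A sketch with the queue factored out so it is testable - all helper names here are hypothetical app code, not uibuilder API:

```javascript
// Offline queue driven by connection state. In the browser it would be
// wired up roughly as:
//   const h = makeConnectionHandler({ send: (m) => uibuilder.send(m),
//       showOfflineModal, hideOfflineModal })
//   uibuilder.onChange('ioConnected', h.onChange)
function makeConnectionHandler({ send, showOfflineModal, hideOfflineModal }) {
    const queued = []
    return {
        queue: (msg) => queued.push(msg), // call while offline
        onChange: (connected) => {
            if (connected) {
                hideOfflineModal()
                queued.splice(0).forEach(send) // flush queue on reconnect
            } else {
                showOfflineModal()
            }
        },
    }
}
```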

Test #9
For giggles, I added your suggested uibuilder._socket.on('disconnect',()=>{}) code to the basic Vue2 template. Once I had recreated the more-than-one socket scenario I did another hard power cycle to see if the stuck sockets get torn down. Per previous observations, only the real/active socket gets torn down. This makes sense since the client doesn't have an opportunity to execute the removeAllListeners in my context. The panels are PoE powered, so the client no longer exists in the panel once the undeniable lack of electrons occurs. :wink: The removeAllListeners() suggestion may still have benefit in real-world events, though I am unable to verify the usefulness in DEV. Power-cycle of PoE is my only method of introducing an "unplanned" disruption of the connection. From a Production perspective I want/need to lean on PoE where I can rather than rolling out an additional power supply per panel. If I can find a power supply for my Dev panel I will revisit this test. I don't want to push this test out into Prod without testing it in Dev first, especially with your caveat of "it may break stuff". :+1:

Thought - Powering panels off.
I don't think this would help with mitigating the issue, and could introduce further challenges. The controlled reboot (and presumably shutdown) cleans up its currently-active sockets okay via TCP FIN packets, but is not likely to also clean up the already-broken sockets. If it did we would expect to see such cleanup happen each day with the current 4am auto-reboots, which is not the case. Also, I like to monitor hardware 24/7 for potential theft (not a common issue). Having windows of "known offline" gets both complicated and obscures our remote visibility.

Test #10 - Scheduled power off/on.
Regardless of my above thoughts, I did a test by scheduling a clock-based OFF period of 5min, followed by ON. At PowerOff, the "active" sockets immediately closed in a graceful manner. Lingering ESTABLISHED sockets remained and were not cleaned up. Lingering sockets continued to remain after completion of panel reboot and the usual socket stabilisation behaviour. We can rule out managed power Off/On as a mitigation strategy.

I've had a chat with one of our network engineers. We may try to Wireshark some of this (via spanned/mirrored switch port) to see exactly what is going on from both the perspective of the touch panel and the RPi. I'd like to see what the heartbeat is actually doing (or not doing), since it doesn't seem to be following the documented behaviour.

Phew. Thanks for the comprehensive review and testing. In summary then, this appears to be a server issue rather than a specific client issue?

This is something I already had on the backlog as it is very sensible. Not made it to the top of the list yet though I'm afraid. Along with methods to manually dis-/re-connect the sockets.

Yes, I believe so.

I see some posts on the Socket.IO support pages which could be explained by the same issue I'm seeing. Before I jump over there and continue the bug reporting, can you please confirm that I'm looking in the right location for the instance of Socket.IO (server) that UIBUILDER leverages? I believe it to be at ~/.node-red/node_modules/node-red-contrib-uibuilder/nodes/libs, though I also see an instance at ~/.node-red/node_modules/socket.io. If I need to try some temporary modifications/CPR to get the heartbeat working properly I want to ensure I'm looking/editing in the right place. I'm no Node/npm expert, so understanding how/where everyone includes their dependencies is still a bit of a mystery to me.

The Socket.IO server and a separate installation of the Socket.IO client are installed by the uibuilder install and so should be in the node_modules folder under your userDir - normally ~/.node-red/node_modules.


Both are kept at the same version level. The Socket.IO client is pre-built into the uibuilder front-end library whenever a change is made to that. It used to get loaded separately, but that often caused people confusion, so I now use esbuild to merge it in at the same time as it compresses things and creates the separate IIFE and ESM versions of the library (and ensures that the front-end code is limited to ES2020).

I've done a WireShark capture and have been able to capture three occurrences of the issue. Due to privacy reasons I won't be sharing the Wireshark log, but I'll try to describe what I see. I might be able to share a screenshot of the high level packet sequence, if need be.

In all three cases it seems that the lingering socket is created by the client. The creation of this socket happens while the client is requesting pages (HTTP GET) after a client reboot and getting "304 Not Modified" responses from the server. In testing with the Vue2 template code, the client created the lingering socket shortly after requesting /uibuilder/vendor/bootstrap/dist/css/bootstrap.min.css and the subsequent 304 Not Modified response.

The lingering socket is a three-packet occurrence as follows:
Client->Server SYN
Server->Client SYN,ACK
Client->Server ACK
.. and that's it. No heartbeat traffic or anything else on the port.

Immediately after establishing the lingering socket, the Client then does a GET request for /uibuilder/vendor/bootstrap-vue/dist/bootstrap-vue.css and gets a 304 Not Modified response.

Further on in the exchange I can see the establishment of a proper socket and WebSocket heartbeat traffic flows as expected.

So I think it stands to reason that the lingering socket is not timing out in the absence of a WebSocket heartbeat, because the socket is not being used for the WS protocol.

The next question is probably: is the socket being created by Socket.IO, or is it something unrelated? I.e. a rogue process on the client. I'll see if I can perform more traffic captures using alternative-OS clients (Windows, iOS, etc.) and compare results. If I see similar occurrences with different clients we can probably be confident the issue is with Socket.IO.

Hmm, there shouldn't be any relationship at all between the CSS request and Socket.IO so that is weird.

Are you using the latest Vue 2/bootstrap-vue template as your base? If using an older template, I wonder if something is askew?

I'm not suggesting there is causation between the CSS and the errant Socket.IO connection, just correlation - I'm just trying to indicate where in the overall sequence the issue is being created. Per observations in Production, it also seems to be happening more frequently than just at controlled panel boots.

I'm going to set up additional/parallel PING monitoring to see if these panels are spontaneously rebooting. I've seen brief loss of comms between panel and back end (via my client-side "Comms offline" modal/popup), but not spontaneous reboots. Via the management suite I should also be able to check their uptime at the end of the day, and see if it matches the duration for a 4am daily reboot. Let's assume there are no spontaneous reboots unless I say otherwise.

Forgot to mention in my previous post - the "working" socket is showing WebSocket heartbeat durations which match the UIBUILDER-passed override settings. I.e. the current heartbeat is 5 seconds.
So that's a good thing - config is getting passed to Socket.IO correctly.

On this particular system my current versions are:
UIBUILDER v6.4.1 and related templates.
vue 2.7.14
bootstrap v4.6.2
bootstrap-vue v2.23.1
I'll remove the uibuilder._socket.removeAllListeners() code to bring it back to a "pure template" set of client code.
On Production rooms I guess there could be some potential "mismatch". I used the template as a skeleton and built out my Production code from there. In most instances I'm running the identical set of client-side code. Perhaps I should rebuild that using the latest template as a new skeleton. Hmmm.... Looks like my original skeleton harks from the uibuilder.start(this) era. Okay, that can be a project for next week.

I'm offsite until mid next week, on a much needed break. Please excuse the temporary radio silence.

Yes, possible that might have an impact.

No problem - have a nice break.

Back from my break for a bit, before another 3-week absence over the Christmas/New Year. On reflection during my break, I have observed this issue using the most recent Vue2 template, so the thoughts around my lingering use of the deprecated uibuilder.start() is not the exclusive cause, and may well be unrelated. I am currently downgrading the significance of uibuilder.start() in my thinking, though I will continue to clean up my codebase.

The PING code works well and I now need to correct my previous comments around reboots. A more-correct way to state the behaviour would be: A soft/time-based reboot is no guarantee of non-lingering sockets on the server side. Via a 5 second PING, I am logging constant panel uptime with the only PING outage matching the 4am reboot, yet socket growth continues.

Given my Wireshark observation and the ongoing growth despite generally good uptime, I think my previous comments around ProblemSocket creation still hold true. The ProblemSocket is created at initial connection time and contains no WS heartbeat or other traffic. Outages/reboots simply expose this fact by being a precursor to another initial connection. Given the ProblemSocket is not elevated to WS status and has no heartbeat, there is nothing to tell the server side to free up the resource.

Truth be told, the non-WS ProblemSocket should probably either:

  • Not occur in the first place, or
  • Clean itself up quickly, once the parallel working WS socket is established.

Does the (custom) Socket.IO client library pool socket resources? Is there a way on the client side to find and initiate a close of all non-WS sockets that it has created?

Hi, the additional connections come from Socket.IO doing an http(s) connection first and then trying to upgrade to websockets. You don't have to do this - you can disable the http connection in Socket.IO and it will try to establish a websocket connection immediately.

However, that is less resilient and it may also preclude some security processing since only http(s) connections can have custom headers (such as JWT, etc).

But there is nothing stopping you from configuring Socket.IO to only use a ws connection - except I can't remember if I've exposed that in the client - let me check.

Yes it does, it is normally very efficient at this and reuses a single wire connection across many logical connections (e.g. different rooms, etc).

I'm not aware of anything. Other than not opening them in the first place (see above).

OK, it looks like, to limit the connection types, you have to make a change to the client options for Socket.IO. That is currently not exposed in the uibuilder client as it hasn't been a requirement up till now.

I need to see if the transport options can be changed early enough to make a difference.

I may be able to drop you a test client library. Are you using the IIFE or the ESM version of the client?

So, I have a test version of the client for you to use. You can select uibuilder.iife.min.js or uibuilder.esm.min.js from below and load it instead of the live version and add this at the top of your index.js or somewhere immediately after loading the library anyway.


And, yes, I realise that I told you that you don't need to use uibuilder.start any more, but this is one of the exceptions. It will remove the polling transport and restart any existing connection (it should do this correctly so that the server closes the connections on its end, but I can't test that).

Hi, and sorry I haven't had a chance to reply earlier. I'm having to balance this against many other work demands. I don't think I'll be able to look at it much (if at all) in the next few months, until I get to the other side of a major project. Don't worry - that project includes an additional few dozen touch panels, so the issue will only get bigger on my side. I'll have to mitigate via service restarts until I have breathing space to focus on the sockets again.