It happened again, but this time I didn't rush to restart Node-RED. Instead, I checked whether the data flow downstream of Node-RED had been interrupted. Although CPU usage and load on the server hosting Node-RED were high, Node-RED was still processing data continuously and sending it to the destination. The downstream programs received the data and wrote it into InfluxDB.
When I checked the monitoring again, I noticed that the number of TCP connections had risen from 400 to over 4,000 in the preceding four hours. All of the new TCP connections were to Kafka. Node-RED acts as a Kafka producer here, and I couldn't find any useful information on the Kafka side. Looking back at the historical monitoring, I saw the same pattern at every previous failure: the connection count climbed to over 4,000.
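For anyone who wants to look at the same thing on their own server, here is a rough sketch of how the connections to Kafka could be broken down by TCP state, which helps tell leaked connections (for example many in CLOSE-WAIT) from genuinely active ones. It assumes a Linux host with the `ss` tool and that the brokers listen on Kafka's default port 9092; the function name `count_kafka_connections` is just illustrative and not part of Node-RED or any Kafka library.

```python
import subprocess
from collections import Counter

KAFKA_PORT = 9092  # assumption: brokers listen on Kafka's default port


def count_kafka_connections(ss_output: str, port: int = KAFKA_PORT) -> Counter:
    """Count TCP connections whose remote port matches `port`, grouped by state."""
    counts = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        # Columns from `ss -tan`: State Recv-Q Send-Q Local:Port Peer:Port
        if len(fields) < 5 or fields[0] == "State":
            continue  # skip the header and any malformed lines
        state, peer = fields[0], fields[4]
        # rpartition also handles IPv6 peers such as [::1]:9092
        peer_port = peer.rpartition(":")[2]
        if peer_port == str(port):
            counts[state] += 1
    return counts


if __name__ == "__main__":
    # -t: TCP sockets only, -a: all states, -n: numeric addresses and ports
    output = subprocess.run(
        ["ss", "-tan"], capture_output=True, text=True, check=True
    ).stdout
    for state, n in count_kafka_connections(output).most_common():
        print(f"{state}: {n}")
```

Running this every few minutes while the count is climbing should show which state the extra connections are sitting in.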
During the failure, I was able to SSH into the server hosting Node-RED, and from other servers I could telnet to port 1880. So it seems that only the dashboard was unusable. The flows I have in Node-RED are quite simple and mainly just forward data. The exported flow file is attached:
flows202312131652.json (60.1 KB)