Advice needed on Pi CPU usage

Hmmmmm..... Been doing some looking into things..... # of Sockets are increasing slowly as time goes by....

I'm gonna let it run till morning, let's see how stacked it is by then.... A restart of NR drops it to around the 400 mark, the little pi is quite busy.... Over a period of about 1/2 hr it has increased by around 100, to about 500 sockets open....

Now, to find out what's causing it!!

Cya
E


Hey guys....

Still chasing my tail trying to find out where my Skt leak is...

I am wondering whether my infantile attempt at writing an async flow has something to do with it... If any of you great oracles could check this subflow over and tell me if it could possibly be contributing to the problem, I would appreciate it immensely!!

The way this subflow/node thingy works, is as follows:
Any message sent to it, resets a timer (value as set up in the env variables) and is passed through..
Should the timer expire before a message arrives, a topic/value message will be sent instead. (Up to 8 presets can be set via the env variables)
(I am using it on sonoff switches to replace the telemetry message on timeout with a predetermined one for either nullifying a value ie voltage, or sending an alert to tell me that the unit has gone offline)

Please excuse the "less than succinct" coding, I am a bit new to this!!

I hope I don't incur the wrath of the fundi's out here, I have no clue as to how to attach a flow, so I have downloaded it and, well.... Here goes!

Payload Timeout.json (2.9 KB)

TIA
Ed

For the future you can paste flows directly in here. See this post for more details - How to share code or flow json

There doesn't seem to be any TCP work going on in that function. You are looking for something that does something with TCP and maybe doesn't properly clear up. Possibly whatever you are doing with the timeout after it times out, or maybe something else.

I am most certainly NOT an Oracle. I have never used RED.util.setMessageProperty to create the msg properties so I have no idea what the benefits are. The only other possible issue is the lack of let and/or const when defining variables, which may cause problems. But apart from that (and I cannot see why it would cause your problems), as Colin says, the only possible reason this function could cause an issue is if something is affected by the timeout delay.

Getting ready to be embarrassed by a lack of understanding. :slightly_smiling_face:

let TimeOut = env.get("TimeOut");       //env.get for SubFlow,flow.get for Function
const TOF = context.get("TOF");         //Current Timer Object if any

if(TimeOut < 0) TimeOut = 0;            //chk neg, Zeroise if neg
TimeOut = TimeOut * 1000;               //to mS (Timeout function is ms based)

let propPath1 = env.get("propPath1");   //Get Replacement Payloads from Setup
let propPath2 = env.get("propPath2");
let propPath3 = env.get("propPath3");
let propPath4 = env.get("propPath4");
let propPath5 = env.get("propPath5");
let propPath6 = env.get("propPath6");
let propPath7 = env.get("propPath7");
let propPath8 = env.get("propPath8");

//************Cancellation Routine (any msg recd)******************
clearTimeout(TOF);              //Scrap the current timer progress if any

// Will never be seen
//let statusMsg = ({ fill:"green", shape:"dot", text:"TOCancel" + " TimeOut:" + TimeOut / 1000});
//node.status(statusMsg)

//******************Cancellation Routine End**************

//********************Main Routine Set Timeout************
context.set("TOF", setTimeout(function(){
    const statusMsg = ({ fill:"red", shape:"dot", text:"TimeOut" + TimeOut / 1000});
    node.status(statusMsg);
    
    /* As far as I can tell RED.util.setMessageProperty() can be replaced with msg.property
    if(propPath1 != ""){             //Change it
        RED.util.setMessageProperty(msg, propPath1, env.get("Payload1TOV"), true);
    }
    */
    
    if(propPath1 != "") msg.propPath1 = env.get("Payload1TOV");            //Change it

    if(propPath2 != "") msg.propPath2 = env.get("Payload2TOV");

    if(propPath3 != "") msg.propPath3 = env.get("Payload3TOV");

    if(propPath4 != "") msg.propPath4 = env.get("Payload4TOV");

    if(propPath5 != "") msg.propPath5 = env.get("Payload5TOV");

    if(propPath6 != "") msg.propPath6 = env.get("Payload6TOV");

    if(propPath7 != "") msg.propPath7 = env.get("Payload7TOV");

    if(propPath8 != "") msg.propPath8 = env.get("Payload8TOV");

    node.send([msg, {payload:statusMsg}]);
    node.done();

    return;

}, TimeOut));

let statusMsg = ({ fill:"blue", shape:"ring", text:"Timeout" + TimeOut / 1000});
node.status(statusMsg);

/* Can be replaced with a simple msg. return
node.send([msg, {payload:statusMsg}]);
node.done();//
return
*/

return [msg, {payload:statusMsg}];
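
One caveat on the `msg.propPath1 = ...` lines above: as far as I can tell, that sets a property literally named `propPath1` rather than the property whose path is stored in the variable, so it is not quite equivalent to `RED.util.setMessageProperty(msg, propPath1, value, true)`. A minimal sketch of the difference in plain JavaScript, assuming the env variable holds a path such as `payload.voltage` (the thread doesn't confirm the actual values). `setByPath` is a hypothetical stand-in written for this example, not the Node-RED implementation:

```javascript
// Hypothetical stand-in for RED.util.setMessageProperty, for illustration only.
// Handles simple dot-separated paths such as "payload.voltage".
function setByPath(obj, path, value) {
    const keys = path.split(".");
    let target = obj;
    for (const key of keys.slice(0, -1)) {
        if (typeof target[key] !== "object" || target[key] === null) {
            target[key] = {};   // create missing intermediate objects
        }
        target = target[key];
    }
    target[keys[keys.length - 1]] = value;
    return obj;
}

const propPath1 = "payload.voltage";   // assumed example value of the env variable

const literal = { payload: { voltage: 230 } };
literal.propPath1 = 0;                 // creates a property literally named "propPath1"

const byPath = { payload: { voltage: 230 } };
setByPath(byPath, propPath1, 0);       // writes to byPath.payload.voltage

console.log(JSON.stringify(literal));  // {"payload":{"voltage":230},"propPath1":0}
console.log(JSON.stringify(byPath));   // {"payload":{"voltage":0}}
```

If the paths are only ever top-level names (say `payload`), `msg[propPath1] = value` would probably do the same job without the helper.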

Thanks Colin...

Yeah... Also had similar thoughts... but hey... Unless you ask and confirm...

Otherwise, think that timeout loop thingy in there is ok?

Tx
E

Thanx for the input, a brain here pointed me in a few directions and I just tried to wrap my grey around it... Stuck with what I could grasp at the time!

I am gonna look at your way and try and learn a bit more!!

Tnx!

What are you doing that involves TCP sockets? Http nodes? TCP? Anything like that involving comms with another system?

What are you doing when your timeout function times out?

TCP sockets is purely via mqtt in out... and a bit of dashboard serving.... and a bit of Telegram notification and control...

The timeout is used in the data stream from a wifi switch on mqtt mostly... If a wifi switch doesn't report in within a pre determined time, for instance, the timeout activates and sends an alert... I also use it if there are "raw" powered sonoffs that I monitor ... If the power goes off, the switch doesn't report in and the voltage/current/power payloads are set to 0.... I have a few switches in the "outlying" non power backed up areas that need constant monitoring... ie a wet basement where the switch turns on a sump pump... or in another case, one of my buildings is about 300m from my household proper... If miscreants dig up and cut the power cables, it goes off line and an alert is raised through non-response....

I have found it handy in quite a few diverse ways...

Regds
Ed

PS: I have written another little sub-flow that is a delay on/off timer that I use a LOT for staggering appliance turn on/off etc... Its probably way simple and done in a complicated manner, but I doubt that this one would give a Skt count up either(not on its own, anyway)... FYI... Here it is:

[
    {
        "id": "4ea703af.383ab4",
        "type": "subflow",
        "name": "ON/OFF Delay",
        "info": "\n\nVariable timer... Independent\nsettings of On or Off delay...\nput in values in SECONDS...\n\nWhile under delay to on or off,\na off or on will cancel the timer\nrespectively....\n\nif off and 0 rec'd, it sends 0\nif on and 1 rec'd, it sends 1",
        "category": "Utility",
        "in": [
            {
                "x": 70,
                "y": 100,
                "wires": [
                    {
                        "id": "ed5090e2.56163"
                    }
                ]
            }
        ],
        "out": [
            {
                "x": 330,
                "y": 100,
                "wires": [
                    {
                        "id": "ed5090e2.56163",
                        "port": 1
                    }
                ]
            },
            {
                "x": 330,
                "y": 150,
                "wires": [
                    {
                        "id": "ed5090e2.56163",
                        "port": 2
                    }
                ]
            }
        ],
        "env": [
            {
                "name": "OnDelay",
                "type": "num",
                "value": "5"
            },
            {
                "name": "OffDelay",
                "type": "num",
                "value": "5"
            }
        ],
        "color": "#777777",
        "inputLabels": [
            "0/1"
        ],
        "outputLabels": [
            "Out",
            "Inv"
        ],
        "icon": "node-red/timer.svg",
        "status": {
            "x": 330,
            "y": 50,
            "wires": [
                {
                    "id": "ed5090e2.56163",
                    "port": 0
                }
            ]
        }
    },
    {
        "id": "ed5090e2.56163",
        "type": "function",
        "z": "4ea703af.383ab4",
        "name": "Delay On/Off",
        "func": "var OnDelay = env.get(\"OnDelay\");  //env.get for SubFlow,flow.get for Function\nOffDelay = env.get(\"OffDelay\");    //env.get for SubFlow,flow.get for Function\nSigStat = context.get(\"SigStat\");\nif (isNaN(SigStat)){SigStat=0}      //Check if it exists, if not, make it default off...\nSignal = msg.payload;\n\nTOF = context.get(\"TOF\")            //Current Timer Object if any\n\nif(OnDelay<0){OnDelay=0}            //chk neg, Zerise if neg\nif(OffDelay<0){OffDelay=0}          //chk neg, Zerise if neg\n\nOnDelay = OnDelay * 1000;           //to mS (Timeout function is ms based)\nOffDelay = OffDelay * 1000;         //to mS\n\n\nif(Signal > 0){Signal = 1}          //Anything > 0 is on  signal\n    else {Signal = 0}               //Anything else is off signal\n\n//******************Cancellation Routine******************\nif(Signal == 1){ \n    if (SigStat == 1){ //its already on...Cancel any possible off Progress, just send and leave\n        clearTimeout(TOF)//Scrap the current timer progress\n        //statusMsg = \"OffCancel\"\n        statusMsg = ({ fill:\"green\", shape:\"dot\", text:\"OffCancel\"+\" OnD:\"+OnDelay/1000+\" OffD:\"+OffDelay/1000});\n        node.status(statusMsg)\n        node.send([{payload:statusMsg}, {payload:1}, {payload:0}]);\n    return;\n    }\n}\nif(Signal=== 0){ \n    if (SigStat === 0){ //its already off...Cancel any possible \"\"on\" Progress, just send and leave\n        clearTimeout(TOF)//Scrap the current timer progress\n        statusMsg = ({ fill:\"red\", shape:\"dot\", text:\"OnCancel\"+\" OnD:\"+OnDelay/1000+\" OffD:\"+OffDelay/1000});\n        node.status(statusMsg)\n        node.send([{payload:statusMsg}, {payload:0}, {payload:1}]);\n        return;\n    }\n}\n//******************Cancellation Routine End**************\n//********************Main Routine************************\nif(Signal == 1){ \n    if (SigStat === 0){ //its off...Turn it on with delay\n        context.set(\"TOF\", setTimeout(function(){\n  
          statusMsg = ({ fill:\"green\", shape:\"dot\", text:\"On  \"+\" OnD:\"+OnDelay/1000+\" OffD:\"+OffDelay/1000});\n            node.status(statusMsg);\n            node.send([{payload:statusMsg}, {payload:1}, {payload:0}]);\n            context.set(\"SigStat\", 1);\n            }, OnDelay));\n        statusMsg = ({ fill:\"blue\", shape:\"ring\", text:\"0->1\"+\" OnD:\"+OnDelay/1000+\" OffD:\"+OffDelay/1000});\n        node.status(statusMsg)\n        node.send([{payload:statusMsg}]);\n        node.done();//forgot\n        return;//\n    }\n}   \nif(Signal ===0){\n    if (SigStat == 1){ //its on...Turn it off with delay\n        context.set(\"TOF\", setTimeout(function(){\n            statusMsg = ({ fill:\"red\", shape:\"dot\", text:\"Off \"+\" OnD:\"+OnDelay/1000+\" OffD:\"+OffDelay/1000});\n            node.status(statusMsg);\n            node.send([{payload:statusMsg}, {payload:0}, {payload:1}]);\n            context.set(\"SigStat\", 0);\n        }, OffDelay));\n        statusMsg = ({ fill:\"blue\", shape:\"ring\", text:\"1->0\"+\" OnD:\"+OnDelay/1000+\" OffD:\"+OffDelay/1000});\n        node.status(statusMsg)\n        node.send([{payload:statusMsg}]);\n        node.done();//forgot?\n        return;//\n    }\n} \n    \n",
        "outputs": 3,
        "noerr": 0,
        "initialize": "",
        "finalize": "",
        "x": 200,
        "y": 100,
        "wires": [
            [],
            [],
            []
        ],
        "inputLabels": [
            "0/1"
        ],
        "outputLabels": [
            "Status",
            "Out",
            "inv"
        ],
        "icon": "node-red-contrib-chronos/chronos_scheduler.svg"
    }
]

Woohoo!! I got it Right!! (Posting the code, anyway!)

Lol ... Ed

Hmmmmm.... Just did a small message count on mqtt.... About 900 msgs per minute avg or so....

Interesting.... No loops though....

E

Have a look at the mqtt broker log and see if anything is going on there

Hey Colin

Right... Went for a scuffle thru the mqtt log... nothing spectacular....

Also went and checked all mqtt sources and destinations with debug nodes... No looping/weird extra activity beyond the normal report in/do something requirements....

I can say with about 99.8% certainty that there are no rapid firing recursive loops in the system... (the .2% is well... If I aint seen it on an individual look in, I'm not going to see it at all)...

I now need to scratch some more into the configuration side of things, and the grey matter memory banks, to see what I can remember about when the problem first cropped up....

Retracing:
All I seem to remember, is that swopping from pi3 to pi4-1gb is where the problems started.... (pi3 - 100+days average uptime(then getting beaten to death to service/clean fans etc), pi4 1gb - lucky to get a week to a month consecutive uptime...)
Pi4 was simply booted on the Pi3 SDcard or image thereof, I put the problems down to minor O/S incompatibilities...
Pi4-1gb has subsequently been replaced with Pi4-4gb unit, clean load, all software versions been brought up to date, reliability still questionable...

Pi4 has been run via wifi then on hardwire as a possible solution to the problem, no change... But network access speed is improved and has been kept on hardwire in the interim until I can find the problem...

I am now more than a little perplexed, to say the least!!

Any further suggestions perhaps on where I should check... I am just about out of ideas!!

Regds
ed

Try
lsof | grep -i "TCP\|UDP"
I am close to my knowledge limit but I think that shows all the sockets in use, the first column is the process using each one. Since it appears you have hundreds of sockets open that may tell you something.

Edit: To put it into a file, use
lsof | grep -i "TCP\|UDP" > sockets.txt

You have gone nuclear on the timeout handling. All you need to do is to save a reference to the timeout and, at the start, see if it is still active; if so, cancel it. You don't need a separate variable for that.
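
Something along these lines is what I mean, as a rough sketch in plain JavaScript rather than a drop-in function node. `store` is a hypothetical plain object standing in for the node's context, and `resetWatchdog` is a name made up for the example:

```javascript
// Minimal watchdog: keeps a single timer handle and restarts it on every message.
// `store` stands in for the function node's context (a plain object here).
function resetWatchdog(store, timeoutMs, onTimeout) {
    if (store.timer) {
        clearTimeout(store.timer);     // cancel the pending timer, if any
    }
    store.timer = setTimeout(() => {
        store.timer = null;            // timer has fired, nothing left to cancel
        onTimeout();
    }, timeoutMs);
}

const store = {};
let fired = 0;
resetWatchdog(store, 50, () => { fired += 1; });
resetWatchdog(store, 50, () => { fired += 1; });  // a second message replaces the first timer

setTimeout(() => {
    console.log("timeouts fired:", fired);        // timeouts fired: 1
}, 200);
```

In a function node the handle would be kept with `context.set` / `context.get`, as the code earlier in the thread already does with `TOF`.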

Of course, you probably could have used the RBE node to do that for you :slight_smile:

Nothing in there that would impact the number of open sockets however.

As Colin says, if you are sharing a function, you only need to put the function code between triple backticks (start and end on their own lines). Makes it much easier to follow.

Way different number of things viewed thru lsof when I view as root or user.... Still trying to sort through for a definitive ....

Let me check deeper...

Tnx!

Nuclear... Maybe... Unclear definitely! .... LOL....

Thanx for looking into that code though!! much appreciated!!

Saving a reference to the timeout... Yep... Using the message's time stamp I presume you mean?

An alternate way indeed....

Regds
Ed

I think you already had a problem. My Pi2 and Pi3 run for years without intervention. Both quite heavily burdened (in the past, not so much now as most things have been moved to my repurposed laptop which is now my main server).

That was likely an issue I would think? I'm not even sure they are fully compatible architectures - one of the few downsides of using ARM-based chipsets. I wouldn't ever reuse a boot drive between major Pi revisions. Anyway, you've fixed that now.

Start by systematically replacing physical outputs with stubs (a debug node maybe). Start with telegram, then Dashboard and then MQTT. Or whatever order you want. See which of those takes away or at least reduces the problem.

On my main system that returns 426 lines of output - incidentally, I ran it as sudo. Without sudo, I got 59 entries but they are mostly the ones you are interested in anyway. On my Pi2, there were 77 node-red entries. Weirdly, lsof isn't installed on my pi3 for some odd reason.

My Pi3 has the UniFi controller on it and so returned 2017 entries! Only 33 relate to node/node-red though.

Did you see the cat /proc/net/sockstat result earlier?

TCP: inuse 80 orphan 0 tw 3062 alloc 8262 mem 8225

Not entirely in agreement with that.... It was only swopped out due to speed reasons on graphical reporting (and also because, at the time, some sales lad convinced me that Pi3 was at EOL and a replacement would be impossible to get; I was ordering a backup at the time)... but that's water under the bridge now...

Agreed.... Just had to at the time...

Re the lsof as user, this is what I get(Skinny at best):
pi@solpiplog:~ $ lsof | grep -i "TCP|UDP"
python3 634 pi 3u IPv4 19327 0t0 TCP localhost:57902->localhost:6379 (ESTABLISHED)
vncserver 1196 pi 8u IPv4 23954 0t0 TCP localhost:39789->localhost:58038 (ESTABLISHED)
solpiplog 14504 pi 9u IPv4 27209270 0t0 TCP 192.168.0.118:50560->192.168.0.118:1883 (ESTABLISHED)
solpiplog 14504 pi 10u IPv4 27557790 0t0 TCP localhost:60042->localhost:http (ESTABLISHED)
solpiplog 14504 14535 solpiplog pi 9u IPv4 27209270 0t0 TCP 192.168.0.118:50560->192.168.0.118:1883 (ESTABLISHED)
solpiplog 14504 14535 solpiplog pi 10u IPv4 27557790 0t0 TCP localhost:60042->localhost:http (ESTABLISHED)

As sudo, way, way more...

Regds
Ed

Yep, in fact am using this:

cat /proc/net/sockstat | grep sockets | awk '{print ($3)*1}'

in an exec node to log it and display on a dashboard right next to the "Node Red Restart" button....

E

Edit:

I presume, if I'm reading it corrightly, there are 80 sockets in use at that time, with none being unserviced, but 8262 are allocated using 8225 memory.... (Now what I don't understand, is why 8262 with no orphans....)
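
For picking those numbers apart in a function node instead of awk, here is a small sketch that splits each sockstat line into named fields. `parseSockstat` is a hypothetical helper written for this example. As far as I know, `tw` counts sockets in TIME_WAIT and `mem` is counted in memory pages rather than bytes, but treat both as assumptions to check against your kernel's documentation:

```javascript
// Parse the text of /proc/net/sockstat into { section: { key: number } }.
function parseSockstat(text) {
    const result = {};
    for (const line of text.split("\n")) {
        const [name, rest] = line.split(":");
        if (!rest) continue;                       // skip blank or malformed lines
        const tokens = rest.trim().split(/\s+/);
        const fields = {};
        for (let i = 0; i + 1 < tokens.length; i += 2) {
            fields[tokens[i]] = Number(tokens[i + 1]);
        }
        result[name.trim()] = fields;
    }
    return result;
}

// Sample line taken from the post above.
const sample = "TCP: inuse 80 orphan 0 tw 3062 alloc 8262 mem 8225";
const stats = parseSockstat(sample);
console.log(stats.TCP.alloc - stats.TCP.inuse);    // 8182
```

The `sockets: used N` line that the awk command extracts would come out the same way, as `stats.sockets.used`, which makes it easy to chart `tw` and `alloc` next to it.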

Please wrap in triple backticks otherwise it is nigh-on unreadable. Thanks.

None of those appear to be related to Node-RED, where are all of the ones owned by "node"? At least there should be a bunch saying *:1880 (LISTEN) - oh, hang on, you've an error in that command, it should be lsof | grep -i "TCP\|UDP".
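
Once the corrected command is producing the full list, a tally per owning process makes it easier to see who holds the sockets. A rough sketch, using four of the lines pasted above as sample input. `countByCommand` is a hypothetical helper, and it assumes the command name is always the first whitespace-separated column:

```javascript
// Count TCP/UDP lines in lsof output, grouped by the command in the first column.
function countByCommand(lsofText) {
    const counts = {};
    for (const line of lsofText.split("\n")) {
        if (!/TCP|UDP/i.test(line)) continue;      // same filter as the grep
        const command = line.trim().split(/\s+/)[0];
        counts[command] = (counts[command] || 0) + 1;
    }
    return counts;
}

const sampleLsof = [
    "python3 634 pi 3u IPv4 19327 0t0 TCP localhost:57902->localhost:6379 (ESTABLISHED)",
    "vncserver 1196 pi 8u IPv4 23954 0t0 TCP localhost:39789->localhost:58038 (ESTABLISHED)",
    "solpiplog 14504 pi 9u IPv4 27209270 0t0 TCP 192.168.0.118:50560->192.168.0.118:1883 (ESTABLISHED)",
    "solpiplog 14504 pi 10u IPv4 27557790 0t0 TCP localhost:60042->localhost:http (ESTABLISHED)",
].join("\n");

console.log(countByCommand(sampleLsof));
// { python3: 1, vncserver: 1, solpiplog: 2 }
```

Bear in mind that lsof can list the same socket once per thread, so the raw counts may overstate the number of distinct sockets.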

I think that means that one or more processes have allocated sockets but they are held open - possibly for listening - but not currently in use.

You do need to start taking out nodes with TCP or UDP connections one at a time until you track down the issue.