I don't know what they mean, but none of my systems have any of those in 4 figures; most are 2 figures.
Perhaps the loop (if that is what it is) involves TCP activity and that is what is giving the error.
Could be... I'm busy logging it now; let's see whether it increases over time or starts trending during a particular activity...
Yes indeed. It would be very rare to have that many sockets open. Especially on a Pi.
Hmmmmm..... Been doing some looking into things..... The number of sockets is increasing slowly as time goes by....
I'm gonna let it run till morning; let's see how stacked it is by then.... A restart of NR drops it to around the 400 mark, and the little Pi is quite busy.... Over a period of about half an hour it has increased by around 100 to about 500 sockets open....
Now, to find out what's causing it!!
Cya
E
Hey guys....
Still chasing my tail trying to find out where my socket leak is...
I am wondering whether my infantile attempt at writing an async flow has something to do with it... If any of you great oracles could check this subflow over and tell me whether it could possibly be contributing to the problem, I would appreciate it immensely!!
The way this subflow/node thingy works, is as follows:
Any message sent to it resets a timer (value as set up in the env variables) and is passed through.
Should the timer expire before a message arrives, a topic/value message will be sent instead. (Up to 8 presets can be set via the env variables.)
(I am using it on Sonoff switches to replace the telemetry message on timeout with a predetermined one, for either nullifying a value, i.e. voltage, or sending an alert to tell me that the unit has gone offline.)
Please excuse the "less than succinct" coding, I am a bit new to this!!
I hope I don't incur the wrath of the fundis out here; I have no clue as to how to attach a flow, so I have downloaded it and, well.... Here goes!
Payload Timeout.json (2.9 KB)
TIA
Ed
For the future you can paste flows directly in here. See this post for more details - How to share code or flow json
There doesn't seem to be any TCP work going on in that function. You are looking for something that does something with TCP and maybe doesn't properly clear up. Possibly whatever you are doing with the timeout after it times out, or maybe something else.
I am most certainly NOT an oracle. I have never used RED.util.setMessageProperty to create the msg properties, so I have no idea what the benefits are. The only other possible issue is the lack of let and/or const when defining variables, which may cause problems. But apart from that (and I cannot see why it would cause your problems), as Colin says, the only possible reason this function could cause an issue is if something is affected by the timeout delay.
Getting ready to be embarrassed by a lack of understanding.
let TimeOut = env.get("TimeOut"); //env.get for SubFlow,flow.get for Function
const TOF = context.get("TOF"); //Current Timer Object if any
if(TimeOut < 0) TimeOut = 0; //chk neg, Zeroise if neg
TimeOut = TimeOut * 1000; //to mS (Timeout function is ms based)
let propPath1 = env.get("propPath1"); //Get Replacement Payloads from Setup
let propPath2 = env.get("propPath2");
let propPath3 = env.get("propPath3");
let propPath4 = env.get("propPath4");
let propPath5 = env.get("propPath5");
let propPath6 = env.get("propPath6");
let propPath7 = env.get("propPath7");
let propPath8 = env.get("propPath8");
//************Cancellation Routine (any msg recd)******************
clearTimeout(TOF); //Scrap the current timer progress if any
// Will never be seen
//let statusMsg = ({ fill:"green", shape:"dot", text:"TOCancel" + " TimeOut:" + TimeOut / 1000});
//node.status(statusMsg)
//******************Cancellation Routine End**************
//********************Main Routine Set Timeout************
context.set("TOF", setTimeout(function(){
const statusMsg = { fill:"red", shape:"dot", text:"TimeOut " + TimeOut / 1000 };
node.status(statusMsg);
// Note: msg.propPath1 = value would set a property literally named
// "propPath1"; a dynamic property path needs RED.util.setMessageProperty
// (or msg[propPath1] for top-level property names only).
if(propPath1 != "") RED.util.setMessageProperty(msg, propPath1, env.get("Payload1TOV"), true); //Change it
if(propPath2 != "") RED.util.setMessageProperty(msg, propPath2, env.get("Payload2TOV"), true);
if(propPath3 != "") RED.util.setMessageProperty(msg, propPath3, env.get("Payload3TOV"), true);
if(propPath4 != "") RED.util.setMessageProperty(msg, propPath4, env.get("Payload4TOV"), true);
if(propPath5 != "") RED.util.setMessageProperty(msg, propPath5, env.get("Payload5TOV"), true);
if(propPath6 != "") RED.util.setMessageProperty(msg, propPath6, env.get("Payload6TOV"), true);
if(propPath7 != "") RED.util.setMessageProperty(msg, propPath7, env.get("Payload7TOV"), true);
if(propPath8 != "") RED.util.setMessageProperty(msg, propPath8, env.get("Payload8TOV"), true);
node.send([msg, {payload:statusMsg}]);
node.done();
return;
}, TimeOut));
const statusMsg = { fill:"blue", shape:"ring", text:"Timeout " + TimeOut / 1000 };
node.status(statusMsg);
/* Can be replaced with a simple msg. return
node.send([msg, {payload:statusMsg}]);
node.done();//
return
*/
return [msg, {payload:statusMsg}];
Thanks Colin...
Yeah... Also had similar thoughts... but hey... Unless you ask and confirm...
Otherwise, do you think that timeout loop thingy in there is OK?
Tx
E
Thanks for the input; a brain here pointed me in a few directions and I just tried to wrap my grey matter around it... Stuck with what I could grasp at the time!
I am gonna look at your way and try to learn a bit more!!
Tnx!
What are you doing that involves TCP sockets? Http nodes? TCP? Anything like that involving comms with another system?
What are you doing when your timeout function times out?
TCP socket use is purely via MQTT in/out... and a bit of dashboard serving... and a bit of Telegram notification and control...
The timeout is used in the data stream from a wifi switch on MQTT mostly... If a wifi switch doesn't report in within a predetermined time, for instance, the timeout activates and sends an alert... I also use it if there are "raw" powered Sonoffs that I monitor... If the power goes off, the switch doesn't report in and the voltage/current/power payloads are set to 0.... I have a few switches in the "outlying" non-power-backed-up areas that need constant monitoring... i.e. a wet basement where the switch turns on a sump pump... or, in another case, one of my buildings is about 300m from my household proper... If miscreants dig up and cut the power cables, it goes offline and an alert is raised through non-response....
I have found it handy in quite a few diverse ways...
Regds
Ed
PS: I have written another little sub-flow that is a delay on/off timer that I use a LOT for staggering appliance turn on/off etc... It's probably way simple and done in a complicated manner, but I doubt that this one would drive the socket count up either (not on its own, anyway)... FYI... Here it is:
[
{
"id": "4ea703af.383ab4",
"type": "subflow",
"name": "ON/OFF Delay",
"info": "\n\nVariable timer... Independent\nsettings of On or Off delay...\nput in values in SECONDS...\n\nWhile under delay to on or off,\na off or on will cancel the timer\nrespectively....\n\nif off and 0 rec'd, it sends 0\nif on and 1 rec'd, it sends 1",
"category": "Utility",
"in": [
{
"x": 70,
"y": 100,
"wires": [
{
"id": "ed5090e2.56163"
}
]
}
],
"out": [
{
"x": 330,
"y": 100,
"wires": [
{
"id": "ed5090e2.56163",
"port": 1
}
]
},
{
"x": 330,
"y": 150,
"wires": [
{
"id": "ed5090e2.56163",
"port": 2
}
]
}
],
"env": [
{
"name": "OnDelay",
"type": "num",
"value": "5"
},
{
"name": "OffDelay",
"type": "num",
"value": "5"
}
],
"color": "#777777",
"inputLabels": [
"0/1"
],
"outputLabels": [
"Out",
"Inv"
],
"icon": "node-red/timer.svg",
"status": {
"x": 330,
"y": 50,
"wires": [
{
"id": "ed5090e2.56163",
"port": 0
}
]
}
},
{
"id": "ed5090e2.56163",
"type": "function",
"z": "4ea703af.383ab4",
"name": "Delay On/Off",
"func": "var OnDelay = env.get(\"OnDelay\"); //env.get for SubFlow,flow.get for Function\nOffDelay = env.get(\"OffDelay\"); //env.get for SubFlow,flow.get for Function\nSigStat = context.get(\"SigStat\");\nif (isNaN(SigStat)){SigStat=0} //Check if it exists, if not, make it default off...\nSignal = msg.payload;\n\nTOF = context.get(\"TOF\") //Current Timer Object if any\n\nif(OnDelay<0){OnDelay=0} //chk neg, Zerise if neg\nif(OffDelay<0){OffDelay=0} //chk neg, Zerise if neg\n\nOnDelay = OnDelay * 1000; //to mS (Timeout function is ms based)\nOffDelay = OffDelay * 1000; //to mS\n\n\nif(Signal > 0){Signal = 1} //Anything > 0 is on signal\n else {Signal = 0} //Anything else is off signal\n\n//******************Cancellation Routine******************\nif(Signal == 1){ \n if (SigStat == 1){ //its already on...Cancel any possible off Progress, just send and leave\n clearTimeout(TOF)//Scrap the current timer progress\n //statusMsg = \"OffCancel\"\n statusMsg = ({ fill:\"green\", shape:\"dot\", text:\"OffCancel\"+\" OnD:\"+OnDelay/1000+\" OffD:\"+OffDelay/1000});\n node.status(statusMsg)\n node.send([{payload:statusMsg}, {payload:1}, {payload:0}]);\n return;\n }\n}\nif(Signal=== 0){ \n if (SigStat === 0){ //its already off...Cancel any possible \"\"on\" Progress, just send and leave\n clearTimeout(TOF)//Scrap the current timer progress\n statusMsg = ({ fill:\"red\", shape:\"dot\", text:\"OnCancel\"+\" OnD:\"+OnDelay/1000+\" OffD:\"+OffDelay/1000});\n node.status(statusMsg)\n node.send([{payload:statusMsg}, {payload:0}, {payload:1}]);\n return;\n }\n}\n//******************Cancellation Routine End**************\n//********************Main Routine************************\nif(Signal == 1){ \n if (SigStat === 0){ //its off...Turn it on with delay\n context.set(\"TOF\", setTimeout(function(){\n statusMsg = ({ fill:\"green\", shape:\"dot\", text:\"On \"+\" OnD:\"+OnDelay/1000+\" OffD:\"+OffDelay/1000});\n node.status(statusMsg);\n node.send([{payload:statusMsg}, {payload:1}, 
{payload:0}]);\n context.set(\"SigStat\", 1);\n }, OnDelay));\n statusMsg = ({ fill:\"blue\", shape:\"ring\", text:\"0->1\"+\" OnD:\"+OnDelay/1000+\" OffD:\"+OffDelay/1000});\n node.status(statusMsg)\n node.send([{payload:statusMsg}]);\n node.done();//forgot\n return;//\n }\n} \nif(Signal ===0){\n if (SigStat == 1){ //its on...Turn it off with delay\n context.set(\"TOF\", setTimeout(function(){\n statusMsg = ({ fill:\"red\", shape:\"dot\", text:\"Off \"+\" OnD:\"+OnDelay/1000+\" OffD:\"+OffDelay/1000});\n node.status(statusMsg);\n node.send([{payload:statusMsg}, {payload:0}, {payload:1}]);\n context.set(\"SigStat\", 0);\n }, OffDelay));\n statusMsg = ({ fill:\"blue\", shape:\"ring\", text:\"1->0\"+\" OnD:\"+OnDelay/1000+\" OffD:\"+OffDelay/1000});\n node.status(statusMsg)\n node.send([{payload:statusMsg}]);\n node.done();//forgot?\n return;//\n }\n} \n \n",
"outputs": 3,
"noerr": 0,
"initialize": "",
"finalize": "",
"x": 200,
"y": 100,
"wires": [
[],
[],
[]
],
"inputLabels": [
"0/1"
],
"outputLabels": [
"Status",
"Out",
"inv"
],
"icon": "node-red-contrib-chronos/chronos_scheduler.svg"
}
]
Woohoo!! I got it Right!! (Posting the code, anyway!)
Lol ... Ed
Hmmmmm.... Just did a small message count on mqtt.... About 900 msgs per minute avg or so....
Interesting.... No loops though....
E
Have a look at the mqtt broker log and see if anything is going on there
Hey Colin
Right... Went for a scuffle through the mqtt log... nothing spectacular....
Also went and checked all mqtt sources and destinations with debug nodes... No looping/weird extra activity beyond the normal report-in/do-something requirements....
I can say with about 99.8% certainty that there are no rapid-firing recursive loops in the system... (the 0.2% is, well... If I ain't seen it on an individual look-in, I'm not going to see it at all)...
I now need to scratch some more into the configuration side of things, and the grey matter memory banks, to see what I can remember about when the problem first cropped up....
Retracing:
All I seem to remember is that swapping from the Pi3 to the Pi4-1GB is where the problems started.... (Pi3: 100+ days average uptime, then getting beaten to death to service/clean fans etc.; Pi4-1GB: lucky to get a week to a month consecutive uptime...)
The Pi4 was simply booted on the Pi3 SD card, or an image thereof; I put the problems down to minor O/S incompatibilities...
The Pi4-1GB has subsequently been replaced with a Pi4-4GB unit, clean load, all software versions brought up to date; reliability still questionable...
The Pi4 has been run via wifi and then on hardwire as a possible solution to the problem; no change... But network access speed is improved and it has been kept on hardwire in the interim until I can find the problem...
I am now more than a little perplexed, to say the least!!
Any further suggestions perhaps on where I should check... I am just about out of ideas!!
Regds
ed
Try
lsof | grep -i "TCP\|UDP"
I am close to my knowledge limit, but I think that shows all the sockets in use; the first column is the process using each one. Since it appears you have hundreds of sockets open, that may tell you something.
Edit: To put it into a file, use
lsof | grep -i "TCP\|UDP" > sockets.txt
You have gone nuclear on the timeout handling. All you need to do is save a reference to the timeout and, at the start, see if it is still active and, if so, cancel it. You don't need a separate variable for that.
Of course, you probably could have used the RBE node to do that for you
Nothing in there that would impact the number of open sockets however.
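That save-a-reference-and-cancel pattern can be sketched like this. It assumes a Node-RED function node where node and context are provided by the runtime; the makeContext mock below is only a stand-in so the sketch runs outside Node-RED.

```javascript
// Watchdog pattern: keep exactly one pending timer per node. Each call
// cancels the previous timer (if any) and starts a fresh one.
function resetWatchdog(context, timeoutMs, onTimeout) {
    const previous = context.get("TOF");
    if (previous) clearTimeout(previous);      // cancel any pending timer
    const handle = setTimeout(() => {
        context.set("TOF", null);              // timer fired, nothing pending
        onTimeout();
    }, timeoutMs);
    context.set("TOF", handle);                // remember the live timer
    return handle;
}

// Tiny stand-in for Node-RED's per-node context store (assumption: only
// get/set are needed here).
function makeContext() {
    const store = {};
    return { get: k => store[k], set: (k, v) => { store[k] = v; } };
}
```

In the subflow, each incoming message would call resetWatchdog and then be passed straight through; only the onTimeout callback sends the substitute message.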
As Colin says, if you are sharing a function, you only need to put the function code between triple backticks (start and end on their own lines). Makes it much easier to follow.
A way different number of things is viewed through lsof when I run it as root or as user.... Still trying to sort through for a definitive ....
Let me check deeper...
Tnx!
Nuclear... Maybe... Unclear definitely! .... LOL....
Thanx for looking into that code though!! much appreciated!!
Saving a reference to the timeout... Yep... Using the message's time stamp I presume you mean?
An alternate way indeed....
Regds
Ed
I think you already had a problem. My Pi2 and Pi3 run for years without intervention. Both quite heavily burdened (in the past, not so much now as most things have been moved to my repurposed laptop which is now my main server).
That was likely an issue I would think? I'm not even sure they are fully compatible architectures - one of the few downsides of using ARM-based chipsets. I wouldn't ever reuse a boot drive between major Pi revisions. Anyway, you've fixed that now.
Start by systematically replacing physical outputs with stubs (a debug node maybe). Start with telegram, then Dashboard and then MQTT. Or whatever order you want. See which of those takes away or at least reduces the problem.
On my main system that returns 426 lines of output - incidentally, I ran it as sudo. Without sudo, I got 59 entries, but they are mostly the ones you are interested in anyway. On my Pi2, there were 77 node-red entries. Weirdly, lsof isn't installed on my Pi3 for some odd reason.
My Pi3 has the UniFi controller on it and so returned 2017 entries! Only 33 relate to node/node-red though.
Did you see the cat /proc/net/sockstat result earlier?
TCP: inuse 80 orphan 0 tw 3062 alloc 8262 mem 8225
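That line is worth decoding, since the "tw" and "alloc" fields are the ones in four figures. A small sketch that parses the TCP line of /proc/net/sockstat so the counts can be logged over time (field meanings per the Linux proc interface: inuse = TCP sockets in use, orphan = orphaned, tw = sockets in TIME_WAIT, alloc = allocated, mem = pages of buffer memory):

```javascript
// Parse the "TCP:" line of /proc/net/sockstat into an object of counts.
function parseSockstatTCP(text) {
    const line = text.split("\n").find(l => l.startsWith("TCP:"));
    if (!line) return null;
    const result = {};
    // After "TCP:" the line is alternating field-name / value pairs.
    const fields = line.replace("TCP:", "").trim().split(/\s+/);
    for (let i = 0; i < fields.length; i += 2) {
        result[fields[i]] = Number(fields[i + 1]);
    }
    return result;
}

const sample = "TCP: inuse 80 orphan 0 tw 3062 alloc 8262 mem 8225";
const stats = parseSockstatTCP(sample);
// stats.tw is 3062: thousands of TIME_WAIT sockets, which is where the
// "4 figures" counts mentioned earlier in the thread come from
```

Feeding the real file (fs.readFileSync("/proc/net/sockstat", "utf8") on Linux) into this at intervals would show whether tw or alloc is the figure that climbs.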