Help with how to manage messages - timing

Scenario:

Every n seconds I scan visible WAPs. I then create a list and look for mine.
If it isn't seen, I want to be informed.

I have some rough - and problematic - code/nodes but I am seeing a problem.

Repeatedly I am seeing BOTH WAPs showing as Off-line at the same time.
(To the second)

Which probably isn't true.
So I fear there are inherent problems with what I have. I'm going back to the start to do it again, but this time better.

I do want false positives eliminated to a degree also. (Or is that false negatives?)

So anyway, I'm a bit overcome with how to do this.

I'm not asking anyone to build the flow for me, just to help me get my head around the workings of such a beast.

What I understand to now:

Every n seconds I will receive a list of WAPs.
I split it then switch for the ones I want.
Each of mine then start a count down (n seconds+) to be reset.
(loop)

If a new message comes in the count down is reset.

If a new message doesn't come in (the count down gets to 0) a message is generated.

My thinking on what to do here
But this message has to be in sync with the main messages so it is understood that the names weren't found in the lists - twice over - and so it is valid.

If that is true, that is strike 1.
If there is a second list received that doesn't have the WAP's name then the message is generated and sent.
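The strike idea above can be sketched in plain JavaScript, which could later be adapted for a function node. This is only a sketch under assumptions: `updateStrikes`, `MAX_STRIKES` and the example names `WAP_A` / `WAP_B` are made up for illustration, and in Node-RED the `strikes` object would normally live in the node's context.

```javascript
// Hypothetical sketch: count consecutive scans in which each watched WAP is missing.
const MAX_STRIKES = 2; // assumption: alert on the second consecutive miss

// strikes: object mapping WAP name -> consecutive misses so far
// watched: the names we care about; visible: names seen in the latest scan
// Returns the names that have just reached MAX_STRIKES.
function updateStrikes(strikes, watched, visible) {
    const seen = new Set(visible);
    const alerts = [];
    for (const name of watched) {
        if (seen.has(name)) {
            strikes[name] = 0;                    // seen again: clear the strikes
        } else {
            strikes[name] = (strikes[name] || 0) + 1;
            if (strikes[name] === MAX_STRIKES) {  // alert once, when the limit is hit
                alerts.push(name);
            }
        }
    }
    return alerts;
}

const strikes = {};
const watched = ["WAP_A", "WAP_B"];
const first = updateStrikes(strikes, watched, ["WAP_A", "Neighbour"]); // WAP_B: strike 1
const second = updateStrikes(strikes, watched, ["WAP_A"]);             // WAP_B: strike 2
console.log(first, second);
```

Because the count is driven by the scan lists themselves, no timer has to be kept in sync with the scans: a WAP is only reported after it has been absent from two lists in a row.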

(After writing this I have a somewhat better understanding of what to do. But if you have been in this scenario and have some basic flow structures, would you mind sharing?)

Thanks in advance.

How are you "scanning"? What is the goal in doing this? Do you have unreliable wifi?

No, what it is is me being.... me. :wink:

To scan I am running a python script that just spits out the names of all the visible WAPs in my area.

I have WiFi devices and if they stop talking to me I want to know if it is them or if the WiFi is down.

So every n seconds I do a WiFi scan and check that my WAPs are visible.
If they aren't, an alarm goes off.

Recently (or at least it is only now becoming apparent) I was every now and then getting my WAP flagged as Offline.
Annoying at best.
But recently I have now set up a second WAP - not in use just yet, but.....
And I am also getting Offline messages for it too.

At first I was dismissive of it, but then it is happening more and more.

Then I noticed that they BOTH go down at the same time - to the second.
(Well, at least that is on the time stamp I get when they are indicated to me as being down)

That got me wondering if there is another thing happening in the background I am not seeing.

So what I want to do is be smarter at how I scan the WAPs, and IF they are down, what happens.

I'm actually making good headway after posting the question as it helped me see the problem better.

I've bashed together this:

[{"id":"643c3cdc07e3c049","type":"function","z":"c56bddee.ca0a18","name":"","func":"var counter = context.get(\"counter\") || 0;\nif (msg.payload == \"UP\")\n{\n    //  UP\n    //node.warn(\"+1\");\n    node.status({text: \"+1 \"});\n    counter = counter + 1;\n    context.set(\"counter\",counter);\n    if (counter < 3)\n    {\n        //  Ok\n        msg.payload = \"Online\";\n        return msg;\n    }\n} else\nif (msg.payload == \"DOWN\")\n{\n    //  DOWN\n    //node.warn(\"-1\");\n    node.status({text: \"-1 \"});\n    counter = 0;\n    context.set(\"counter\",counter);\n}\nif (counter < 0)\n{\n    counter = 0;\n    context.set(\"counter\",counter);\n}\n\n//node.warn(\"Counter is at value \" + counter);\nnode.status({text: \"Counter is at value \" + counter});\n\n\nif (counter > 1)\n{\n    //  Two fails.\n    msg.payload = \"Offline\"\n    return msg;\n}\n\n//return;\n\n//msg.payload = \"Online\";\n\n//return msg;","outputs":1,"noerr":0,"initialize":"","finalize":"","libs":[],"x":650,"y":4620,"wires":[["d64c32ccf6489133"]]}]

Each time I scan, I send an UP message into the node.
Meanwhile after the filtering and slight delay, if the WAP is found in the list: I send a "DOWN" message.

If three "UP" messages in a row are received by the node with no "DOWN" message in between, it sends an "Offline" message.

That should go a long way to finding out if there really are WAP dropouts.
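To check when the "Offline" message actually appears, the counter logic from the function node can be run outside Node-RED. The sketch below mirrors the posted code but replaces `context` and `msg` with a plain `state` object and return values; the `step` name is invented for illustration.

```javascript
// Sketch of the counter logic from the function node, without the Node-RED objects.
// state.counter plays the role of context.get("counter").
function step(state, payload) {
    if (payload === "UP") {
        state.counter += 1;
        if (state.counter < 3) {
            return "Online";       // fewer than three scans without a sighting
        }
    } else if (payload === "DOWN") {
        state.counter = 0;         // the WAP was found: start counting again
    }
    // Only reached after a DOWN, or on the third (or later) UP in a row.
    return state.counter > 1 ? "Offline" : undefined;
}

const state = { counter: 0 };
const outputs = ["UP", "DOWN", "UP", "UP", "UP", "DOWN"].map((p) => step(state, p));
console.log(outputs);
```

With the `counter < 3` test as posted, "Offline" is produced on the third consecutive "UP", and every further "UP" repeats it until a "DOWN" arrives.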

You still haven't said how you do the scan. This is important. A network scan is not a reliable indication as to whether your access point is alive, only 1 step along the way. But depending on how you've scanned, it is also possible that you are doing a ping. A ping of a device tells you if the network card is alive but not whether the services on the device are alive. It is step 2. For a more in-depth view, you may want to see if your AP's have SNMP available. If they do, then you can get a much more accurate view of whether the AP is not only alive but whether it is actually handling traffic.

I run a python script.

#wifiscan.py
import json
import re
import subprocess

# Scan for access points on wlan0 (scanning normally needs root, hence sudo).
result = subprocess.run(
    ["sudo", "iwlist", "wlan0", "scan"],
    capture_output=True,
    text=True,
)

# Pull the name out of every line of the form: ESSID:"name"
names = re.findall(r'ESSID:"(.*)"', result.stdout)

# Print the names as a JSON array so Node-RED can parse the output.
print(json.dumps(names))

I hope it shows me all the WAPs visible.

It is just now interesting that I have 2, that I am seeing them both fail together.

So I am suspicious it isn't the WAP(s) are going down together, but something else.
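One possible explanation for both names vanishing in the same second is that the scan itself occasionally returns nothing at all. That is an assumption; nothing in the thread confirms it. A sketch of telling the two cases apart, where `classifyScan` and the example names are hypothetical:

```javascript
// Hypothetical sketch: separate "the scan produced nothing" from "my WAP is missing".
function classifyScan(visible, watched) {
    if (!Array.isArray(visible) || visible.length === 0) {
        // No networks at all: more likely a failed scan than every WAP being down.
        return { scanOk: false, missing: [] };
    }
    const seen = new Set(visible);
    return { scanOk: true, missing: watched.filter((name) => !seen.has(name)) };
}

const emptyScan = classifyScan([], ["WAP_A", "WAP_B"]);
const normalScan = classifyScan(["WAP_A", "Neighbour"], ["WAP_A", "WAP_B"]);
console.log(emptyScan, normalScan);
```

Logging the `scanOk: false` cases separately would show whether the simultaneous "Offline" reports line up with empty scans rather than with real dropouts.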
There are more nodes after that, of course, that split then switch (depending on name), and others that monitor the frequency of the messages arriving.

That is problematic, so I've restructured it to count how many sequential messages are missed.

That is a whole other thing which is a whole other project I am doing.

Originally I would ping all my network devices. That is/was painful.

I am now doing nmap -sP (ip address)/24 and parsing the reply.
That is a lot less load, but has its own set of problems on which I am working.
But I feel/fear that is off topic from my problem: the aforementioned script not showing my WAP names as visible.

Right, so first off, you could far more easily run that from Node-RED via the exec node, you don't need python there :wink:
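As a rough sketch of that suggestion: an exec node could run `sudo iwlist wlan0 scan` (the same command the Python script uses) and a function node could pull the names out of the text it returns. The `parseEssids` helper and the sample text below are made up for illustration; real `iwlist` output has many more lines per cell.

```javascript
// Hypothetical helper: extract network names from `iwlist ... scan` text output.
function parseEssids(stdout) {
    const names = [];
    const pattern = /ESSID:"(.*)"/g;   // each cell reports a line like ESSID:"name"
    let match;
    while ((match = pattern.exec(stdout)) !== null) {
        names.push(match[1]);
    }
    return names;
}

// Made-up sample, trimmed to the relevant lines.
const sample = [
    '          Cell 01 - Address: 00:11:22:33:44:55',
    '                    ESSID:"PiNet"',
    '          Cell 02 - Address: 66:77:88:99:AA:BB',
    '                    ESSID:"MusicPi"',
].join("\n");

const essids = parseEssids(sample);
console.log(essids);
```

In a function node the input would be the exec node's stdout in `msg.payload`, and the returned array could go straight to the existing split and switch nodes.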

Secondly you are asking your server's wifi card to scan the wireless network. I've no idea how reliable that is. Aren't your APs on your wired network? If so, it would be a lot more reliable to get some SNMP data from them.

You can't run nmap from Node-RED and get all the details because of a security issue, i.e. it needs sudo if you want hostnames, for example.

Yes, a different thread perhaps. But for reference, I run this BASH script on a CRON schedule

#! /usr/bin/env bash
# Fast scan the local network for live devices and record
# to /tmp/nmap.xml which can be used in Node-RED
#
# To run manually:
#   sudo /home/home/nrmain/system/nmap_scan.sh
#
# To run via cron:
#   sudo crontab -e
#       01,16,31,46 * * * * /home/home/nrmain/system/nmap_scan.sh

# Run the scan
nmap -sn --oX /tmp/nmap.xml --privileged -R --system-dns --webxml 192.168.1.0/24
# Make sure ownership & ACLs on the output are secure
chown root:home /tmp/nmap.xml
chmod --silent 640 /tmp/nmap.xml
# Trigger the Node-RED update
#curl  --silent --output /dev/null 'http://localhost:1880/localnetscan' > /dev/null
#curl --insecure -I 'https://localhost:1880/localnetscan'
curl -I 'http://localhost:1880/localnetscan'

It grabs the latest network data and outputs to an XML file and then calls a Node-RED http-in endpoint which triggers a flow to import the XML and merge it to my network device records.

I didn't know how to do it at the time.
How can NR (node?) list visible WAPs?

One AP is my router. The second is a RasPi.
The router's WAP is for heavy devices / computers / etc that need internet access.
The RasPi one (still not used) is for low-frequency stuff: WiFi devices sending telemetry data.

I don't quite get the last part of that about the AP on the wired network.
Yes, they are connected, but I feel I've missed the question.

Let's say you scan every 20 seconds.

You want to be alerted if your access point is not visible on two consecutive scans.

Doesn't a trigger node do exactly this?

  • send nothing then after 41 seconds send "AP down", restart if another message arrives
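In plain JavaScript the same watchdog idea looks roughly like this. It is a sketch of the behaviour, not of how the trigger node is implemented; `Watchdog` and its methods are invented names, and time is passed in explicitly so the logic is easy to check.

```javascript
// Hypothetical watchdog: reports "down" if no sighting arrives within timeoutMs.
class Watchdog {
    constructor(timeoutMs, now) {
        this.timeoutMs = timeoutMs;
        this.lastSeen = now;
    }
    // Call whenever the access point appears in a scan; restarts the countdown.
    feed(now) {
        this.lastSeen = now;
    }
    // True once more than timeoutMs has passed since the last sighting.
    isDown(now) {
        return now - this.lastSeen > this.timeoutMs;
    }
}

// Scans every 20 s with a 41 s timeout: one missed scan is tolerated, two are not.
const dog = new Watchdog(41000, 0);
dog.feed(20000);                          // seen at t = 20 s
const afterOneMiss = dog.isDown(40000);   // scan at 40 s missed, only 20 s elapsed
const afterTwoMisses = dog.isDown(61001); // just over 41 s since the last sighting
console.log(afterOneMiss, afterTwoMisses);
```

The extra second on top of two scan periods is what keeps a slightly late scan from tripping the alarm.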

Well you can if you let the user running Node-RED run that command without a password prompt. But yes, see my post above to show how I do it without needing to mess with sudo.

Yes, I am now doing that but for other reasons. (That is to replace the PING thing I was doing)
But that is whole different thing.

THIS thread is about me seeing regular loss of WAP signals.
They were worse. Some of that was reduced when I replaced power supplies with better ones.

But the (one) WAP was still now and then dropping out / vanishing.
Not end of the world but it was annoying.
As it was reduced, I kind of put up with it. But now I am testing the RasPi as a WAP as well, and both are showing (note: SHOWING) loss of WAP visibility at the same time. That indicates that how I am detecting the WAPs' visibility is not the best.

Yes, ok. But as you said, that may be better on a different thread. (That is for the "seeing who is connected to my network" - yes?)

Mostly, but understanding what nmap does will show you that it is also pinging the addresses - which is one of the reasons I asked. The parameters I've used show you the latency on the devices.

<host><status state="up" reason="arp-response" reason_ttl="0"/>
<address addr="192.168.1.193" addrtype="ipv4"/>
<address addr="84:0D:8E:3D:3B:D4" addrtype="mac" vendor="Espressif"/>
<hostnames>
</hostnames>
<times srtt="97039" rttvar="97039" to="485195"/>
</host>
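For illustration, the fields of one such host entry could be pulled out along these lines. This is a shortcut using regular expressions on a single `<host>` element (in Node-RED an xml node would be the more usual route), `summariseHost` is a made-up name, and the sketch assumes `srtt` is reported in microseconds.

```javascript
// Hypothetical sketch: pull a few fields out of one nmap <host> element.
function summariseHost(xml) {
    const pick = (re) => {
        const m = xml.match(re);
        return m ? m[1] : null;
    };
    const srtt = pick(/<times srtt="(\d+)"/);
    return {
        state: pick(/<status state="([^"]+)"/),
        ip: pick(/<address addr="([^"]+)" addrtype="ipv4"/),
        mac: pick(/<address addr="([^"]+)" addrtype="mac"/),
        // Assumption: srtt is in microseconds, so divide by 1000 for milliseconds.
        latencyMs: srtt === null ? null : Number(srtt) / 1000,
    };
}

const host = '<host><status state="up" reason="arp-response" reason_ttl="0"/>' +
    '<address addr="192.168.1.193" addrtype="ipv4"/>' +
    '<address addr="84:0D:8E:3D:3B:D4" addrtype="mac" vendor="Espressif"/>' +
    '<hostnames></hostnames>' +
    '<times srtt="97039" rttvar="97039" to="485195"/></host>';

const summary = summariseHost(host);
console.log(summary);
```

Under that assumption the entry above would correspond to a round-trip time of about 97 ms.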

Yes and that is how I was (round about) doing it.

But seeing BOTH WAPs go Offline at the same time makes me fear that approach is problematic.

I've since done it a very different way.
This is the guts of how I am doing it.

The code is from a function node, which is now part of a subflow to save screen real estate.

var counter = context.get("counter") || 0;

//  Get values from $env variables.
let UP = env.get("up");
let DOWN = env.get("down");
let offline = env.get("offlinemsg");

if (msg.payload == UP)
{
    //  UP
    //node.warn("+1");
    node.status({text: "+1 "});
    counter = counter + 1;
    context.set("counter",counter);
    if (counter < 3)
    {
        //  Ok
        //  Therefore do/send NOTHING
        return;
//        msg.payload = "Online";
//        return msg;
    }
} else
if (msg.payload == DOWN)
{
    //  DOWN
    //node.warn("-1");
    node.status({text: "-1 "});
    counter = 0;
    context.set("counter",counter);
}
if (counter < 0)
{
    counter = 0;
    context.set("counter",counter);
}

//node.warn("Counter is at value " + counter);
node.status({text: "Counter is at value " + counter});


if (counter > 1)
{
    //  Two fails.
//    msg.payload = "Offline";
    msg.payload = offline;
    return msg;
}

There are other nodes around it too to do a bit of stuff. But that is about the guts of the new way.
(Only done today so it is still in very early day testing.)

Yes. And that is what I use - but that isn't this problem.
Doing that (staying with this for a moment) opened a big can of worms which I had to work on for a few hours to make it "drop in compatible" with what I have.
It's a bit of a rock and hard place where to draw the line on what to modify to keep it working in the bigger picture.

But sorry, we are getting off topic for this topic/thread.

If anyone is still interested. :wink:

(Foreign nodes used)
(fan, gate)

[{"id":"fbe74a055f167918","type":"subflow","name":"WAP status monitor","info":"","category":"","in":[{"x":100,"y":80,"wires":[{"id":"034a71c3fa82b627"},{"id":"643c3cdc07e3c049"}]}],"out":[{"x":660,"y":80,"wires":[{"id":"643c3cdc07e3c049","port":0}]},{"x":660,"y":150,"wires":[{"id":"9f8f0275c09b9051","port":0}]},{"x":660,"y":210,"wires":[{"id":"ec436d4dd0c415df","port":0}]}],"env":[{"name":"down","type":"str","value":""},{"name":"up","type":"str","value":""},{"name":"offlinemsg","type":"str","value":""}],"meta":{},"color":"#DDAA99","outputLabels":["Offline message","Feedback signal","Problem message"],"status":{"x":660,"y":280,"wires":[{"id":"344fab4eca9db3f5","port":0}]}},{"id":"643c3cdc07e3c049","type":"function","z":"fbe74a055f167918","name":"Counter","func":"var counter = context.get(\"counter\") || 0;\n\n//  Get values from $env variables.\nlet UP = env.get(\"up\");\nlet DOWN = env.get(\"down\");\nlet offline = env.get(\"offlinemsg\");\n\nif (msg.payload == UP)\n{\n    //  UP\n    //node.warn(\"+1\");\n    node.status({text: \"+1 \"});\n    counter = counter + 1;\n    context.set(\"counter\",counter);\n    if (counter < 3)\n    {\n        //  Ok\n        //  Therefore do/send NOTHING\n        return;\n//        msg.payload = \"Online\";\n//        return msg;\n    }\n} else\nif (msg.payload == DOWN)\n{\n    //  DOWN\n    //node.warn(\"-1\");\n    node.status({text: \"-1 \"});\n    counter = 0;\n    context.set(\"counter\",counter);\n}\nif (counter < 0)\n{\n    counter = 0;\n    context.set(\"counter\",counter);\n}\n\n//node.warn(\"Counter is at value \" + counter);\nnode.status({text: \"Counter is at value \" + counter});\n\n\nif (counter > 1)\n{\n    //  Two fails.\n//    msg.payload = \"Offline\";\n    msg.payload = offline;\n    return 
msg;\n}\n","outputs":1,"noerr":0,"initialize":"","finalize":"","libs":[],"x":440,"y":80,"wires":[[]]},{"id":"034a71c3fa82b627","type":"switch","z":"fbe74a055f167918","name":"Up/Down","property":"payload","propertyType":"msg","rules":[{"t":"eq","v":"UP","vt":"env"},{"t":"eq","v":"DOWN","vt":"env"}],"checkall":"true","repair":false,"outputs":2,"x":105,"y":170,"wires":[["db3a6b270e9875b4"],["635488afad2607dd"]],"l":false},{"id":"9f8f0275c09b9051","type":"trigger","z":"fbe74a055f167918","name":"Variable","op1":"","op2":"TICK","op1type":"nul","op2type":"str","duration":"30","extend":false,"overrideDelay":true,"units":"s","reset":"","bytopic":"all","topic":"topic","outputs":1,"x":440,"y":150,"wires":[[]]},{"id":"635488afad2607dd","type":"change","z":"fbe74a055f167918","name":"Reset","rules":[{"t":"set","p":"reset","pt":"msg","to":"true","tot":"bool"},{"t":"delete","p":"payload","pt":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":315,"y":170,"wires":[["9f8f0275c09b9051","db3a6b270e9875b4"]],"l":false},{"id":"115ce6b7b0ecbab9","type":"change","z":"fbe74a055f167918","name":"","rules":[{"t":"set","p":"delay","pt":"msg","to":"msg.delay / 6","tot":"jsonata"}],"action":"","property":"","from":"","to":"","reg":false,"x":315,"y":130,"wires":[["9f8f0275c09b9051"]],"l":false},{"id":"db3a6b270e9875b4","type":"counter","z":"fbe74a055f167918","name":"","init":"0","step":"1","lower":"","upper":"","mode":"increment","outputs":"1","x":185,"y":130,"wires":[["fa1f32eae6eb928a"]],"l":false},{"id":"fa1f32eae6eb928a","type":"switch","z":"fbe74a055f167918","name":"count 
value?","property":"count","propertyType":"msg","rules":[{"t":"lt","v":"3","vt":"num"},{"t":"else"}],"checkall":"true","repair":false,"outputs":2,"x":245,"y":130,"wires":[["115ce6b7b0ecbab9"],["ec436d4dd0c415df"]],"l":false},{"id":"ec436d4dd0c415df","type":"change","z":"fbe74a055f167918","name":"","rules":[{"t":"set","p":"payload","pt":"msg","to":"Problem","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":460,"y":210,"wires":[[]]},{"id":"344fab4eca9db3f5","type":"status","z":"fbe74a055f167918","name":"","scope":["643c3cdc07e3c049"],"x":460,"y":280,"wires":[[]]},{"id":"610e3e684b9f0ebb","type":"pythonshell in","z":"c56bddee.ca0a18","name":"wifiscan","pyfile":"/home/pi/python_stuff/wifiscan.py","virtualenv":"","continuous":false,"stdInData":false,"x":340,"y":4620,"wires":[["64e80fd6757450b7"]]},{"id":"93bb9ee10ed12d10","type":"split","z":"c56bddee.ca0a18","name":"Spliter","splt":",","spltType":"str","arraySplt":1,"arraySpltType":"len","stream":false,"addname":"","x":330,"y":4730,"wires":[["ff4761cea65711f5"]]},{"id":"ff4761cea65711f5","type":"switch","z":"c56bddee.ca0a18","name":"Look for my WAP 
names","property":"payload","propertyType":"msg","rules":[{"t":"cont","v":"Marys_Farm_2.4","vt":"str"},{"t":"cont","v":"PiNet","vt":"str"},{"t":"cont","v":"MusicPi","vt":"str"}],"checkall":"true","repair":false,"outputs":3,"x":390,"y":4780,"wires":[["aa10b4119644f4bf"],[],[]]},{"id":"d64c32ccf6489133","type":"debug","z":"c56bddee.ca0a18","name":"STATUS","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","statusVal":"","statusType":"auto","x":1050,"y":4590,"wires":[]},{"id":"c60b0751c9bb4b15","type":"change","z":"c56bddee.ca0a18","name":"up","rules":[{"t":"set","p":"payload","pt":"msg","to":"UP","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":390,"y":4670,"wires":[["d3d2d9bfb238c398"]]},{"id":"725a561b78318d32","type":"change","z":"c56bddee.ca0a18","name":"down","rules":[{"t":"set","p":"payload","pt":"msg","to":"true","tot":"bool"}],"action":"","property":"","from":"","to":"","reg":false,"x":870,"y":4810,"wires":[["d3d2d9bfb238c398"]]},{"id":"6dfd1861b9442a00","type":"gate","z":"c56bddee.ca0a18","name":"Simulate WAP not 
visible","controlTopic":"control","defaultState":"open","openCmd":"open","closeCmd":"close","toggleCmd":"toggle","defaultCmd":"default","statusCmd":"status","persist":false,"storeName":"memory","x":670,"y":4810,"wires":[["725a561b78318d32","04a0212668fa4649"]]},{"id":"7d5746c5627d733b","type":"inject","z":"c56bddee.ca0a18","name":"Stop","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"control","payload":"close","payloadType":"str","x":450,"y":4840,"wires":[["6dfd1861b9442a00"]]},{"id":"43149a1220842b81","type":"inject","z":"c56bddee.ca0a18","name":"Go","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"control","payload":"open","payloadType":"str","x":450,"y":4880,"wires":[["6dfd1861b9442a00"]]},{"id":"bee2cea7a2f5a514","type":"inject","z":"c56bddee.ca0a18","name":"(reset - for testing only)","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payloadType":"date","x":690,"y":4890,"wires":[["725a561b78318d32"]]},{"id":"d3d2d9bfb238c398","type":"fan","z":"c56bddee.ca0a18","name":"","x":675,"y":4670,"wires":[["ed619f1dc568ee75"]],"l":false},{"id":"80c73d34b4f76dce","type":"debug","z":"c56bddee.ca0a18","name":"Feedback","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","statusVal":"","statusType":"auto","x":1050,"y":4690,"wires":[]},{"id":"64e80fd6757450b7","type":"change","z":"c56bddee.ca0a18","name":"delay (ms)","rules":[{"t":"set","p":"delay","pt":"msg","to":"$globalContext(\"SCAN_TIME\") * 6","tot":"jsonata"}],"action":"","property":"","from":"","to":"","reg":false,"x":295,"y":4670,"wires":[["93bb9ee10ed12d10","c60b0751c9bb4b15"]],"l":false},{"id":"f21047b4495ebcc6","type":"debug","z":"c56bddee.ca0a18","name":"Problem (Just for the sake of making sure there is a 
signal)","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","statusVal":"","statusType":"auto","x":1210,"y":4770,"wires":[]},{"id":"04a0212668fa4649","type":"change","z":"c56bddee.ca0a18","name":"Online","rules":[{"t":"set","p":"payload","pt":"msg","to":"Online","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":880,"y":4850,"wires":[["d64c32ccf6489133"]]},{"id":"aa10b4119644f4bf","type":"trigger","z":"c56bddee.ca0a18","name":"Slight delay (needed)","op1":"","op2":"","op1type":"nul","op2type":"payl","duration":"5","extend":false,"overrideDelay":false,"units":"s","reset":"","bytopic":"all","topic":"topic","outputs":1,"x":650,"y":4750,"wires":[["6dfd1861b9442a00"]]},{"id":"ed619f1dc568ee75","type":"subflow:fbe74a055f167918","z":"c56bddee.ca0a18","name":"","env":[{"name":"down","value":"true","type":"bool"},{"name":"up","value":"UP","type":"str"},{"name":"offlinemsg","value":"Offline","type":"str"}],"x":810,"y":4670,"wires":[["d64c32ccf6489133"],["80c73d34b4f76dce","610e3e684b9f0ebb"],["f21047b4495ebcc6"]]},{"id":"9e6c34630719c036","type":"comment","z":"c56bddee.ca0a18","name":"Timed pulses here in reality","info":"","x":360,"y":4560,"wires":[]}]

And there is a sub-flow.

Left side of flow:
My clock (every n seconds) comes into the node to run a python script (posted above) and list the visible WAPs.

Moving down:
The delay (frequency) is put in the message to share with other nodes where needed.
Split the message to get individual WAP names.
Switch to my 2 WAPS (there's going to be a third) :wink:

I add a slight delay - you'll see why soon.
The gate node simulates / allows you to simulate the WAP name not being visible.
(I've tweaked the flow IRL slightly to get around the need for the 2 change nodes.)
One sends the online message.
The one above it sends a message to the subflow to tell it that the WAP is visible.

Looping now back to the left of the flow:
Where we went down before, we go right.
Go into a change node that sends an UP message.
That goes to the function node.
The message is received. An internal counter increments.
(Now going back to the online message bit.)
The down message is sent into the function node and that resets the counter to zero.

All things being good, if the counter is < 3 nothing is sent.
If the counter >=3 the offline message is sent.

So if 3 messages pass (no time limits used/needed) and there is no DOWN message received an offline message is sent.

Somewhere in there the feedback output of the subflow sends a signal back to force a scan of visible WAPs.

The third output is just there to make sure things are working.

But the real thing is the offline message being sent on output 1.

Clearer?
