Ping - network latency

I've been getting some inconsistent latency on ping requests, and it appears to be caused by this (copied from this article):

The reason the first ping usually fails is that the remote router in that LAN has to put the ping request on hold to send out an ARP broadcast to learn the MAC address of the remote device, then wait for a response, and then send the first ping through.

My knowledge of networks is pretty poor, so if I've got this wrong, please tell me!

To test this, I made a simple flow that pings Cloudflare's servers using node-red-node-ping.
The flow pings 1.1.1.1 every 2 minutes, with each ping having a second ping just 2 seconds later, so ping <--2 seconds--> ping <--118 secs--> ping <--2 seconds--> ping etc.

The results showed that the second ping result was much faster, because the ARP cache had already cached the ARP information (2 seconds earlier).

(image: ping results)

So, to get the true & consistent ping value, it appears to be necessary to ping twice, and use the second value, disregarding the first.
This isn't a bug, it's just the way the network appears to work.
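
For anyone wanting to try the same "ping twice, keep the second" idea, here is a minimal sketch in plain JavaScript of the kind that could sit in a function node. The name `steadyLatency` and the sample numbers are made up for illustration; it only assumes you have the round-trip times (in ms) of one burst, in the order the pings were sent.

```javascript
// Hypothetical helper: given the round-trip times (ms) of one burst of pings,
// in the order they were sent, treat the first reply as a warm-up and
// summarise only the ones that follow it.
function steadyLatency(rttsMs) {
    if (!Array.isArray(rttsMs) || rttsMs.length < 2) {
        return null; // need a warm-up ping plus at least one real sample
    }
    const usable = rttsMs.slice(1).filter((t) => Number.isFinite(t));
    if (usable.length === 0) {
        return null; // every follow-up ping failed
    }
    const sum = usable.reduce((a, b) => a + b, 0);
    return {
        warmup: rttsMs[0],            // kept only for comparison
        min: Math.min(...usable),
        avg: sum / usable.length,
    };
}

// Example with made-up numbers: a slow first reply followed by a fast one.
const result = steadyLatency([48.2, 11.5]);
console.log(result);
```

Keeping the warm-up value in the result makes it easy to chart both and see how big the first-ping penalty really is.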

(image: pingflow2)

Yes & no... The ARP thing is true if you ping a host on your LAN that your own machine hasn't communicated with in a while. It's also true if you ping a host on a different LAN that the router over there hasn't talked to in a while. It's not true for something like 1.1.1.1, given the volume of traffic there. The initial delay probably has more to do with route caches, BGP, and other stuff along the way. Or perhaps your own machine hasn't talked to the local gateway for long enough that it has lost its ARP entry (seems unlikely, though).

I'm not sure what a ping to a host outside your subnet can tell you, other than that a connection problem is not a network transport problem between the two hosts.

Inside your subnet the latency has some utility, but for outside hosts it depends on things you can't control that can change at any time.

I've tried using pings to 1.1.1.1 to detect when my IoT subnet has lost its ability to push messages with images to me via email, and to send me a text message via a cellular modem so I could "fix it" ASAP. It was basically nothing but false alarms, because by the time I got the message and was able to look into it, the problem had resolved itself. Adding a text telling me that the connection had returned just doubled the annoyance. So ultimately I gave up and quit paying for the cellular modem phone service.

While you can't control them, it is very important to know them. Variable latency is a clear sign of other issues.
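
To put a number on "variable", a rough sketch like the one below could be run over a batch of ping times. The function name `latencySpread` and the sample values are assumptions for illustration, and what counts as "too variable" is left to the reader.

```javascript
// Hypothetical helper: summarise how much a set of ping times (ms) varies.
function latencySpread(rttsMs) {
    const n = rttsMs.length;
    if (n === 0) return null;
    const mean = rttsMs.reduce((a, b) => a + b, 0) / n;
    // Population variance: average squared distance from the mean.
    const variance = rttsMs.reduce((a, t) => a + (t - mean) ** 2, 0) / n;
    return {
        mean,
        stddev: Math.sqrt(variance),
        range: Math.max(...rttsMs) - Math.min(...rttsMs),
    };
}

// Made-up samples: one steady link, one that swings between two values.
const steady = latencySpread([10, 10, 10, 10]);
const erratic = latencySpread([10, 30, 10, 30]);
console.log(steady, erratic);
```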

For myself, I use nmap for scanning the local network. I run the following BASH script, which calls back to Node-RED at the end, passing the XML details back via CURL to an http-in node. This is very effective and reliable and builds up a full picture of what connects to the network over time. It keeps me informed of current IP address assignments and when things were last seen.

#! /usr/bin/env bash
# Fast scan the local network for live devices and record
# to /tmp/nmap.xml which can be used in Node-RED
#
# To run manually:
#   sudo /home/home/nrmain/system/nmap_scan.sh
#
# To run via cron:
#   sudo crontab -e
#       01,16,31,46 * * * * /home/home/nrmain/system/nmap_scan.sh

# Run the scan
nmap -sn --oX /tmp/nmap.xml --privileged -R --system-dns --webxml 192.168.1.0/24
# Make sure ownership & ACLs on the output are secure
chown root:home /tmp/nmap.xml
chmod --silent 640 /tmp/nmap.xml
# Trigger the Node-RED update
#curl  --silent --output /dev/null 'http://localhost:1880/localnetscan' > /dev/null
curl --insecure -I 'https://localhost:1880/localnetscan'

However, for external checks, I use Telegraf. This has ping and DNS checks built in. I do pings against common endpoints such as google and youtube along with some of the intermediate points on my ISP's network so that I can see whether a problem is due to the endpoint (yes YouTube, I see your regular issues!) or my ISP. I also ping a couple of local addresses including the Router and NAS but that is simply so I get the comparison on my Grafana dashboard. If I see all the local pings go up, I know it is almost certainly a problem with my server.
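
The reasoning in that paragraph (local vs ISP vs endpoint) could be sketched roughly as below. This is not how Telegraf or Grafana work, just the comparison done by eye on the dashboard written out as code; the function name `locateProblem`, the grouping, and the idea of a per-host "degraded" flag are all assumptions for illustration.

```javascript
// Hypothetical classifier. Each group is an array of booleans, one per host:
// true = that host's ping looks degraded (slow or no answer).
function locateProblem({ local, isp, endpoints }) {
    const all = (xs) => xs.length > 0 && xs.every(Boolean);
    const any = (xs) => xs.some(Boolean);
    if (all(local)) return 'server';       // even the router and NAS look slow
    if (any(isp)) return 'isp';            // trouble shows up inside the ISP's network
    if (any(endpoints)) return 'endpoint'; // ISP hops fine, only a far end is slow
    return 'ok';
}

// Made-up example: local hosts fine, one ISP hop degraded.
const verdict = locateProblem({
    local: [false, false],
    isp: [true, false],
    endpoints: [true, true],
});
console.log(verdict);
```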

I check all of the common public DNS servers so that I can see when an apparent performance issue is actually due to DNS rather than the network. This is more common than you might think. Also, the best DNS service is not always obvious. For me, for example, Google DNS has caused major issues in the past with regular outages.

Incidentally, I also output the telegraf inputs to MQTT, not just InfluxDB.

Here are the relevant inputs:

# # Query given DNS server and gives statistics
[[inputs.dns_query]]
#   ## servers to query
  servers = ["192.168.1.1", "8.8.8.8", "8.8.4.4", "1.1.1.1", "1.0.0.1", "208.67.222.222", "208.67.220.220", "9.9.9.9", "94.140.14.14", "94.140.15.15"]

# # Returns ethtool statistics for given interfaces
[[inputs.ethtool]]
#   ## List of interfaces to pull metrics for
  interface_include = ["enp0s25","wlp3s0"]

[[inputs.net]]
#   ## By default, telegraf gathers stats from any up interface (excluding loopback)
#   ## Setting interfaces will tell it to gather these explicit interfaces,
#   ## regardless of status.
#   ##
  interfaces =  ["enp0s25","wlp3s0"]

# # Read TCP metrics such as established, time wait and sockets counts.
[[inputs.netstat]]

# # Collect kernel snmp counters and network interface statistics
[[inputs.nstat]]

# # Ping given url(s) and return statistics
[[inputs.ping]]
  interval = "60s"

#   ## Hosts to send ping packets to.
  urls = ["github.com","bbc.co.uk","amazon.co.uk","youtube.com","it.knightnet.org.uk","10.0.244.237", "216.66.82.69", "router.????????","pi2.????????","pi3.????????","192.168.1.???"]

#   ## Number of ping packets to send per interval.  Corresponds to the "-c"
#   ## option of the ping command.
  count = 4
#
#   ## Time to wait between sending ping packets in seconds.  Operates like the
#   ## "-i" option of the ping command.
  ping_interval = 1.0
#
#   ## If set, the time to wait for a ping response in seconds.  Operates like
#   ## the "-W" option of the ping command.
  timeout = 2.0
#   ## Interface or source address to send ping from.  Operates like the -I or -S
#   ## option of the ping command.
  interface = "enp0s25"
#

I'm using ping to try and check the state of my broadband connection.
Historically it's been very unreliable, and the providers have never managed to fix it despite numerous attempts.
Once I lose the connection, the only way to get it back is to disconnect/reconnect the phone line, or reboot the router. This then picks up a fresh path to the network, and all is again well.

An example is that I went on holiday 2 years ago (remember holidays :rage:) and this happened 2 days into the holiday, meaning that I lost all contact with my home - alerts, lighting, security etc. for the remainder of the holiday.

So I purchased a decent router that has an API, and constructed a self-monitoring NR flow that reboots the router if certain criteria are met, which include ping responses.
It's worked great, but I decided to update the flow (always a bad idea when something already works!)
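
The actual criteria aren't spelled out here, so the following is only a sketch of one possible rule: ask for a reboot after a run of consecutive failed pings. The name `createWatchdog` and the threshold of 3 are assumptions; in a real function node the counter would more likely live in node context than in a closure.

```javascript
// Hypothetical watchdog: decide when enough consecutive pings have failed
// to justify rebooting the router. The threshold is an assumed value.
function createWatchdog(maxFailures = 5) {
    let failures = 0;
    return function check(pingOk) {
        failures = pingOk ? 0 : failures + 1; // any success resets the run
        if (failures >= maxFailures) {
            failures = 0;   // start counting afresh after asking for a reboot
            return true;    // caller should trigger the router reboot
        }
        return false;
    };
}

// Made-up sequence: two failures, a success, then three failures in a row.
const shouldReboot = createWatchdog(3);
const decisions = [false, false, true, false, false, false].map((ok) => shouldReboot(ok));
console.log(decisions);
```

Resetting the counter after a reboot request avoids asking for a second reboot while the router is still coming back up, although a real flow would probably also want a cool-down timer.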

I now use the ping library in a function node to carry out the pings, which is much more configurable than the ping node, and I'm getting very consistent and stable results.

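// ping library added in function setup tab
// https://www.npmjs.com/package/ping
const hosts = [msg.payload];
hosts.forEach(function (host) {
    ping.promise.probe(host, {
        timeout: 10,          // seconds to wait for a reply
        min_reply: 3,         // number of echo requests to send
        extra: ['-i', '2'],   // 2 seconds between requests
    }).then(function (res) {
        msg.payload = res.min; // fastest round-trip time of the replies
        node.send(msg);
    }).catch(function (err) {
        node.error(err, msg);  // report a failed probe instead of dropping it silently
    });
});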

Jullian,

I'm trying to understand your bash script and have one question. How are the XML details passed to NR? Is the script passing the actual data, or are you using your knowledge of where the XML file is to have a node go and get the data?

nmap saves to /tmp/nmap.xml.
In the flow, the http endpoint is triggered by the curl call, and the flow then reads the XML file using a file-in node.

I.e. the data is not passed.
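
Once the file contents are in hand, pulling out the live hosts could look something like the sketch below. This is not the flow described above (which uses a file-in node); it is a dependency-free illustration, and the function name `liveHosts` and the sample XML, modelled on nmap's output format, are assumptions.

```javascript
// Hypothetical parser: pull the IPv4 address of each host that nmap
// reported as "up" out of the XML text. A regex keeps the sketch
// dependency-free; a proper XML parser is the safer choice for real use.
function liveHosts(xml) {
    const hosts = [];
    // Match <host> ... </host> blocks, but not <hostnames> or <hosthint>.
    const hostBlocks = xml.match(/<host[\s>][\s\S]*?<\/host>/g) || [];
    for (const block of hostBlocks) {
        const up = /<status\s[^>]*state="up"/.test(block);
        const ip = block.match(/<address\s[^>]*addr="([^"]+)"[^>]*addrtype="ipv4"/);
        if (up && ip) hosts.push(ip[1]);
    }
    return hosts;
}

// Made-up sample in the general shape of nmap's XML output.
const sample = `
<nmaprun>
  <host><status state="up" reason="arp-response"/>
    <address addr="192.168.1.1" addrtype="ipv4"/>
    <address addr="AA:BB:CC:DD:EE:FF" addrtype="mac"/>
  </host>
  <host><status state="down" reason="no-response"/>
    <address addr="192.168.1.2" addrtype="ipv4"/>
  </host>
</nmaprun>`;
const found = liveHosts(sample);
console.log(found);
```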

That's what I thought, it was just Jullian's comment ("passing the XML details back via CURL to an http-in node") that had me confused and I wanted to know if there was some 'magic' I was missing :grin:

Thanks for assuring me that my brain is still working.


Yes, apologies, in this case I just let Node-RED pick up the file. Though of course I could pass it back directly if I could be bothered to work out the curl command. But nmap creates the file, so I might just as well let Node-RED do the work.

There is a separate flow that uses a uibuilder page to allow display and update of the table so that I can add descriptions to the data.

