While you can't control them, it is very important to know them. Variable latency is a clear sign of other issues.
For myself, I use nmap for scanning the local network. Using the following Bash script, nmap writes its results to /tmp/nmap.xml and the script then calls back to Node-RED at the end, using curl to hit an http-in node so that the flow can pick up the XML details. This is very effective and reliable and builds up a full picture of what connects to the network over time. It keeps me informed of current IP address assignments and when things were last seen.
#! /usr/bin/env bash
# Fast scan the local network for live devices and record
# to /tmp/nmap.xml which can be used in Node-RED
#
# To run manually:
# sudo /home/home/nrmain/system/nmap_scan.sh
#
# To run via cron:
# sudo crontab -e
# 01,16,31,46 * * * * /home/home/nrmain/system/nmap_scan.sh
# Run the scan
nmap -sn -oX /tmp/nmap.xml --privileged -R --system-dns --webxml 192.168.1.0/24
# Make sure ownership & ACLs on the output are secure
chown root:home /tmp/nmap.xml
chmod --silent 640 /tmp/nmap.xml
# Trigger the Node-RED update
#curl --silent --output /dev/null 'http://localhost:1880/localnetscan' > /dev/null
curl --insecure -I 'https://localhost:1880/localnetscan'
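If you want a quick sanity check of the scan output from the command line before wiring anything into Node-RED, something like the following might do. It is only a rough sketch: `list_ipv4_hosts` is a made-up helper name, not part of my script, and it assumes nmap writes each address as `<address addr="..." addrtype="ipv4"/>` in the XML. A proper XML parser (or the Node-RED xml node) is the better choice for anything beyond a quick look.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original scan script): print the IPv4
# address of every host recorded in an nmap XML report, one per line.
list_ipv4_hosts() {
  local xml_file="${1:-/tmp/nmap.xml}"
  # Assumes address elements look like: <address addr="..." addrtype="ipv4"/>
  grep -o '<address addr="[^"]*" addrtype="ipv4"' "$xml_file" \
    | sed 's/.*addr="\([^"]*\)".*/\1/' \
    | sort -u
}

# Only run against the real report when it exists and is readable
if [ -r /tmp/nmap.xml ]; then
  list_ipv4_hosts /tmp/nmap.xml
fi
```

Because the script above sets the file to 640 and root:home, you need to be root or in the home group to read it.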
However, for external checks, I use Telegraf. This has ping and DNS checks built in. I do pings against common endpoints such as Google and YouTube along with some of the intermediate points on my ISP's network so that I can see whether a problem is due to the endpoint (yes, YouTube, I see your regular issues!) or my ISP. I also ping a couple of local addresses including the router and NAS, but that is simply so I get the comparison on my Grafana dashboard. If I see all the local pings go up, I know it is almost certainly a problem with my server.
I check all of the common public DNS servers so that I can see when an apparent performance issue is actually due to DNS rather than the network. This is more common than you might think. Also, the best DNS service is not always obvious. For me, for example, Google DNS has caused major issues in the past with regular outages.
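For a one-off comparison outside of Telegraf, a rough sketch along these lines could be used. It assumes `dig` is installed, and both function names (`query_time_ms`, `compare_resolvers`) are invented for illustration. A single lookup is heavily influenced by caching, so treat the numbers as a spot check only and not as a substitute for the continuous monitoring.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a one-off resolver comparison.

# Extract the number from dig's ";; Query time: N msec" line (read from stdin).
query_time_ms() {
  sed -n 's/^;; Query time: \([0-9][0-9]*\) msec.*$/\1/p'
}

# Look up one name against each resolver given as an argument and print
# "<resolver> <milliseconds>" per line, or "<resolver> failed" if dig
# reported no query time (for example on a timeout).
compare_resolvers() {
  local name="$1"; shift
  local server ms
  for server in "$@"; do
    ms=$(dig "@${server}" "$name" +tries=1 +time=2 2>/dev/null | query_time_ms)
    echo "${server} ${ms:-failed}"
  done
}
```

It could be called as `compare_resolvers example.com 192.168.1.1 8.8.8.8 1.1.1.1 9.9.9.9`, where example.com is just a placeholder name.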
Incidentally, I also output the Telegraf inputs to MQTT, not just InfluxDB.
Here are the relevant inputs:
# # Query given DNS server and gives statistics
[[inputs.dns_query]]
# ## servers to query
servers = ["192.168.1.1", "8.8.8.8", "8.8.4.4", "1.1.1.1", "1.0.0.1", "208.67.222.222", "208.67.220.220", "9.9.9.9", "94.140.14.14", "94.140.15.15"]
# # Returns ethtool statistics for given interfaces
[[inputs.ethtool]]
# ## List of interfaces to pull metrics for
interface_include = ["enp0s25","wlp3s0"]
[[inputs.net]]
# ## By default, telegraf gathers stats from any up interface (excluding loopback)
# ## Setting interfaces will tell it to gather these explicit interfaces,
# ## regardless of status.
# ##
interfaces = ["enp0s25","wlp3s0"]
# # Read TCP metrics such as established, time wait and sockets counts.
[[inputs.netstat]]
# # Collect kernel snmp counters and network interface statistics
[[inputs.nstat]]
# # Ping given url(s) and return statistics
[[inputs.ping]]
interval = "60s"
# ## Hosts to send ping packets to.
urls = ["github.com","bbc.co.uk","amazon.co.uk","youtube.com","it.knightnet.org.uk","10.0.244.237", "216.66.82.69", "router.????????","pi2.????????","pi3.????????","192.168.1.???"]
# ## Number of ping packets to send per interval. Corresponds to the "-c"
# ## option of the ping command.
count = 4
#
# ## Time to wait between sending ping packets in seconds. Operates like the
# ## "-i" option of the ping command.
ping_interval = 1.0
#
# ## If set, the time to wait for a ping response in seconds. Operates like
# ## the "-W" option of the ping command.
timeout = 2.0
# ## Interface or source address to send ping from. Operates like the -I or -S
# ## option of the ping command.
interface = "enp0s25"
#