Exec node now throwing errors at me

(Look at bottom)

Sorry, I thought tonight's problem was the ping node, but it isn't.

Now it is an exec node.

This evening - just a short while ago - I got this:

This is the best I can capture.

21:34:50

{"message":"Error: spawn ENOMEM","source":{"id":"3ca1bd56.5515aa","type":"ping","name":"Ping","count":1},"stack":"Error: spawn ENOMEM\n    at ChildProcess.spawn (internal/child_process.js:408:11)\n    at spawn (child_process.js:553:9)\n    at doPing (/home/pi/.node-red/node_modules/node-red-node-ping/88-ping.js:40:14)\n    at PingNode._inputCallback (/home/pi/.node-red/node_modules/node-red-node-ping/88-ping.js:120:36)\n    at /usr/lib/node_modules/node-red/node_modules/@node-red/runtime/lib/nodes/Node.js:203:26\n    at Object.trigger (/usr/lib/node_modules/node-red/node_modules/@node-red/runtime/lib/hooks.js:113:9)\n    at PingNode.Node._emitInput (/usr/lib/node_modules/node-red/node_modules/@node-red/runtime/lib/nodes/Node.js:195:11)\n    at PingNode.Node.emit (/usr/lib/node_modules/node-red/node_modules/@node-red/runtime/lib/nodes/Node.js:179:25)\n    at PingNode.Node.receive (/usr/lib/node_modules/node-red/node_modules/@node-red/runtime/lib/nodes/Node.js:476:10)\n    at Immediate._...

syslog for about that time:

Feb  6 21:32:39 TimePi Node-RED[4875]: 6 Feb 21:32:39 - [info] [ping:Ping] ping - Host '192.168.0.83' process timeout - sending SIGINT
Feb  6 21:32:39 TimePi Node-RED[4875]: 6 Feb 21:32:39 - [info] [ping:Ping] ping - Host '192.168.0.86' process timeout - sending SIGINT
Feb  6 21:32:39 TimePi Node-RED[4875]: 6 Feb 21:32:39 - [info] [ping:Ping] ping - Host '192.168.0.89' process timeout - sending SIGINT
Feb  6 21:32:39 TimePi Node-RED[4875]: 6 Feb 21:32:39 - [info] [ping:Ping] ping - Host '192.168.0.91' process timeout - sending SIGINT
Feb  6 21:32:39 TimePi Node-RED[4875]: 6 Feb 21:32:39 - [info] [ping:Ping] ping - Host '192.168.0.92' process timeout - sending SIGINT
Feb  6 21:34:46 TimePi Node-RED[4875]: 6 Feb 21:34:46 - [info] [ping:Ping] ping - Host '192.168.0.1' process timeout - sending SIGINT
Feb  6 21:35:37 TimePi Node-RED[4875]: 6 Feb 21:35:37 - [warn] [function:Er indicator] 0
Feb  6 21:39:17 TimePi kernel: [648611.732577] rtc rtc0: __rtc_set_alarm: err=-22
Feb  6 21:44:49 TimePi Node-RED[4875]: 6 Feb 21:44:48 - [info] [ping:Ping] ping - Host '192.168.0.1' process timeout - sending SIGINT
Feb  6 21:44:49 TimePi Node-RED[4875]: 6 Feb 21:44:48 - [info] [ping:Ping] ping - Host '192.168.0.21' process timeout - sending SIGINT
Feb  6 21:44:49 TimePi Node-RED[4875]: 6 Feb 21:44:48 - [info] [ping:Ping] ping - Host '192.168.0.34' process timeout - sending SIGINT
Feb  6 21:44:49 TimePi Node-RED[4875]: 6 Feb 21:44:48 - [info] [ping:Ping] ping - Host '192.168.0.82' process timeout - sending SIGINT
Feb  6 21:44:49 TimePi Node-RED[4875]: 6 Feb 21:44:48 - [info] [ping:Ping] ping - Host '192.168.0.83' process timeout - sending SIGINT
Feb  6 21:44:49 TimePi Node-RED[4875]: 6 Feb 21:44:48 - [info] [ping:Ping] ping - Host '192.168.0.86' process timeout - sending SIGINT
Feb  6 21:44:49 TimePi Node-RED[4875]: 6 Feb 21:44:48 - [info] [ping:Ping] ping - Host '192.168.0.89' process timeout - sending SIGINT
Feb  6 21:44:49 TimePi Node-RED[4875]: 6 Feb 21:44:48 - [info] [ping:Ping] ping - Host '192.168.0.91' process timeout - sending SIGINT
Feb  6 21:44:49 TimePi Node-RED[4875]: 6 Feb 21:44:48 - [info] [ping:Ping] ping - Host '192.168.0.92' process timeout - sending SIGINT
Feb  6 21:46:47 TimePi Node-RED[4875]: 6 Feb 21:46:47 - [info] [ping:Ping] ping - Host '192.168.0.1' process timeout - sending SIGINT
Feb  6 21:46:47 TimePi Node-RED[4875]: 6 Feb 21:46:47 - [info] [ping:Ping] ping - Host '192.168.0.21' process timeout - sending SIGINT
Feb  6 21:46:47 TimePi Node-RED[4875]: 6 Feb 21:46:47 - [info] [ping:Ping] ping - Host '192.168.0.34' process timeout - sending SIGINT
Feb  6 21:46:47 TimePi Node-RED[4875]: 6 Feb 21:46:47 - [info] [ping:Ping] ping - Host '192.168.0.82' process timeout - sending SIGINT
Feb  6 21:46:47 TimePi Node-RED[4875]: 6 Feb 21:46:47 - [info] [ping:Ping] ping - Host '192.168.0.83' process timeout - sending SIGINT
Feb  6 21:46:47 TimePi Node-RED[4875]: 6 Feb 21:46:47 - [info] [ping:Ping] ping - Host '192.168.0.86' process timeout - sending SIGINT
Feb  6 21:46:47 TimePi Node-RED[4875]: 6 Feb 21:46:47 - [info] [ping:Ping] ping - Host '192.168.0.89' process timeout - sending SIGINT
Feb  6 21:46:47 TimePi Node-RED[4875]: 6 Feb 21:46:47 - [info] [ping:Ping] ping - Host '192.168.0.91' process timeout - sending SIGINT
Feb  6 21:46:47 TimePi Node-RED[4875]: 6 Feb 21:46:47 - [info] [ping:Ping] ping - Host '192.168.0.92' process timeout - sending SIGINT
Feb  6 21:46:47 TimePi Node-RED[4875]: 6 Feb 21:46:47 - [info] [ping:Ping] ping - Host '192.168.0.93' process timeout - sending SIGINT
Feb  6 21:48:31 TimePi systemd[1]: Started Session c8 of user pi.
Feb  6 21:48:35 TimePi systemd[1]: fake-hwclock.service: Cannot add dependency job, ignoring: Unit fake-hwclock.service is masked.
Feb  6 21:48:35 TimePi systemd[1]: Stopping LSB: Start NTP daemon...

The pings are done every 20 seconds.

But this line kind of has my attention:

Feb 6 21:35:37 TimePi Node-RED[4875]: 6 Feb 21:35:37 - [warn] [function:Er indicator] 0

Alas, soon thereafter another one happened.

This is the captured error message - I hope.

{"message":"Error: spawn ENOMEM","source":{"id":"596dfbde.39bc6c","type":"exec","name":"HWC time","count":1},"stack":"Error: spawn ENOMEM\n    at ChildProcess.spawn (internal/child_process.js:408:11)\n    at spawn (child_process.js:553:9)\n    at Object.execFile (child_process.js:237:17)\n    at exec (child_process.js:158:25)\n    at ExecNode._inputCallback (/usr/lib/node_modules/node-red/node_modules/@node-red/nodes/core/function/90-exec.js:134:29)\n    at /usr/lib/node_modules/node-red/node_modules/@node-red/runtime/lib/nodes/Node.js:203:26\n    at Object.trigger (/usr/lib/node_modules/node-red/node_modules/@node-red/runtime/lib/hooks.js:113:9)\n    at ExecNode.Node._emitInput (/usr/lib/node_modules/node-red/node_modules/@node-red/runtime/lib/nodes/Node.js:195:11)\n    at ExecNode.Node.emit (/usr/lib/node_modules/node-red/node_modules/@node-red/runtime/lib/nodes/Node.js:179:25)\n    at ExecNode.Node.receive (/usr/lib/node_modules/node-red/node_modules/@node-red/runtime/lib/nodes/Node.js:476:10)\n    at Immediate._onImmediate (/usr/lib/node_modules/node-red/node_modules/@node-red/runtime/lib/flows/Flow.js:657:5..."}

This one happened at 22:11 local.

From the syslog:

Feb  6 21:59:29 TimePi kernel: [649823.768575] rtc rtc0: __rtc_set_alarm: err=-22
Feb  6 22:00:54 TimePi ntpd[24732]: 129.250.35.251 local addr 192.168.0.99 -> <null>
Feb  6 22:03:01 TimePi ntpd[24732]: 103.214.220.220 local addr 192.168.0.99 -> <null>
Feb  6 22:04:12 TimePi ntpd[24732]: 27.124.125.250 local addr 192.168.0.99 -> <null>
Feb  6 22:11:14 TimePi Node-RED[4875]: 6 Feb 22:11:14 - [warn] [function:Er indicator] 0
Feb  6 22:17:01 TimePi CRON[30712]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Feb  6 22:29:01 TimePi Node-RED[4875]: 6 Feb 22:29:01 - [info] [ping:Ping] ping - Host '192.168.0.1' process timeout - sending SIGINT
Feb  6 22:29:01 TimePi Node-RED[4875]: 6 Feb 22:29:01 - [info] [ping:Ping] ping - Host '192.168.0.21' process timeout - sending SIGINT
Feb  6 22:29:01 TimePi Node-RED[4875]: 6 Feb 22:29:01 - [info] [ping:Ping] ping - Host '192.168.0.34' process timeout - sending SIGINT
Feb  6 22:29:01 TimePi Node-RED[4875]: 6 Feb 22:29:01 - [info] [ping:Ping] ping - Host '192.168.0.82' process timeout - sending SIGINT
Feb  6 22:29:01 TimePi Node-RED[4875]: 6 Feb 22:29:01 - [info] [ping:Ping] ping - Host '192.168.0.83' process timeout - sending SIGINT

And again, close to that time, there is this line:

Feb 6 22:11:14 TimePi Node-RED[4875]: 6 Feb 22:11:14 - [warn] [function:Er indicator] 0

This is the node:

[{"id":"596dfbde.39bc6c","type":"exec","z":"e2bd5a4e.5597e8","command":"sudo hwclock -r","addpay":false,"append":"","useSpawn":"false","timer":"","oldrc":false,"name":"HWC time","x":660,"y":2450,"wires":[["f90ccbc3.25a8b","8601b654.799c6"],[],[]]}]

UPDATE:

This is beyond annoying now.

OK, an extract from the first posted error message:
"message":"Error: spawn ENOMEM","source":{"id":"3ca1bd56.5515aa","type

That is a PING node.

But somewhere in it I also see reference (or think I do) to an exec node.
So I took the post off my thread about the ping node and started this new one.

That is because, soon after, I got the second one, and it is clearly the HW clock (exec) node.
{"message":"Error: spawn ENOMEM","source":{"id":"596dfbde.39bc6c","t

That is the node I posted.

I can understand it is annoying/frustrating to whoever is reading this post of mine asking for help, but can you please understand my situation?

For a few days it has been behaving itself.
Then it spits the dummy on me.

I was away for a couple of weeks and errors happened.
(See other post here)

I don't understand the messages. Yeah, OK, maybe I should sit down and study them.

But what I am pasting seems to be truncated, so it is frustrating to think "this is the entire story" when it isn't.

Would you mind having a look at the (error) message and telling me if there is something in it that explains what is going on?

These errors seem to come in waves....
My theory as of now: a possible lack of memory, because something is eating up the available memory.

But I can't be sure.
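One way to check that theory next time it happens would be to look at the free memory and count any lingering ping processes on the Pi. A sketch (standard procps/procps-ng tools assumed):

```shell
# Show how much memory is actually available (the "available" column of free)
avail=$(free -m | awk '/^Mem:/ {print $7}')
echo "Available memory: ${avail} MB"

# Count lingering ping child processes (pgrep exits non-zero when none match)
pings=$(pgrep -c -x ping || true)
echo "Lingering ping processes: ${pings:-0}"
```

If the available figure is very low, or the ping count keeps climbing between runs, that would fit the ENOMEM errors above.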

From looking at the errors, it would appear that the issue IS with the ping node; that seems to be the one generating the error. Any reference to exec is likely to come from within the ping node, as it possibly does an exec/spawn internally.

It would be much easier if you showed the Node-RED log output though; the syslog output is not terribly helpful.

I'd like to go back to basics though. Why are you doing a continuous ping? What benefit are you trying to get out of it? I really can't think of many reasons you would want to keep doing pings every 20sec permanently and even fewer to do them from within Node-RED itself.

If you want a continuous monitor of some network connections, there are likely to be more efficient, practical and robust methods. For myself, I use Telegraf, for example, which monitors all sorts of things and can output to InfluxDB and MQTT (and others as well, but this is what I use). That gives me a nice view over time of any networking issues. But the pings from Telegraf are set to 1 ping, once a minute (I think, I can't quite remember), as this is sufficient for general monitoring. Over-pinging might even result in you being treated as a threat and either blocked or restricted.

The pings are done every 20 seconds to see if the other machine is still there/alive.

Yes, there is the SNMP path, but this was set up back in the early days, before I knew of that.

20 seconds is nominal. But 5 minutes seems too long to detect if a machine has gone down.

I can only show the syslog because that is all I was shown how to get.

The errors are on remote, headless machines, and I get a visual indication when there is an error.

Alas that means powering up the big machine, loading the browser and looking.
Not too much of a worry most of the time, but at the time it happened it was a bit annoying.
Especially when I am trying to chill out from the day's time in front of the screen.

(Sorry - I'm ranting)

Nowadays I am 2 weeks here, 2 weeks away. It is very frustrating to get back from 2 weeks away and find errors all over the place.

Yet when I was at home, the machines would play nicely with each other. Only when I was away would these things happen.

Can you show/explain to me how I get the node-red log?

/var/log/ doesn't have a Node-RED log, just a nodered-install.log.

Thanks.

Have you got an MQTT broker on your central machine? If so make sure each pi has at least one node subscribed or publishing to that broker and use the LWT state to know whether each pi is still alive. Much easier than all that messing about with ping.

(Believe it or not, I do)

I have LWT detection (and "BS" - opposite of LWT)

Yes, I may need to change/improve how things are done.

But I'm interested/confused with this problem that seems to happen in waves.

Oh, and LWT stuff doesn't tell you if NR has hung or not.

I may have to sit down one day and re-think the entire methodology of how things are identified and do this sort of stuff.

That's in the future. But is a serious thing to be considered.

Have you considered an end-to-end test? Something we do with our PLCs (where communication is expected and MUST be running) is to set up a heartbeat in the ladder program that transmits a heartbeat value that is expected to be echoed back (otherwise we time out & stop production). This tests that the PLC is online (on the network), that the COMMS card is operational, that the datalinks are set correctly, and that the PLC ladder is actually running - at both PLCs - a true end-to-end test.

In node-red land: assuming you needed to monitor comms between Machine 1 and Machine 2 - that could be something like...

  • Machine 1 subscribes to machine2/pong/machine1
  • Machine 2 subscribes to machine1/ping/machine2

  1. Machine 1 - increments a heartbeat value & publishes it to machine1/ping/machine2 (storing the value in context for later comparison)
  2. Machine 1 - begins a timer (delay node)
  3. Machine 2 - sees machine1/ping/machine2 and immediately publishes the same value to machine2/pong/machine1
  4. Machine 1 - gets the heartbeat value from machine2/pong/machine1 and compares it to the heartbeat value sent (from context) - if matching, reset the delay node (success); if not, just wait until machine2/pong/machine1 either sends the correct value or the delay times out - indicating a timeout fault (log it or whatever)

  You would do this for each machine (e.g. from Machine 2 --> Machine 1 as well).

While this won't tell you exactly what the problem is (e.g. if the heartbeat value is not returned, it could be that the MQTT broker is offline, the other machine is dead, or node-red is not running), you will know there is an issue & it will point you in the right direction.

Obviously this pseudo-heartbeat example is off the top of my head and would need a little bit of development to get right (and there are many ways to skin this cat). The point is... an end-to-end test avoids spawning ping and is far more reliable than ping alone (e.g. it tests network connectivity, ensures the MQTT broker is alive, ensures node-red is running at both ends, etc.).
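To make the bookkeeping concrete, here is a minimal sketch of steps 1 and 4 in plain JavaScript (hypothetical names; in Node-RED this would live in Function nodes, with lastSent kept in flow context and the publishes going through MQTT-out nodes):

```javascript
// Minimal sketch of the heartbeat bookkeeping (hypothetical names).
let lastSent = 0;

// Step 1: Machine 1 increments the heartbeat and publishes this value
// to machine1/ping/machine2.
function publishHeartbeat() {
    lastSent = (lastSent + 1) % 1000; // wrap so the number stays small
    return lastSent;
}

// Step 4: compare the value echoed back on machine2/pong/machine1
// with the one we last sent; true means "reset the watchdog timer".
function checkEcho(echoedPayload) {
    return Number(echoedPayload) === lastSent;
}
```

The watchdog itself would be a delay/trigger node that fires a fault message unless checkEcho() resets it in time.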

Hope that is of some use - or perhaps gives you a light bulb moment. :slight_smile:


That sounds nice.

The MQTT SOM and EOM (LWT) are not really doing what I want/need.

For testing the water, I have a machine running.

I unplug the Cat-5 cable. I don't see an EOM message.
(Sorry! Typical - as soon as I said that, I got the EOM.)

I may pursue this approach for now. The monitoring is star, not mesh, so I don't need all machines checking all the others.
Just one checking all.

We should maybe do this in chat or in another thread so as to keep things tidy in what is talked about in this one.

@TotallyInformation

I remember why I use ping:
The router and uplink don't really have MQTT in them.

But I guess I could rationalise it down to only those two needing to be pinged.

@Steve-Mcl
I do exactly this for ALL my x-number of machines and can only recommend this method; it works so well.
In addition, just to mention an advanced path:

  • if a timer kicks in, I use Telegram to inform me
  • all my monitored machines also have timers for the "expected" incoming heartbeats, and if those trigger, they 1) restart the running & related applications or services, and if that doesn't help, a second timer kicks in and makes a 2) reboot
  • at startup, a monitored machine reports its status via Telegram and MQTT, so I can keep full control of all machines & the services running on them

Neither does ping. If you need to check for that then do a regular publish from the pi and check it keeps coming in as expected.

Agreed.

But that is just part of the scheme.

Just now I am in the process of redesigning how it is done.

Alas it is huge. :frowning:
I am still getting the basic layout done.

Actually, checking my Telegraf config, it pings every 60s, sending 4 packets with a timeout of 2 seconds. The Grafana display was set to 5min updates but it doesn't have to be. Telegraf can also output direct to MQTT.

sudo journalctl -u node-red -f

Where node-red is the name of the systemd job that starts node-red on boot. All logs on Linux systems running systemd should be visible using that command, just replacing the name. You can easily add your own logs as well; if you dig out my post on running IMAPFILTER or NMAP, you will see examples - really useful for running CRON batch jobs and managing log output.
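For example, to search for the ENOMEM errors in today's journal (the unit name nodered is typical for the official Pi install script; adjust it to whatever your system uses, which you can check with systemctl list-units):

```shell
# Search today's Node-RED journal entries for the ENOMEM errors.
# The fallback echo keeps the pipeline succeeding when nothing matches.
journalctl -u nodered.service --since today --no-pager 2>/dev/null \
    | grep ENOMEM || echo "no ENOMEM entries today"
```

Adding -f instead of --since follows the log live, which is handy while reproducing the problem.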

That's OK, I get that there is a need. I was just pointing out that there are better ways to fulfil that need than using Node-RED, which might itself be part of the issue. For monitoring, it is better to use tools that are dedicated to that purpose.

On my systems, I use the home server to be the central hub. It runs Telegraf, InfluxDB, Grafana, Node-RED, Mosquitto. Telegraf pings a load of endpoints both inside and outside the network and records that to InfluxDB. So, if I can reach Grafana, I can see what is and isn't working.

But ping only reaches the network card; it is no guarantee that the network is actually working or that the services above the network layer are actually doing anything.

To deal with that, I have every device and service that supports MQTT sending out keepalive pings around every 50 seconds (just under a minute), each with an LWT. The pings send "Online" and the LWT sends "Offline". That includes all of my home-made sensor platforms and node-red. In fact, node-red sends a number of regular pings: one for itself and others for services that don't directly support MQTT, such as my RFXtrx433e.

In addition to the Telegraf pings, I also run a periodic NMAP scan of the network, the results of which are read by Node-RED and incorporated into my network-device and known-devices tables (and the corresponding MQTT topics). NMAP does a lot more than just a ping; in my case, it tracks the responsiveness of each device, which can be useful when trying to track down a network problem. It also lets me find new devices on the network so that I can identify them.

With my router, WiFi AP and NAS, I've also experimented with SNMP monitoring. But as yet, I've not really had enough incentive to do that much. One of the big sticking points with SNMP and Node-RED is that some outputs from SNMP are big numbers in a format that JavaScript doesn't natively support (variable-length integers up to 64 bits). I've not managed to work out how to decode them yet.
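For completeness, one way to decode such values uses BigInt (available in Node.js 10.4+). A sketch, assuming the raw octets arrive as a Node.js Buffer - which depends on the SNMP node/library in use:

```javascript
// Decode an SNMP Counter64 value from its raw octets into a BigInt
// (sketch; assumes the bytes are available as a Node.js Buffer).
function counter64ToBigInt(buf) {
    let value = 0n;
    for (const byte of buf) {
        // Shift in each octet, most significant byte first
        value = (value << 8n) | BigInt(byte);
    }
    return value;
}

// Example: an 8-byte counter; call reading.toString() before passing
// it to nodes that expect a plain string or number.
const reading = counter64ToBigInt(
    Buffer.from([0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08])
);
```

Because the loop just shifts in whatever bytes it is given, it handles the variable-length encodings (1 to 8 octets) as well.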

I also use Telegram for alerting. Though you might want to have two bots, a chatty one and a quiet one. The quiet one tells you just about really important things like the door-bell, or the front door being left open. The chatty one should also send out regular messages so that you will be able to see if things have suddenly gone quiet. So my chatty bot tells me when lights go on or off, if humidity goes outside normal ranges, when movement is detected, ...

Oh, and it is also worth knowing that Grafana has a Telegram alert extension. Very useful.

Grafana Telegram Alert (github.com)