How to find out what causes Node-RED to crash

I'm running Node-RED on a RaspberryPi to manage a Home Energy Management System.
It crashes infrequently (approx once per 10 days) but always at night between around 03:30.
The rest of the time the system works without any issues.

The RaspberryPi syslog give the following information around the time NR crashes:

Mar 11 03:29:46 raspberrypi Node-RED[437]: 11 Mar 03:29:46 - [info] [mqtt-broker:Local server] Connected to broker: mqtt://192.168.1.11:1883 
Mar 11 03:29:44 raspberrypi avahi-daemon[421]: No service file found in /etc/avahi/services. 
Mar 11 03:29:44 raspberrypi avahi-daemon[421]: Got SIGHUP, reloading. 
Mar 11 03:29:44 raspberrypi dhcpcd[808]: eth0: adding default route via 192.168.0.1 
Mar 11 03:29:44 raspberrypi avahi-daemon[421]: Registering new address record for 192.168.1.11 on eth0.IPv4. 
Mar 11 03:29:44 raspberrypi dhcpcd[808]: eth0: adding route to 192.168.0.0/24 
Mar 11 03:29:44 raspberrypi avahi-daemon[421]: New relevant interface eth0.IPv4 for mDNS. 
Mar 11 03:29:44 raspberrypi avahi-daemon[421]: Joining mDNS multicast group on interface eth0.IPv4 with address 192.168.1.11. 
Mar 11 03:29:44 raspberrypi dhcpcd[808]: eth0: leased 192.168.1.11 for 86400 seconds 
Mar 11 03:29:38 raspberrypi dhcpcd[808]: eth0: probing address 192.168.1.11/24 
Mar 11 03:29:35 raspberrypi dhcpcd[808]: eth0: rebinding lease of 192.168.1.11 
Mar 11 03:29:34 raspberrypi kernel: [...] bcmgenet fd580000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx 
Mar 11 03:29:34 raspberrypi dhcpcd[808]: eth0: carrier acquired 
Mar 11 03:29:30 raspberrypi Node-RED[437]: } 
Mar 11 03:29:30 raspberrypi Node-RED[437]: hostname: 'eu-central-1-1.aws.cloud2.influxdata.com' 
Mar 11 03:29:30 raspberrypi Node-RED[437]: syscall: 'getaddrinfo', 
Mar 11 03:29:30 raspberrypi Node-RED[437]: code: 'EAI_AGAIN', 
Mar 11 03:29:30 raspberrypi Node-RED[437]: errno: -3001, 
Mar 11 03:29:30 raspberrypi Node-RED[437]: at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:109:26) { 
Mar 11 03:29:30 raspberrypi Node-RED[437]: WARN: Write to InfluxDB failed (attempt: 5). Error: getaddrinfo EAI_AGAIN eu-central-1-1.aws.cloud2.influxdata.com 
Mar 11 03:29:01 raspberrypi Node-RED[437]: 11 Mar 03:29:01 - [info] [mqtt-broker:Local server] Disconnected from broker: mqtt://192.168.1.11:1883 
Mar 11 03:28:40 raspberrypi Node-RED[437]: } 
Mar 11 03:28:40 raspberrypi Node-RED[437]: hostname: 'eu-central-1-1.aws.cloud2.influxdata.com' 
Mar 11 03:28:40 raspberrypi Node-RED[437]: syscall: 'getaddrinfo', 
Mar 11 03:28:40 raspberrypi Node-RED[437]: code: 'EAI_AGAIN', 
Mar 11 03:28:40 raspberrypi Node-RED[437]: errno: -3001, 
Mar 11 03:28:40 raspberrypi Node-RED[437]: at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:109:26) { 
Mar 11 03:28:40 raspberrypi Node-RED[437]: WARN: Write to InfluxDB failed (attempt: 4). Error: getaddrinfo EAI_AGAIN eu-central-1-1.aws.cloud2.influxdata.com 
Mar 11 03:28:33 raspberrypi kernel: [...] bcmgenet fd580000.ethernet eth0: Link is Down 
Mar 11 03:28:33 raspberrypi dhcpcd[808]: eth0: carrier lost 
Mar 11 03:28:30 raspberrypi dhcpcd[808]: eth0: rebinding lease of 192.168.1.11 
Mar 11 03:28:30 raspberrypi kernel: [...] bcmgenet fd580000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off 
Mar 11 03:28:30 raspberrypi dhcpcd[808]: eth0: carrier acquired 
Mar 11 03:28:12 raspberrypi Node-RED[437]: } 
Mar 11 03:28:12 raspberrypi Node-RED[437]: hostname: 'eu-central-1-1.aws.cloud2.influxdata.com' 
Mar 11 03:28:12 raspberrypi Node-RED[437]: syscall: 'getaddrinfo', 
Mar 11 03:28:12 raspberrypi Node-RED[437]: code: 'EAI_AGAIN', 
Mar 11 03:28:12 raspberrypi Node-RED[437]: errno: -3001, 
Mar 11 03:28:12 raspberrypi Node-RED[437]: at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:109:26) { 
Mar 11 03:28:12 raspberrypi Node-RED[437]: WARN: Write to InfluxDB failed (attempt: 3). Error: getaddrinfo EAI_AGAIN eu-central-1-1.aws.cloud2.influxdata.com 
Mar 11 03:28:09 raspberrypi Node-RED[437]: 11 Mar 03:28:09 - [warn] [modbus-read:Meter2] Modbus Failure On State sending Get More About It By Logging 
Mar 11 03:28:09 raspberrypi Node-RED[437]: 11 Mar 03:28:09 - [warn] [modbus-read:Meter Event] Modbus Failure On State sending Get More About It By Logging 
Mar 11 03:28:09 raspberrypi Node-RED[437]: 11 Mar 03:28:09 - [warn] [modbus-read:Meter] Modbus Failure On State sending Get More About It By Logging 
Mar 11 03:28:09 raspberrypi Node-RED[437]: 11 Mar 03:28:09 - [warn] [modbus-read:Solar] Modbus Failure On State sending Get More About It By Logging 
Mar 11 03:28:00 raspberrypi Node-RED[437]: } 
Mar 11 03:28:00 raspberrypi Node-RED[437]: hostname: 'eu-central-1-1.aws.cloud2.influxdata.com' 
Mar 11 03:28:00 raspberrypi Node-RED[437]: syscall: 'getaddrinfo', 
Mar 11 03:28:00 raspberrypi Node-RED[437]: code: 'EAI_AGAIN', 
Mar 11 03:28:00 raspberrypi Node-RED[437]: errno: -3001, 
Mar 11 03:28:00 raspberrypi Node-RED[437]: at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:109:26) { 
Mar 11 03:28:00 raspberrypi Node-RED[437]: WARN: Write to InfluxDB failed (attempt: 2). Error: getaddrinfo EAI_AGAIN eu-central-1-1.aws.cloud2.influxdata.com 
Mar 11 03:27:57 raspberrypi avahi-daemon[421]: No service file found in /etc/avahi/services. 
Mar 11 03:27:57 raspberrypi avahi-daemon[421]: Got SIGHUP, reloading. 
Mar 11 03:27:57 raspberrypi dhcpcd[808]: eth0: deleting default route via 192.168.0.1 
Mar 11 03:27:57 raspberrypi avahi-daemon[421]: Interface eth0.IPv4 no longer relevant for mDNS. 
Mar 11 03:27:57 raspberrypi avahi-daemon[421]: Leaving mDNS multicast group on interface eth0.IPv4 with address 192.168.1.11. 
Mar 11 03:27:57 raspberrypi dhcpcd[808]: eth0: deleting route to 192.168.0.0/24 
Mar 11 03:27:57 raspberrypi avahi-daemon[421]: Withdrawing address record for 192.168.1.11 on eth0. 
Mar 11 03:27:57 raspberrypi kernel: [...] bcmgenet fd580000.ethernet eth0: Link is Down 
Mar 11 03:27:57 raspberrypi dhcpcd[808]: eth0: carrier lost 
Mar 11 03:27:54 raspberrypi Node-RED[437]: } 
Mar 11 03:27:54 raspberrypi Node-RED[437]: hostname: 'eu-central-1-1.aws.cloud2.influxdata.com' 
Mar 11 03:27:54 raspberrypi Node-RED[437]: syscall: 'getaddrinfo', 
Mar 11 03:27:54 raspberrypi Node-RED[437]: code: 'EAI_AGAIN', Mar 11 03:27:54 raspberrypi Node-RED[437]: errno: -3001, 
Mar 11 03:27:54 raspberrypi Node-RED[437]: at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:109:26) { 
Mar 11 03:27:54 raspberrypi Node-RED[437]: WARN: Write to InfluxDB failed (attempt: 1). Error: getaddrinfo EAI_AGAIN eu-central-1-1.aws.cloud2.influxdata.com

I pull data from Inverter via modbus every 10s and push these to InfluxDB and I suspect the issue is on somehow due to a short drop of the connection to Internet but that should not cause a problem for the modbus reading that is done locally.
Any ideas on what the cause can be and how to dig deeper?

The log also mentions "Modbus Failure On State sending Get More About It By Logging".
Does anyone know how to set up that up?

Your log appears to be backwards, which is a bit odd, but I presume that is something to do with how you are presenting it.

I don't see any evidence that node red is crashing there. However, as you can see, your network connection is failing, which is giving the errors in node-red.

1 Like

Yes, the backward presentation of the log is due to pulling it remotely through Nord-RED by a "tail" command. I´ll remember to reverse it next time.

Is there any other log that could give more detail on the reason for NR crashing?

Does your router automatically reboot every night? This is common with mobile broadband routers.

Does Node-red restart after it crashes?

On one of my Pies Node-red logs to /var/log/daemon.log, on another to /var/log/Node-RED.log. No idea where it gets the file name from.
You might find more information in one or other of these files.

You can increase Node-red's logging level by editing settings.js.
Changing it from "info" to "debug" substantially increases the log output.

1 Like

It looks like you are having some network issues

Mar 11 03:27:54 raspberrypi Node-RED[437]: WARN: Write to InfluxDB failed (attempt: 1). Error: getaddrinfo EAI_AGAIN eu-central-1-1.aws.cloud2.influxdata.com
Mar 11 03:27:57 raspberrypi kernel: [...] bcmgenet fd580000.ethernet eth0: Link is Down 
Mar 11 03:27:57 raspberrypi avahi-daemon[421]: Withdrawing address record for 192.168.1.11 on eth0. 
Mar 11 03:27:57 raspberrypi dhcpcd[808]: eth0: deleting route to 192.168.0.0/24 
Mar 11 03:27:57 raspberrypi avahi-daemon[421]: Leaving mDNS multicast group on interface eth0.IPv4 with address 192.168.1.11. 
Mar 11 03:27:57 raspberrypi avahi-daemon[421]: Interface eth0.IPv4 no longer relevant for mDNS. 
Mar 11 03:27:57 raspberrypi dhcpcd[808]: eth0: deleting default route via 192.168.0.1 
Mar 11 03:27:57 raspberrypi avahi-daemon[421]: Got SIGHUP, reloading. 
Mar 11 03:27:57 raspberrypi avahi-daemon[421]: No service file found in /etc/avahi/services. 
Mar 11 03:28:00 raspberrypi Node-RED[437]: WARN: Write to InfluxDB failed (attempt: 2). Error: getaddrinfo EAI_AGAIN eu-central-1-1.aws.cloud2.influxdata.com 
Mar 11 03:28:00 raspberrypi Node-RED[437]: at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:109:26) { 
Mar 11 03:28:00 raspberrypi Node-RED[437]: errno: -3001, 
Mar 11 03:28:00 raspberrypi Node-RED[437]: code: 'EAI_AGAIN', 
Mar 11 03:28:00 raspberrypi Node-RED[437]: syscall: 'getaddrinfo', 
Mar 11 03:28:00 raspberrypi Node-RED[437]: hostname: 'eu-central-1-1.aws.cloud2.influxdata.com' 
Mar 11 03:28:00 raspberrypi Node-RED[437]: } 

No automatic reboot on the router, the problem happens with 8-11 days interval.
Node-Red automatically restarts and send a "Hello from Local node-RED" though MQTT server.

I'll have look at the deamon.log but I don't have a Node-RED.log, I believe it uses syslog instead.
Next time I'm on site I'll modify the settings.js file for more logging data.

Thanks for your help

Agree, and I'd like to find a way for NR not to crash if the Network drops.
I can´t recreate the issue by disconnecting the network so there is something more.

I haven't seen any evidence yet that it is crashing. If it automatically restarts then search syslog for the welcome to node red message, then you should see the reason that it restarts. If it is not restarting then this should show you the last 50 messages from node-red
journalctl -u nodered -n 50

Most likely this is a LWT message (or retained message, or "on connect" message) - it does not indicate Node-RED crashed.

What is more likely is a UDP/TCP bound socket (in the influx nodes) is hung because of your network outage.

I would look at the readmes and issues for that particular node - see if there is a means to reinitialise its comms.

1 Like

I agree, this might be an "on connect" message in MQTT rather than indicative of Node-red restarting.
One way to confirm that Node-red has actually restarted is to include it's process id in the "Hello from Node-red" message.

You can obtain it via an exec node to run

ps -ef | grep node-red | grep -v grep | awk '{print $2}'

It might be worth including the result of uptime -p too

1 Like

Thanks I'll review the InfluxDB node readme's and also do some test by deactivating it.

You right in that I'm confusing a Node-RED crash/restart with a Disconnected/Reconnected MQTT broker. The uptime command returned "up 4 weeks, 2 days, 4 hours, 39 minutes".

Running the ps command on an exec node I get the following response:

437
23904
23905
23907
23908

Not sure what the numbers mean :slight_smile:

So now I need to figure out why the Mosquito broker disconnects as all MQTT messaging is done locally.

Are you specifying an IP address in the mqtt nodes? If so, try localhost.

1 Like

So the Pi itself has not rebooted.

I don't know the significance of multiple occurrences of "node-red" in your ps -ef output. I only see 1 on my Pi

I've set an IP for the local server in the mqtt nodes as the Raspberry has a fixed IP
I'll test with localhost instead to see what happens.

I rebooted the Raspberry Pi and now only have one occurrence from the ps -ef output.

I think it doesn't have an IP address at all if the network is down.

I have just done some tests to confirm, and using the IP address fails if the network is down (which it is as can be seen from the log you posted). localhost should keep working.

2 Likes

Thanks @Colin for taking time to test this.
I've switched to localhost and learnt a lot for these exchanges!

1 Like

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.