Problem with nodes when internet not working. Discussion

Folks, this is a weird/wild/rogue scenario that just happened to me and was really bad.

It is the unexpected things that happen that really test our setup and check whether we did everything right.

This is one such case where it all fell apart - BADLY!

Back story

I'm not sure if it is laziness or learning/using technology.

I have the google voice node in/on ONE of my machines to give me notifications now and then.

I have PiHole as my DNS/DHCP server.
(Another machine)

I also have a few instances of open-weather to either tell me the weather or get it and determine if it is worth watering the plants.

What happened

The other day I did something stupid and PiHole crashed.
As it is the DHCP and DNS, that meant the aforementioned nodes couldn't talk to daddy.

This - as I was told - was why the CPU load went from 12% to 70% and NR kept resetting.

A lot of messing around later

I started NR in safe mode and disabled the google voice command then DEPLOYED.

All was good again. Kind of.

I have built - yes, actually built - a second network so I could get help to find out what is/was happening and fix things: albeit slowly.

I have gone to all relevant machines and disabled the calling of such nodes.
But I admit I've left the google voice one DISABLED. Just to be sure.

What SAFEGUARDS can be put in place so that if something CATASTROPHIC happens and nodes need (read: DEMAND) an uplink which is no longer there,
they don't throw a tantrum and bring the computer down?
Well, make NR keep restarting every minute or so.

It would be nice if this could somehow be handled automatically.

Thoughts?

Not sure if this could realistically be automated. There are just so many different potential issues and differing user requirements.

For me, if a node brought down a device, perhaps due to excessive CPU, or brought down Node-RED as a whole, FOR ANY REASON - that would be a bug. It must either be fixed or I would stop using the node.

Lesson to learn: If using external services, run tests to see what happens when that service can't be reached.

If I really needed to rely on an external service and the node couldn't cut it, I would hand-craft a function node using an appropriate node.js library. I would include more error checks.
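
To give a flavour of the kind of extra error checking meant here, below is a minimal sketch of a "can I reach the outside world?" test with a hard timeout. The names `withTimeout` and `canResolve`, the 2000 ms default and the injectable `lookup` parameter are all assumptions made for illustration, and how you make the built-in `dns` module available inside a function node depends on your Node-RED settings.

```javascript
const dns = require('dns');

// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
    let timer;
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    });
    // Always clear the timer so it cannot keep the process busy.
    return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Resolve to true if `hostname` can be looked up in time, false otherwise.
// It never rejects, so a dead DNS server cannot turn into an uncaught error.
// `lookup` is injectable (hypothetical parameter) so the check can be tested offline.
async function canResolve(hostname, ms = 2000, lookup = dns.promises.lookup) {
    try {
        await withTimeout(lookup(hostname), ms);
        return true;
    } catch (err) {
        return false;
    }
}
```

The idea would be to call something like this before handing a message to the external service and simply drop (or park) the message when it returns false.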

Do you really need a Google service to provide voice output? I'm sure I remember doing this with a local Linux tool years ago. Personally, I mostly don't want my servers making noises - that would be reserved for the server itself coming to a horrible end. :slight_smile:

Like I say - always test. Then if they do behave that way, it is a BUG that either needs fixing, or if the author can't/won't fix it, it has to go away.

@Trying_to_learn Sound advice from Julian above.

If you only have a problem with the TTS node (and you really want tts) then you have a few options.

  1. Open an issue with the developer on GitHub
  2. Try to mitigate the issue yourself, e.g. perhaps if the internet is down, don't send messages to the node.
  3. Look for an alternative node or other local option.

In short any nodes that require internet access should handle a failure gracefully, without you needing to do anything.
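
As a rough illustration of option 2 above, here is a sketch of the logic a gate in front of the TTS node could use. It assumes (my invention, not something from your flow) that some other part of the flow sends a control message with `topic` set to `"network"` and a boolean `payload`, and it uses a plain object in place of flow context so it can be run anywhere.

```javascript
// Decide what to do with one incoming message.
// `state` stands in for flow context (an assumption for this sketch).
// Returns the message to forward, or null to drop it.
function gate(msg, state) {
    if (msg.topic === 'network') {
        // Control message: remember whether the uplink is there.
        state.online = msg.payload === true;
        return null; // never forwarded to the TTS node
    }
    // Anything else only passes while we believe we are online.
    return state.online ? msg : null;
}

// Example: start pessimistic, so nothing is spoken until the network is confirmed.
const state = { online: false };
gate({ topic: 'network', payload: true }, state);
const forwarded = gate({ topic: 'speech', payload: 'Washing is done' }, state);
```

Starting with `online: false` is a deliberate choice: after a restart the TTS node gets nothing until something has actually confirmed the uplink.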

Yes to all the above. The wrinkle with TTS is where you want the sound to come out. The server? Or the browser? If the server, then you can possibly use things like espeak via exec. If via the browser, then you may be able to use one of the audio nodes.
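
For the server-side route, the same idea in plain Node.js might look something like the sketch below. It assumes `espeak` is installed and on the PATH; the `speak` name, the 15 second limit and the injectable `run` parameter are made up for illustration. The point is the timeout and the fact that every failure ends up in the callback rather than as an uncaught exception.

```javascript
const { execFile } = require('child_process');

// Speak `text` on the local machine by running espeak.
// `done(err)` is called exactly once: err is null on success.
// `run` defaults to child_process.execFile and is only injectable for testing.
function speak(text, done, run = execFile) {
    const phrase = String(text).trim();
    if (!phrase) {
        done(new Error('nothing to say'));
        return;
    }
    // execFile does not go through a shell, so the text is passed as a
    // single argument. The timeout kills a hung process.
    run('espeak', [phrase], { timeout: 15000 }, (err) => done(err || null));
}
```

If espeak is missing or hangs, `done` receives the error and the caller can decide what to do, and nothing here needs the internet.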

This comes from a very recent post of mine....

(Wave to @Sean-McG)

So here's the existing way I was thinking I would mitigate such failures:

Quick walk through the machines:
TimePi - NTP, Voice, open weather.
TelePi - overall monitor of all machines.
BedPi - DHCP/DNS server. Real world controller.

TimePi
Voice flow
wait node. 30 seconds delay all messages.
(stuff)
tts node
play sound voice - local speaker
end signal
loop back to wait node to send next message

(Elsewhere on same flow)
status node watching the tts node
notes status.
switch node to detect problems (aka if NOT OK)
change node - sets things up
signals to wait node to dump all queued messages
blocks any more messages getting through.
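
To make the intent of that flow a bit clearer, here is the same logic boiled down to a few lines of JavaScript. This is only a model of the behaviour, not the actual flow: the `SpeechQueue` class and its method names are invented for this sketch.

```javascript
// Model of the voice flow: messages queue up, are released one at a time,
// and a bad status from the tts node dumps the queue and blocks new messages.
class SpeechQueue {
    constructor() {
        this.items = [];
        this.blocked = false;
    }

    // A new message arrives. Returns false if it was refused.
    push(text) {
        if (this.blocked) return false;
        this.items.push(text);
        return true;
    }

    // The "end signal" loop: hand out the next message, if any.
    next() {
        return this.blocked ? undefined : this.items.shift();
    }

    // The status node reports in. Not OK: dump everything and block.
    onStatus(ok) {
        this.blocked = !ok;
        if (!ok) this.items.length = 0;
    }
}

const q = new SpeechQueue();
q.push('first');
q.push('second');
q.onStatus(false);               // tts node reports a problem
const refused = q.push('third'); // refused while blocked
```

One thing the model makes obvious: this only protects against problems the tts node reports through its status. If the node misbehaves without changing status, nothing here trips.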

Which I actually did test and it worked. In my given situation at the time.
Granted there is also the open weather node on that machine too.

So, I stupidly lost my DHCP and DNS machine. (ANOTHER machine)
So the network was DOWN.

The machines were sending through messages indicating this.
(Not too many. I have filter nodes to stop the same message coming through twice.)

So at this stage:

I saw the DHCP machine die. BedPi :frowning:
Then the machine with the voice load went from 12% to like 70%.
Not too bad, but worrying.
It stayed there and I WAS getting (voice) messages. So not sure how bad the crash was or not.
I quickly went in and tried to stop all the logging.
Suddenly on TimePi
no response from server
Soon after dashboard refresh. NR had restarted. :confused:
Started again, repeat.

Tried copying old versions to the machine. No change.

Steve then suggested disabling the tts node.
BINGO!
That was it.
(Lost a bit of work, but....)

Now the open weather node is only used (automatically) once a day.
In the morning to give the weather at that time.
It wasn't morning and wasn't being called.

I could post extracts from the voice flow if you want.
But I think the main part would be the failure part.
The rest is kind of all over the place because a few other things happen between the reception of the message and it going to the tts node.
Nothing critical to this.
(Ok, I can have the messages prefixed with a notification bell/sound. If it is detected, that sound is played prior to the voice being heard)