How to detect that the Z-Wave JS driver is frozen/stuck

While I was playing with node-red-contrib-zwave-js, I decided to force some error situations to see what happens. This was prompted by my previous experience with certain packages that can leave Node-RED more or less stuck (I saw it with node-red-contrib-openzwave, where the Z-Wave nodes randomly stopped being recognized by Node-RED in real time and I had no choice but to re-deploy or restart the service).

I tried two cases related to potential problems with the USB stick. The idea is to find a way to identify the problem externally, for example from a script that cron calls periodically and that would restart the service or report the problem somehow:

  • Start the service without the stick.

This is part of the output you see when running sudo systemctl status nodered:

[error] [zwave-js:Z-Wave JS Controller] ZWaveError: Failed to open the serial port: Error: No such file or directory, cannot open /dev/ttyACM0 (ZW0100)

It's easy to identify the problem externally.

  • Remove the stick while the service is running and plug it back in, assuming the driver would eventually recover on its own. I expected to see some sort of error message coming from the controller node at some point, but it never happens. All you get are generic error messages, generated only when you try to send requests to the network, in the following format (again visible when you run sudo systemctl status nodered):

[error] [zwave-js:Z-Wave JS Controller] ZWaveError: Failed to send the message after 3 attempts (ZW0202)

I don't have enough experience with this module, and I don't know whether this kind of error message could also be generated when some devices fail randomly while others keep working fine (which wouldn't look like a stick problem). Either way, I don't think it's the right way to detect the global stick problem I'm talking about.
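Both cases leave a `ZWaveError` line ending in a code in parentheses (ZW0100 when the port can't be opened, ZW0202 when a send fails), so an external script can at least extract those codes from the service log (for example the output of `journalctl -u nodered`, assuming Node-RED runs as the `nodered` systemd unit as above). The sketch below is a minimal illustration; the regular expression is an assumption based only on the two log excerpts in this thread, and `parse_zwave_errors`/`startup_failed` are hypothetical helper names.

```python
import re

# Assumed log format, based only on the excerpts in this thread:
# [error] [zwave-js:<node name>] ZWaveError: <message> (ZW<4 digits>)
ZWAVE_ERROR_RE = re.compile(
    r"\[error\]\s+\[zwave-js:(?P<node>[^\]]+)\]\s+ZWaveError:\s+"
    r"(?P<message>.*)\((?P<code>ZW\d{4})\)\s*$"
)


def parse_zwave_errors(lines):
    """Yield (node, code, message) for every ZWaveError line found."""
    for line in lines:
        match = ZWAVE_ERROR_RE.search(line)  # search() tolerates journal prefixes
        if match:
            yield match["node"], match["code"], match["message"].strip()


def startup_failed(lines):
    """True if any line reports ZW0100 (serial port could not be opened)."""
    return any(code == "ZW0100" for _, code, _ in parse_zwave_errors(lines))
```

A cron script would feed it recent log lines and restart or alert when `startup_failed` is true; ZW0202 lines are better handled with some tolerance, since a single one may be transient.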

So, is there any way to get a well-known error message that identifies this situation? I wouldn't mind adding some extra nodes to the flow that write something to an external file to help the external script detect that something's wrong, if that's the only way to do it.
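One way to do the "external file" idea is a heartbeat: the flow rewrites a file whenever the Z-Wave network reports something successfully, and the cron script only checks how old that file is. This is a minimal sketch; the path `/tmp/zwave-heartbeat` and the 300-second limit are hypothetical values, and how the flow writes the file (e.g. a file node) is left to you.

```python
import os
import time

# Hypothetical heartbeat file that the flow would rewrite on each successful event.
HEARTBEAT_PATH = "/tmp/zwave-heartbeat"
# Assumption: tune to how often your network normally reports something.
MAX_AGE_SECONDS = 300


def heartbeat_is_stale(path=HEARTBEAT_PATH, max_age=MAX_AGE_SECONDS, now=None):
    """Return True if the heartbeat file is missing or older than max_age seconds."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except FileNotFoundError:
        return True  # never written (or deleted): treat as a failure
    return age > max_age
```

The cron script can exit non-zero (or restart the service) when `heartbeat_is_stale()` returns True. The design choice is to detect the absence of good news rather than the presence of a specific error, which sidesteps the lack of a well-known error message.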

Regards

Hi @Mamonetti

Set Pin 2 Logging to Error and connect the second output pin (it will appear) to a process of your choice.
Note that errors may just be informational and don't necessarily mean you need to react.

Take your example above:

[error] [zwave-js:Z-Wave JS Controller] ZWaveError: Failed to send the message after 3 attempts (ZW0202)

This doesn't always mean it's game over: if a stick is not performing you will see this error, but the driver will re-attempt with the next command sent, and this time the stick may be fine.
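Since a single ZW0202 is not necessarily fatal, a watchdog fed from the Pin 2 error output (or from the log) could wait for several send failures within a time window, with no successful exchange in between, before treating the stick as suspect. The class below is a hypothetical sketch; the threshold of 5 failures in 120 seconds is an arbitrary assumption, not a value from the module or the driver.

```python
from collections import deque


class SendFailureTracker:
    """Flag a possible stick problem only after repeated ZW0202 errors.

    Threshold and window are assumptions to tune for your network.
    """

    def __init__(self, threshold=5, window_seconds=120.0):
        self.threshold = threshold
        self.window = window_seconds
        self._failures = deque()  # timestamps of recent ZW0202 errors

    def record_failure(self, timestamp):
        self._failures.append(timestamp)
        self._expire(timestamp)

    def record_success(self):
        # Any successful exchange suggests the stick is still alive.
        self._failures.clear()

    def is_suspect(self, now):
        self._expire(now)
        return len(self._failures) >= self.threshold

    def _expire(self, now):
        # Drop failures older than the window.
        while self._failures and now - self._failures[0] > self.window:
            self._failures.popleft()
```

Clearing on success keeps one dead device (while others keep working) from looking like a global stick failure, as long as other traffic succeeds.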

Ok, that's helpful indeed.

That's the point: some of these error messages could be temporary. Besides, as I mentioned in my previous post, the driver wasn't able to go back to normal operation even after I plugged the stick back in. To be honest, I didn't check whether the driver still had /dev/ttyACM0 locked so that the stick got /dev/ttyACM1 the second time (I'll check next week in case this is part of the problem), but even if that's the case, it doesn't give me a way to really detect a critical situation (the game over you mentioned).
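The re-enumeration question (ttyACM0 vs ttyACM1) can be checked from a script by comparing the configured path with the ACM devices that currently exist. This is a minimal sketch; `check_serial_path` is a hypothetical helper, and treating "configured path missing but another ACM device present" as a likely renumbering is a heuristic, not a guarantee.

```python
import glob
import os


def check_serial_path(configured, pattern="/dev/ttyACM*"):
    """Report whether the configured port exists and which ACM ports are present."""
    configured_exists = os.path.exists(configured)
    present = sorted(glob.glob(pattern))
    return {
        "configured_exists": configured_exists,
        "present": present,
        # Heuristic: the stick probably came back under a different name.
        "possibly_renumbered": not configured_exists and bool(present),
    }
```

For example, `check_serial_path("/dev/ttyACM0")` after re-plugging would show whether the stick reappeared as `/dev/ttyACM1`.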

Regards

My module has recovery, but it needs work.
The driver itself has no recovery; it's down to the front end to implement it. Mine does, but not quite to the standard I would like.

/dev/ttyACM0

I don't recommend using this path; it can change after a reboot.
Use a by-id device path instead.

Use one of the files in /dev/serial/by-id; these are not known to change.
They are symlinks to the serial devices.

You can use the custom option on the controller node.
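To find which by-id entry belongs to the stick, you can resolve each symlink in /dev/serial/by-id to the tty it currently points to. This is a minimal sketch with a hypothetical helper name; it only reads the directory and resolves the links with `os.path.realpath`.

```python
import os


def list_serial_by_id(directory="/dev/serial/by-id"):
    """Map each stable by-id symlink path to the tty device it currently points to."""
    if not os.path.isdir(directory):
        return {}  # the directory may not exist when no USB serial device is present
    mapping = {}
    for name in sorted(os.listdir(directory)):
        link = os.path.join(directory, name)
        mapping[link] = os.path.realpath(link)  # e.g. resolves to /dev/ttyACM0
    return mapping
```

The key (the full by-id path) is the value to enter in the custom option on the controller node; the resolved value is only useful to confirm you picked the right device.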

Yeah, that's what I thought.

It's the first time I've heard about /dev/serial/by-id, but it's good to know, thx :slightly_smiling_face:

Regards

Hi @Mamonetti,

V7 has a much improved recovery mechanism that will auto-restart the driver (see image).
This kicks in if we receive a fatal driver error (e.g. the stick is pulled).

It will retry every 15 s until the driver is able to start again.
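The behaviour described (retry at a fixed 15 s interval until the start succeeds) corresponds to a simple fixed-interval retry loop. The Python sketch below only illustrates that pattern; it is not the module's actual implementation, and `start_driver` is a hypothetical callable assumed to raise an exception when the start fails.

```python
import time


def restart_until_started(start_driver, interval_seconds=15.0, max_attempts=None, sleep=time.sleep):
    """Call start_driver() until it succeeds, waiting a fixed interval between tries.

    start_driver is a hypothetical callable that raises on failure.
    Returns the number of attempts it took; re-raises after max_attempts if set.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            start_driver()
            return attempt
        except Exception as exc:  # e.g. the serial port is still missing
            if max_attempts is not None and attempt >= max_attempts:
                raise
            print(f"driver start failed ({exc}); retrying in {interval_seconds}s")
            sleep(interval_seconds)
```

A fixed interval keeps recovery latency predictable (at most about 15 s after the stick is plugged back in); the optional `max_attempts` gives an external watchdog a point at which to escalate instead of retrying forever.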

That's good to hear, thx.

Regards