Node-RED crash with "node-red-node-tail"

Hi, I'm trying to build an industrial application to collect data from 20 machines in a production line for a medical device manufacturer. Each machine has a Windows PC with a "Log.txt" file whose last line contains the current machine status. I read it with "node-red-node-tail" v0.4.0, and then a Function node decodes the message and reports machine running, stopped, Lot Number, operator, etc. on the dashboard. The server is also a Windows 10 PC, running Node-RED v3.0.2.
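For context, the decoding in the Function node is nothing more complicated than this kind of thing (a sketch only - the semicolon-separated layout below is made up for illustration, our real Log.txt format is different):

```javascript
// Function node sketch - the semicolon-separated layout is made up for
// illustration; the real Log.txt line format on our machines is different.
// msg.payload arrives from the tail node as the latest line of Log.txt.
const parts = msg.payload.split(";");

msg.payload = {
    status:   parts[0],        // e.g. "RUNNING" or "STOPPED"
    lot:      parts[1],        // Lot Number
    operator: parts[2],        // operator ID
    raw:      msg.payload      // keep the original line for debugging
};

return msg;
```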

Everything works fine for a few hours and then Node-RED crashes, every time I run it, with this error:

27 Sep 09:16:14 - [red] Uncaught Exception:
27 Sep 09:16:14 - [error] Error: ECONNRESET: connection reset by peer, watch
at FSEvent.FSWatcher._handle.onchange (node:internal/fs/watchers:207:21)
C:\WINDOWS\system32>

Question: how can I troubleshoot or correct this error?

Thank you for the help!

How big is the file? I see that you are accessing the file on what is presumably a shared network drive (Y:)? Is that connection reliable? What device is that drive on - is that device reliable?

Also, does the connection recover? If so, I think you can work around the issue.

Whatever the cause (which is likely a network or device problem, as @TotallyInformation suggests), it should not crash Node-RED. Please upgrade to Node-RED 3.1.0 and check whether it still happens (which I expect it will), then submit an issue describing the problem at Issues · node-red/node-red · GitHub

TotallyInformation Thank you for the reply. We have 20 PCs (Windows 10) on an industrial wired network (1 Gbit, very reliable) with shared network drives. The "Log.txt" file is 416 KB (6,561 lines of short messages). We don't have any issues at all with the computers or network; I can always see and access the shared drives. After Node-RED crashes, I can restart it and everything comes back to life for a few hours, then it stops again.

Thank you!

Colin Thank you for the response. I will try upgrading to version 3.1.0 and report the findings, but as you mention, I think it will be the same.

Have you configured the Node-RED service to auto-restart when it crashes? That would at least ensure the system starts up again automatically.

There appears to be an outstanding bug that may be related to this:

The issue is a crash in the external tail.js library. However, the node's runtime should have a try/catch around the use of Tail:
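Roughly like this, I mean - a minimal sketch assuming the node carries on using the `tail` npm package, with the path and options only illustrative (the proper fix belongs in 28-tail.js itself, not in your flow):

```javascript
// Sketch of what the guard could look like inside the node's runtime code.
// Path and options are illustrative only.
const Tail = require("tail").Tail;

let tail;
try {
    // Creating the watcher can throw if the file is momentarily
    // unreachable (e.g. a blip on the network share), so guard it
    // instead of letting it take the whole runtime down.
    tail = new Tail("Y:\\Log.txt", { follow: true });
} catch (err) {
    node.error("Failed to start tail: " + err.message);
}

if (tail) {
    tail.on("line", function (line) {
        node.send({ payload: line });
    });

    // Without an 'error' handler the event surfaces as an uncaught
    // exception and crashes Node-RED, which is what you are seeing.
    tail.on("error", function (err) {
        node.error("Tail error: " + err.message);
    });
}
```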

So that is a bug anyway and should be reported to that repo.

Of course, that does not resolve the issue completely. However, it might let you use a Catch node to simply trap the error and wait for the next change instead, which should work unless some kind of network error is generated.
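For example, once the node reports the failure via node.error rather than crashing, a Function node wired to a Catch node could just swallow it - a sketch, assuming the standard msg.error fields the Catch node sets:

```javascript
// Function node attached to a Catch node scoped to the tail node.
// Assumes the standard msg.error.source fields the Catch node provides.
if (msg.error && msg.error.source && msg.error.source.type === "tail") {
    // Log it and stop it propagating; the tail node should pick up the
    // next change to Log.txt once the share is reachable again.
    node.warn("tail node error trapped: " + msg.error.message);
    return null;
}
return msg; // let anything else through
```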

Colin Not sure how to do that. Is it a Node-RED service configuration or an operating system setting?

TotallyInformation Thank you! Do you suggest adding those lines to my existing tail node code? Not sure here.. node-red/node-red-nodes/blob/master/storage/tail/28-tail.js#L20-L25

Well, that would only be a temporary workaround. It needs raising as a bug on that repo for someone to fix it properly.

Neither am I, as I don't use Windows. I expect it is a configuration setting in whatever system you have used to run Node-RED automatically on boot.

We would need to know how Node-RED is being started on the PC. On Windows there are various ways to run things and each has its own way of handling auto-restart.
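As one example (and only as a sketch), if you end up running it under pm2, an ecosystem file along these lines gives you restart-on-crash - the script path is a guess at a typical global npm install on Windows, so adjust it to your server:

```javascript
// ecosystem.config.js - sketch for running Node-RED under pm2 on Windows.
// The script path assumes a typical global npm install; adjust it to
// wherever red.js actually lives on your server.
module.exports = {
    apps: [
        {
            name: "node-red",
            script: "C:\\Users\\<user>\\AppData\\Roaming\\npm\\node_modules\\node-red\\red.js",
            autorestart: true,    // restart automatically after a crash
            restart_delay: 5000,  // wait 5 seconds before each restart
            max_restarts: 50      // stop retrying after repeated rapid crashes
        }
    ]
};
```

You would still need a separate step to get pm2 itself to start at boot on Windows, which is why it matters how Node-RED is being launched today.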

It might also be good to look at what kind of drive Y: actually is. Is it a network drive, some kind of external drive, or something else? It may be that which is causing the accessibility blip. Not possible to know without knowing more about the architecture in use.

I'm starting up the Node-RED service manually; no auto-start at the moment. I will try that once I find a solution for the crashing issue.

As mentioned, the basic architecture is 1 PC running Node-RED as the server and 20 PCs with shared drives over a wired LAN, all using Windows 10, each with a "Log.txt" file whose tail we need to read. The shared drive on each of the 20 PCs is the one where the OS lives (C:).

I think I might try having Node-RED read the log files from the server's local C: drive, manually copying them over from the network drives - just to try and eliminate another variable.
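If the manual test does point at the network shares, a small script like this run on the server could keep local copies fresh so Node-RED only ever tails local files - a rough sketch with made-up paths, assuming the shares stay reachable most of the time:

```javascript
// copy-logs.js - periodically mirror each machine's Log.txt onto the
// server's local C: drive so Node-RED only ever tails local files.
// All paths below are made up for illustration.
const fs = require("fs");
const path = require("path");

const sources = [
    "\\\\MACHINE01\\c$\\Logs\\Log.txt",
    "\\\\MACHINE02\\c$\\Logs\\Log.txt"
    // ...one entry per machine
];

const targetDir = "C:\\NodeRedLogs";
fs.mkdirSync(targetDir, { recursive: true });

function copyAll() {
    for (const src of sources) {
        // e.g. \\MACHINE01\c$\Logs\Log.txt -> C:\NodeRedLogs\MACHINE01-Log.txt
        const dest = path.join(targetDir, src.split("\\")[2] + "-Log.txt");
        fs.copyFile(src, dest, function (err) {
            if (err) {
                // A failed copy just leaves the previous local copy in
                // place; it never takes anything down.
                console.error("Copy failed for " + src + ": " + err.message);
            }
        });
    }
}

setInterval(copyAll, 10000); // refresh every 10 seconds
copyAll();
```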