By now my Node-RED application is fairly complex. I have 19 tabs, each with many nodes and all with some form of unique function coding, much of it extensive. While it seems to run as expected for lengthy periods (days, weeks), at some point something, somewhere, gets to a point where it drives NR to become unhinged. CPU usage climbs from a few % to 80% or more and memory usage maxes out. Eventually the application just stops functioning, and sometimes stopping it and getting it restarted is a multi-step, iterative process before success. (Note: Node-RED does not necessarily terminate; most often it locks up, though sometimes it does terminate.) I'm having a challenge finding the culprit here.
As a side note, I do use the exec node a fair amount to launch Cepstral's "Swift" to initiate speech, or "omxplayer" to play an mp3 file.
I access and program the Rpi3 remotely, either via HTTP (the editor) or using VNC. When it goes haywire I start to have trouble accessing it by either of these methods (likely due to the RAM problem). It is not at all convenient to set up a monitor, keyboard and mouse where the device resides.
What I wish to do (having just set settings.js to level: "trace") is have the console log append to a file on the connected USB drive so I can review it whenever the issue reoccurs (one possible approach is sketched below). So far, when I have caught the log before everything gets too wonky, the last set of messages has not helped identify what triggered the abnormality (however, that was with level: "info").
Having reviewed the Node.js command-line options I tried node --report-on-fatalerror and node --report-directory="mydir", however both produced this response:
> node: --report-on-fatalerror option is valid only when --experimental-report is set
node --experimental-report=true produces this (which I'm not sure how to make use of for my intentions):
> Welcome to Node.js v12.16.2.
> Type ".help" for more information.
>
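On the wish above to append the console log to a file on the USB drive: one approach that does not depend on the Node.js report flags is a custom logger in settings.js. This is a minimal sketch only; the logger name usbFileLog and the mount path /media/usb/node-red.log are assumptions to adapt to the actual setup.

```javascript
// settings.js - add a second logger alongside the existing console logger.
// The file path and logger name below are placeholders for this sketch.
const fs = require("fs");

module.exports = {
    // ... existing settings unchanged ...
    logging: {
        console: {
            level: "trace",
            metrics: false,
            audit: false
        },
        usbFileLog: {
            level: "trace",
            metrics: false,
            audit: false,
            handler: function (settings) {
                // Called once when the logger is initialised
                const out = fs.createWriteStream("/media/usb/node-red.log", { flags: "a" });
                return function (msg) {
                    // msg.level is numeric, msg.msg holds the text of the log entry
                    out.write(new Date(msg.timestamp).toISOString() +
                              " [" + msg.level + "] " + String(msg.msg) + "\n");
                };
            }
        }
    }
};
```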
Why wouldn't the USB drive suffice? If it happens to be a single part/node causing the issues that can be fixed, a temporary solution like that might be easier to set up.
It would - but what I suggested is pretty quick and easy (as long as you have other machines) and means you can watch the logs in real time (if you wish).
Thanks for the feedback, each of you. I do have a Synology NAS that I could stream the file to; I might do that. I have been able to run top using Termius and it is absolutely Node-RED that is unhappy. It shows memory use of over 180% with high CPU. Often the node-red-stop command will not work at all and I have to kill the session. I don't like doing that though, because then start-up is even more challenging (I think it might be due to context storage issues).
I do have MQTT nodes (the broker is on the NAS) but no wildcards. I run eWelink as well, and it has challenges with unmet promises; however, I've come to realize it's not the problem (I disabled the tab and the issue still eventually presents). The latest "test" I'm engaged in is to disable the security (alarm) system tab and see if that provides any indication. There are loops there, for exiting, monitoring and entering.
I'll try the suggested pipe command and see how it works out. It may be a while before I know, since the issue can be so elusive.
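On the loops in the alarm flow mentioned above: one cheap way to rule out (or catch) a runaway feedback loop is to stamp messages with a hop counter as they go around. Below is a minimal guard sketch for a Function node; the property name msg._hops and the limit of 1000 are arbitrary choices for illustration.

```javascript
// Function node placed somewhere inside the loop.
// Counts how many times a given message has gone around and breaks the
// loop if the count becomes implausible.
msg._hops = (msg._hops || 0) + 1;

if (msg._hops > 1000) {
    // A legitimate message should never loop this many times - log and drop it
    node.error("possible runaway loop - dropping message", msg);
    return null;
}
return msg;
```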
The next time it happens use another client to subscribe to '#' on the MQTT broker and see if there is an MQTT loop. Alternatively stop the MQTT broker and see if that fixes it.
However if node-red-stop has problems that sounds more like a tight loop somewhere so it is not able to respond to the close signal. If you left it a few minutes though I would expect systemd to eventually get bored and kill it off.
If killing it makes it difficult to restart then that may provide a clue. If you have to kill it then start it using node-red-start and post the log here. In fact it might be worth posting the log here even before it fails in case we can see anything suspicious.
Edit: Also, before killing it, run node-red-log to see what is going on.
Colin, I have the console log up in a window at all times. I see now that the problem is the Pi running out of memory, causing NR to terminate. I have no idea what is causing the RAM problems, but I am slowly disabling each flow until I find the culprit. Here is the latest log capture I grabbed on the most recent unplanned NR termination.
19 Jun 07:21:26 - [debug] Flushing localfilesystem context scope 1428bc74.610ad4
<--- Last few GCs --->
[10417:0x3b33f40] 163761797 ms: Mark-sweep 222.3 (228.9) -> 222.2 (228.9) MB, 1172.0 / 0.1 ms (average mu = 0.703, current mu = 0.004) last resort GC in old space requested
[10417:0x3b33f40] 163762845 ms: Mark-sweep 222.2 (228.9) -> 222.2 (228.9) MB, 1048.1 / 0.2 ms (average mu = 0.549, current mu = 0.000) last resort GC in old space requested
<--- JS stacktrace --->
==== JS stack trace =========================================
Security context: 0x42b8e3ad <JSObject>
0: builtin exit frame: byteLength(aka byteLengthUtf8)(this=0x30226d81 <Object map = 0x403d4b05>,0x3dc2b315 <Very long string[3333833]>,0x30226d81 <Object map = 0x403d4b05>)
1: from [0x57d88b3d] [buffer.js:~300] [pc=0x388f0448](this=0x30216cc9 <JSFunction Buffer (sfi = 0x57d82779)>,0x3dc2b315 <Very long string[3333833]>,0x57d81ff9 <String[#4]: utf8>,0x4fc00279 <undefined>)
2: writeFd(aka...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
Here is a typical RAM behavior view:
Sometimes the RAM usage "Last 24 Hrs" is even more up and down than this.
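To line graphs like that up with what the flows are doing, one option is to log the Node.js memory figures from inside Node-RED itself. This is a rough sketch of a Function node driven by an Inject node (say, once a minute); it assumes process is reachable from the Function node sandbox on this install, otherwise it can be exposed via functionGlobalContext in settings.js and fetched with global.get().

```javascript
// Function node body, triggered by an Inject node about once a minute.
// Assumes `process` is visible in the Function node sandbox; if it is not,
// add it to functionGlobalContext in settings.js and use global.get("process").
const mem = process.memoryUsage();
const toMB = (n) => Math.round((n / 1048576) * 10) / 10;

msg.payload = {
    time: new Date().toISOString(),
    rssMB: toMB(mem.rss),
    heapUsedMB: toMB(mem.heapUsed),
    heapTotalMB: toMB(mem.heapTotal)
};
return msg;   // wire to a Debug node, a file node, or a chart
```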
Yes to both. The mp3 files are very small, just a bing or a bong (< 2 secs); the longest is 10 seconds and is called very seldom. Cepstral Swift I call via an exec node, sending it (as msg.payload) whatever the computer needs to say. Other than a weather forecast (once per day), these are voice messages of less than 5 seconds. I have learned to watch Swift and memory use closely. As a side note, NR has failed while I've been away for a week and no voice messages were being processed, which is why I went looking elsewhere for the culprit.
I also have an Arduino NANO that monitors the solar battery bank voltage, the battery current flow (+ or -) and the state of the commercial hydro. These provide an update to the Rpi every 1/2 second via USB.
I don't know the full significance of the crash log, but it looks rather as if it is trying to allocate 3MB.
Have you checked that the memory is stable if you don't have node-red running?
That seems a little over the top since you only have a few hundred pixels across the screen, which means you have several tens of times more samples than you need.
I would start by dramatically reducing the rate at which data is added to the chart.
Are you running the browser on the Pi too? If so then what happens if you close and re-open the browser?
Also are you running the latest version of node-red and node-red-dashboard? If not then worth upgrading I think, if nothing else then just to eliminate that from the equation.
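As an illustration of the rate-reduction idea, a Function node ahead of the chart could simply drop all but one sample per interval. This is a sketch only; the 30-second interval is arbitrary, and the built-in Delay node in rate-limit mode would do much the same job.

```javascript
// Function node placed between the sensor feed and the chart node.
// Forwards at most one message per INTERVAL_MS; all other samples are dropped.
const INTERVAL_MS = 30000;                 // arbitrary - tune to suit the chart
const last = context.get("lastSent") || 0;
const now = Date.now();

if (now - last >= INTERVAL_MS) {
    context.set("lastSent", now);
    return msg;                            // pass this sample on to the chart
}
return null;                               // drop everything in between
```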
I don't know how many pixels the chart node would be using for the 24 hr line chart. No problem to reduce the sample rate for sure.
NR and Dashboard are the latest versions and the browser is running on my laptop, not on the Rpi3.
I think I may have found the culprit as well. I have an AC voltage sensor board (i.e. "is the commercial hydro okay?") feeding an Arduino NANO, which uses USB to send the resulting data to the Rpi. It was providing around 12 samples a second, all the time, through the USB port (needed to measure glitch durations). I rewrote the code on the NANO (and my NR function) to only send an update on a change of AC status. Detection by the NANO and reaction of the NR function is fast enough to still keep the measurement reasonably accurate. If the "glitch" lasts more than 10 seconds it becomes an outage, which then causes various things to happen. I'm not entirely sure this was it; I'll have to see after a reasonable period of time whether the problem recurs.
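For what it is worth, here is a guess at what the NR-side handling might look like once the NANO only reports changes. The "ON"/"OFF" payload format and topic names are assumptions, and a fuller version might use a Trigger node or setTimeout to raise the outage alarm at the 10-second mark rather than waiting for the power to return.

```javascript
// Function node fed by the serial-in node from the NANO.
// Assumes msg.payload is "ON" or "OFF" and arrives only on a change of state.
const now = Date.now();
const prev = context.get("acState");

if (msg.payload === "OFF" && prev !== "OFF") {
    // Mains just dropped - remember when, and wait to see how long it lasts
    context.set("acState", "OFF");
    context.set("droppedAt", now);
    return null;
}

if (msg.payload === "ON" && prev === "OFF") {
    context.set("acState", "ON");
    const duration = now - (context.get("droppedAt") || now);
    // Under 10 s it was only a glitch; anything longer counts as an outage
    return {
        topic: duration > 10000 ? "outage-ended" : "glitch",
        payload: duration
    };
}

context.set("acState", msg.payload);
return null;
```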