Hmmm, Node-RED grinds to a halt and keeps restarting. Any ideas?

Seeing this in the logs, or at least, this is the first one I caught!...

<--- Last few GCs --->
[3690:0x7f9d100000] 799596 ms: Mark-Compact 1005.1 (1073.6) -> 989.6 (1071.6) MB, pooled: 1 MB, 578.63 / 0.00 ms (average mu = 0.467, current mu = 0.434) allocation failure; scavenge might not succeed
[3690:0x7f9d100000] 800337 ms: Mark-Compact 998.8 (1071.9) -> 989.8 (1072.6) MB, pooled: 0 MB, 535.35 / 0.00 ms (average mu = 0.384, current mu = 0.278) allocation failure; scavenge might not succeed
<--- JS stacktrace --->
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----
1: 0xe15218 node::OOMErrorHandler(char const*, v8::OOMDetails const&) [node-red]
2: 0x11ab5cc v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node-red]
3: 0x11ab77c v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node-red]
4: 0x13d137c [node-red]
5: 0x13e92f0 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node-red]
6: 0x13bfc78 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node-red]
7: 0x13c0ab0 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node-red]
8: 0x139b1b0 v8::internal::Factory::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [node-red]
9: 0x13893f8 v8::internal::FactoryBase<v8::internal::Factory>::AllocateRawWithImmortalMap(int, v8::internal::AllocationType, v8::internal::Tagged<v8::internal::Map>, v8::internal::AllocationAlignment) [node-red]
10: 0x138a90c v8::internal::FactoryBase<v8::internal::Factory>::NewRawOneByteString(int, v8::internal::AllocationType) [node-red]
11: 0x138a99c v8::internal::FactoryBase<v8::internal::Factory>::NewStringFromOneByte(v8::base::Vector<unsigned char const>, v8::internal::AllocationType) [node-red]
12: 0x14fd7bc v8::internal::JsonStringifier::Stringify(v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>) [node-red]
13: 0x14fd95c v8::internal::JsonStringify(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>) [node-red]
14: 0x12263b4 v8::internal::Builtin_JsonStringify(int, unsigned long*, v8::internal::Isolate*) [node-red]
15: 0x1cc8bf4 [node-red]
nodered.service: Main process exited, code=killed, status=6/ABRT
nodered.service: Failed with result 'signal'.
nodered.service: Consumed 15min 27.649s CPU time.
nodered.service: Scheduled restart job, restart counter is at 2.
Stopped nodered.service - Node-RED graphical event wiring tool.
nodered.service: Consumed 15min 27.649s CPU time.
Started nodered.service - Node-RED graphical event wiring tool.
26 May 18:10:48 - [info]
Welcome to Node-RED
===================
26 May 18:10:48 - [info] Node-RED version: v4.1.10
26 May 18:10:48 - [info] Node.js version: v22.22.2
26 May 18:10:48 - [info] Linux 6.12.87+rpt-rpi-v8 arm64 LE
26 May 18:10:48 - [info] Loading palette nodes
26 May 18:10:50 - [info] node-red-contrib-telegrambot version: v17.4.12
26 May 18:10:55 - [info] Dashboard version 3.6.6 started at /ui
26 May 18:10:55 - [info] Settings file : /home/nodered/.node-red/settings.js
26 May 18:10:55 - [info] Context store : 'default' [module=memory]
26 May 18:10:55 - [info] Context store : 'storeInFile' [module=localfilesystem]
(node:5811) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
26 May 18:10:55 - [info] User directory : /home/nodered/.node-red
26 May 18:10:55 - [info] Projects directory: /home/nodered/.node-red/projects
26 May 18:10:56 - [info] Server now running at http://127.0.0.1:1880/
26 May 18:10:56 - [info] Active project : 2401091_Tennyson
26 May 18:10:56 - [info] Flows file : /home/nodered/.node-red/projects/2401091_Tennyson/flows.json
26 May 18:10:56 - [warn] Using unencrypted credentials
26 May 18:10:56 - [info] +-----------------------------------------------------
26 May 18:10:56 - [info] | 🌐 uibuilder v7.6.2 initialised
26 May 18:10:56 - [info] | root folder: /home/nodered/.node-red/projects/2401091_Tennyson/uibuilder
26 May 18:10:56 - [info] | Using Node-RED's webserver at:
26 May 18:10:56 - [info] | http://0.0.0.0:1880/
26 May 18:10:56 - [info] | Installed packages:
26 May 18:10:56 - [info] | justgage, uplot
26 May 18:10:56 - [info] +-----------------------------------------------------
26 May 18:10:56 - [info] Starting flows
26 May 18:10:56 - [warn] Unknown context store 'memoryOnly' specified. Using default store.
26 May 18:10:57 - [warn] Unknown context store 'memory' specified. Using default store.
26 May 18:10:58 - [info] Started flows
26 May 18:10:59 - [info] [serialconfig:eaecafaec2da87bc] serial port /dev/ttyUSB0 opened at 115200 baud 8N1
26 May 18:10:59 - [info] [mqtt-broker:Node-RED MQTT] Connected to broker: mqtt://172.27.123.58:1883
26 May 18:11:03 - [info] [position-config:8aac1f268a7d4357] getSunCalc, time difference since last output to low, do no calculation

I have been reviewing swap (Swp) memory on another project, so I find this rather perplexing. Really? No swap memory used?? (I must admit I can't recall what it was before I got this error.)

Note the 100% use. It has built up to this.

I have InfluxDB and Grafana on this RPi5. I have already stopped Grafana, as it is only needed for display. There are calls that write to and fetch from InfluxDB, so I have left that running. The OS is up to date.

Any ideas anyone please?

TIA,
Colin J

The usual cause is an infinite loop, either explicit in your code or involving an external process such as subscribing and publishing to the same MQTT topic.
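A minimal sketch of a loop guard, assuming you can put a function node straight after the MQTT In node: it flags any topic that arrives more often than a threshold within a sliding time window, which is what a subscribe/publish loop on the same topic tends to look like. The names `makeLoopDetector` and `isLooping` and the default thresholds are hypothetical, not part of Node-RED.

```javascript
// Hypothetical helper: flags a topic seen more than `maxCount` times
// within a sliding window of `windowMs` milliseconds.
function makeLoopDetector(maxCount = 50, windowMs = 1000) {
  const seen = new Map(); // topic -> timestamps still inside the window

  return function isLooping(topic, now = Date.now()) {
    // Keep only timestamps that are still inside the window, then add this one
    const recent = (seen.get(topic) || []).filter((t) => now - t < windowMs);
    recent.push(now);
    seen.set(topic, recent);
    return recent.length > maxCount;
  };
}
```

In a function node, the returned function could be kept in memory context and any message for which it returns true logged with `node.warn()` and dropped; the thresholds would need tuning to your normal message rates.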

It's also possible for an overly expensive regular expression to max out CPU usage.
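To illustrate the regular-expression case: a nested quantifier such as `/^(a+)+$/` can backtrack exponentially on a near-miss input (a long run of "a" followed by "b"), while the equivalent flat pattern `/^a+$/` does not. The `guardedTest` helper and its length limit are hypothetical; they only cap how much input a pattern ever sees.

```javascript
// Nested quantifier: on a near-miss like "aaaa...ab" the engine tries
// exponentially many ways to split the run of "a"s before failing.
const risky = /^(a+)+$/;

// Same language without nesting: no catastrophic backtracking.
const safe = /^a+$/;

// Length guard: refuses oversized input (returns false, i.e. "no match").
// It limits the damage on huge payloads but does NOT make a nested-quantifier
// pattern safe; rewriting the pattern does that.
function guardedTest(pattern, input, maxLength = 1000) {
  if (typeof input !== "string" || input.length > maxLength) return false;
  return pattern.test(input);
}
```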

Looking at this log, Node-RED crashed because a node attempted to run JSON.stringify() on a massive object. (A circular reference would not cause this: JSON.stringify() throws a TypeError on circular structures rather than looping, so a massive payload is the likely culprit.) The heap allocation failed at about 1 GB (you can see 1005.1 -> 989.6 MB in the GC lines, with each full collection recovering very little). That ceiling comes either from an explicit heap-size option or from the default V8 derives from the device's memory; this Pi is running 64-bit (arm64), so it is not the old 32-bit limit.
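One defensive option, sketched under the assumption that an approximate size check is good enough: pass JSON.stringify() a replacer that keeps a running character budget and throws once it is exceeded, so serialisation stops early instead of building a huge string. The `boundedStringify` name and the 100,000 default are hypothetical.

```javascript
// Hypothetical helper: stringify, but give up once the output is likely
// to exceed roughly `maxChars` characters.
function boundedStringify(value, maxChars = 100000) {
  let budget = maxChars;
  const replacer = (key, val) => {
    // Approximate cost: key text plus primitive value text. Quotes, commas
    // and brackets are not counted, so this underestimates the real length.
    budget -= String(key).length;
    if (val !== null && typeof val !== "object") {
      budget -= String(val).length;
    }
    if (budget < 0) {
      throw new RangeError(`output would exceed ~${maxChars} characters`);
    }
    return val;
  };
  try {
    return { ok: true, json: JSON.stringify(value, replacer) };
  } catch (err) {
    // Either our RangeError or V8's TypeError for circular structures
    return { ok: false, error: err.message };
  }
}
```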

Are you selecting everything from the DB and then passing it to a node that stringifies it (like a debug node)?
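If a debug node (or anything else that stringifies) is being fed a full query result, one hedged workaround is to summarise the result in a function node and set the debug node to output the summary property instead of `msg.payload`. Limiting the query itself (a time range or a LIMIT clause) helps too. `summarisePayload` and `msg.summary` are hypothetical names.

```javascript
// Hypothetical helper: reduce a large result set to a row count plus a
// few sample rows, small enough to display or log safely.
function summarisePayload(payload, sampleSize = 3) {
  if (!Array.isArray(payload)) return payload; // leave non-arrays alone
  return {
    rows: payload.length,
    sample: payload.slice(0, sampleSize),
  };
}

// In a function node body this might be used roughly as:
//   msg.summary = summarisePayload(msg.payload);
//   return msg;
// with the debug node set to show msg.summary.
```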

An infinite loop is also a common cause (and it just so happens that the last thing to push it over the edge was a stringify operation).

I would get rid of the max_old_space_size parameter; it shouldn't be needed. That isn't your main issue, though.

Hmm, good point, I should add a check for that to my saferSerialize() function and the various tools that use it.

Thank you Peoples!

I suspected the JSON.stringify() call might be part of the problem. I haven't changed anything, and this was all working OK before I had a problem with my main WiFi router.

I can't recall how long this installation of NR has been running. I am wondering if some kind of corruption has taken place. I have just modified some of the MQTT nodes so that I only have one MQTT In and one MQTT Out.

I now have the relevant information and need to track down the flow causing the problem.

Many thanks!
Colin J

Oh, the max_old_space_size parameter was set in the original install all those years ago, probably to cover the move from the RPi3 to the RPi5.

Yes, I don't believe it should be needed now, and on devices with more memory it will be restricting your memory allocation.
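To see what limit is actually in effect, a small standalone Node.js script can ask V8 directly through `v8.getHeapStatistics()`. Running it with and without the option (e.g. `node --max-old-space-size=1024 heap-limit.js`) shows the difference; the file name is arbitrary, and the figures will vary with the device and Node.js version (with a 1024 MB setting expect something close to, typically slightly above, 1024 MB).

```javascript
const v8 = require("node:v8");

// heap_size_limit is the ceiling V8 lets the JavaScript heap grow to.
// It reflects --max-old-space-size when set, or V8's default otherwise.
function heapLimitMB() {
  return Math.round(v8.getHeapStatistics().heap_size_limit / 1024 / 1024);
}

console.log(`JavaScript heap limit: ${heapLimitMB()} MB`);
```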

I know this doesn't help you now, but the new uibuilder functions for function nodes should help once uibuilder's next release is out. They can serialize pretty much anything that JavaScript/Node.js can create without crashing, giving sensible results where they can and a hint where they can't (functions, circular references, etc.).
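The uibuilder functions mentioned are not released yet, so this is not their implementation, only a sketch of the same general idea using a JSON.stringify() replacer. It tracks the chain of ancestor objects so that only true cycles become "[Circular]" (an object shared by two siblings is still serialised twice), and it swaps functions and BigInts, which JSON.stringify() would otherwise drop or throw on, for text hints. `safeStringify` is a hypothetical name.

```javascript
// Hypothetical helper: JSON.stringify that survives cycles, functions and BigInts.
function safeStringify(value, space) {
  const ancestors = []; // objects on the path from the root to the current one
  return JSON.stringify(
    value,
    function (key, val) {
      if (typeof val === "function") return `[Function${val.name ? ": " + val.name : ""}]`;
      if (typeof val === "bigint") return `${val}n`;
      if (typeof val !== "object" || val === null) return val;
      // `this` is the object holding `key`; pop entries that are not its ancestors
      while (ancestors.length > 0 && ancestors[ancestors.length - 1] !== this) {
        ancestors.pop();
      }
      if (ancestors.includes(val)) return "[Circular]";
      ancestors.push(val);
      return val;
    },
    space
  );
}
```

This guards against cycles but not against sheer size: a huge payload still produces a huge string.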

While disabling and enabling each flow in turn (inconclusive so far), I came up with a question.

Is there a procedure to completely remove and reinstall Node.js and all of the npm modules in case any of them are corrupted? Can it be forced using the NR install script?

It is most unlikely that corruption would lead to such a symptom, but you can do that if you want. Assuming that you used the recommended Pi install script:

  1. Uninstall node-red
    sudo npm remove -g node-red
  2. Uninstall nodejs
    sudo apt remove nodejs
  3. Rerun the install script to re-install them. Specify the version of nodejs you want if necessary.
  4. Go into your .node-red folder and delete the node_modules folder to remove all additional installed nodes
  5. Re-install all additional nodes by running (from within the .node-red folder)
    npm install

But really, your issue is not caused by such a corruption.

Does the memory usage increase slowly or suddenly? Run top and watch the memory and CPU used by node-red.
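Once you have a series of readings for the node-red process (for example the RES column from top, noted every few minutes), a rough heuristic like the one below can help answer the slow-versus-sudden question. The function name, the 20 MB threshold and the "more than half the rise in one step" rule are all arbitrary assumptions.

```javascript
// Hypothetical heuristic: classify memory samples in MB, oldest first.
function classifyGrowth(samples, minIncreaseMB = 20) {
  if (samples.length < 2) return "not enough samples";
  const total = samples[samples.length - 1] - samples[0];
  if (total < minIncreaseMB) return "flat";
  // Find the largest increase between consecutive samples
  let biggestJump = 0;
  for (let i = 1; i < samples.length; i++) {
    biggestJump = Math.max(biggestJump, samples[i] - samples[i - 1]);
  }
  // One step accounting for more than half of the total rise looks sudden
  return biggestJump > total / 2 ? "sudden" : "slow";
}
```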

Thank you for that. That is the nuclear option; I am not planning to do it at the moment.

I have been running htop, and now node-red-log, since last night when I posted the original log. I am watching memory usage. After an initial pass disabling each flow in turn without seeing anything awry, I have now disabled a block of 'ancillary' flows that handle more specific, non-essential tasks, and I am just leaving it running. Hopefully this will let me eliminate some flows.

As for memory usage increasing, the jury is out... Time will tell!!

Oh, the other thing I have done is disable Bluetooth and wireless, as they are not needed.

Presumably you are not seeing high CPU usage normally.

Not that I can recall.

Currently, when I stop NR, I get ~455M memory usage; when I start NR, this goes up to 830-920M, i.e. about 400M more. Of course, other processes will be using memory too, so I'm not sure how much Node-RED uses when idle.

The best indicator I have is the response time to a Zigbee switch that controls a light. When all is OK, the light responds as expected. When the problem starts, the response slows. Looking at Zigbee2MQTT, the switch message arrives immediately, but it takes a while to get through the MQTT In, Switch and Switch nodes. I now have only a single MQTT In and a single MQTT Out node, and Telegram is also disabled.

I was asking about CPU usage, not memory. But for memory, don't worry about the overall figure; look at the memory used specifically by Node-RED.
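A hedged way to see where the delay is: stamp the time in a function node straight after MQTT In and compute the elapsed time in another just before MQTT Out. The two functions below mirror what those function node bodies would do (in the node itself you would just write `msg.receivedAt = Date.now(); return msg;`); `receivedAt` and `transitMs` are hypothetical property names.

```javascript
// Mirrors a function node placed straight after MQTT In:
// record when the message entered the flow.
function stampArrival(msg, now = Date.now()) {
  msg.receivedAt = now;
  return msg;
}

// Mirrors a function node placed just before MQTT Out:
// how long the message spent getting through the flow, in ms.
function measureTransit(msg, now = Date.now()) {
  msg.transitMs = now - msg.receivedAt;
  return msg;
}
```

If `transitMs` is large even though two Switch nodes do very little work, that would suggest the messages are queuing behind something else keeping Node.js's single JavaScript thread busy, which would fit a core pinned at 100%.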

What MQTT broker are you using? Is it on the same computer as Node-red or elsewhere?

Looking at htop on my "live" server after many days of running:

The values you give depend on what you are measuring. For example, the startup heap size for my Node-RED instance is around 2,100 MB.

Memory use on my dev instance (on Windows) is around 500MB.

Cooking dinner (for tomorrow). Sorry for the delay.

Node-RED hasn't restarted yet. Memory usage is reasonably constant. One core is continuously at 100% (which core it is changes). NR seems to be using ~10.5% of memory.

@Colin Sorry, I meant to include the htop output.

@jbudd Mosquitto is on the same server; Z2M is on a different server. Messages come through Z2M instantly and are available on Mosquitto just as quickly!

@TotallyInformation Yes, it would be nice to have historic screenshots on this server!

OK, then I suspect something is quite seriously wrong there. On my live server, I do get regular peaks at around 40% of a core but only briefly. They seem to coincide with peaks on InfluxDB as well.

Also, I can see that you seem to have many node-red processes running - possibly sub-processes as they all seem to have the same virt and res memory. Do you have something spawning a shell command that isn't correctly exiting?

If you want that, install Telegraf. It can easily record system performance metrics to InfluxDB, which you can then monitor with Grafana. It can also write the metrics to MQTT if you like; I do. 😃 Telegraf is very low overhead. InfluxDB is OK as long as you take care not to let your DBs get too large. I keep minute-level details for a week and auto-aggregate to hourly for long-term monitoring.

Do you have something spawning a shell command that isn't correctly exiting?

No, not as far as I am aware; all exec nodes are disabled.

coincide with peaks on InfluxDB

All calls to InfluxDB are disabled except one, and that isn't called until 15:00 to check that I have a value in a timed location (01:00 tomorrow). That slot is filled when electricity prices become available each afternoon. Ah, Telegraf: there you go again, making me learn something I used a while ago and now steer clear of.

OK, then I suspect something is quite seriously wrong there.

Hence my question regarding the Nuclear Option...

Disable flows to find what is consuming the CPU.
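While disabling flows, measuring how long the event loop is blocked can be more telling than watching htop. This is a minimal sketch using Node's built-in `perf_hooks.monitorEventLoopDelay()`; it has to run inside the process being measured, and loading it from settings.js (which Node-RED loads as an ordinary module) is an assumption worth checking on your install. `startLagMonitor`, `summariseLag` and the 10-second interval are hypothetical.

```javascript
const { monitorEventLoopDelay } = require("node:perf_hooks");

// Convert the histogram's nanosecond figures to milliseconds (one decimal place)
function summariseLag(histogram) {
  const toMs = (ns) => Math.round((ns / 1e6) * 10) / 10;
  return {
    meanMs: toMs(histogram.mean),
    p99Ms: toMs(histogram.percentile(99)),
    maxMs: toMs(histogram.max),
  };
}

// Sample event-loop delay every 20 ms and log a summary every `intervalMs`
function startLagMonitor(intervalMs = 10000, log = console.log) {
  const histogram = monitorEventLoopDelay({ resolution: 20 });
  histogram.enable();
  const timer = setInterval(() => {
    log("event loop delay", summariseLag(histogram));
    histogram.reset(); // start each reporting period afresh
  }, intervalMs);
  timer.unref(); // don't keep the process alive just for this timer
  // Return a function that stops monitoring
  return () => {
    clearInterval(timer);
    histogram.disable();
  };
}
```

A p99 or max in the hundreds of milliseconds would suggest something synchronous (a big stringify, a heavy regex, a tight loop) is hogging the thread; if it drops after disabling a flow, that flow is a good suspect.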