Persistent context crashes NR

I am a relatively new NR user. I have a production system running on a Windows VM as well as a pilot system running on Linux. The issue is replicated on both systems.

I am using NR in a manufacturing environment. Most nodes are interfacing with equipment/sensors, displaying time series charts as well as status fields on dashboards, and also logging that data to an MSSQL server in some, but not all, cases. I currently have roughly 15 flows running, doing various tasks.

Generally I poll my devices every 2-10 minutes and present some of that data on time series line or scatter plots. I ran into trouble with losing my time series chart data (using a mix of Dashboard and Dashboard 2.0 charts) whenever I had to restart NR. This is a problem as most charts span 24 hours and up to 2 weeks. I decided to try persistent context by enabling disk storage in settings.js and storing an array of objects for approximately 8 of the 15 flows, each flow having its own array.
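For reference, this is roughly how I set it up; the key name below is just an example, and each flow writes to its own key:

// settings.js - enable on-disk context storage
contextStorage: {
    default: {
        module: "localfilesystem"
    }
},

// Function node in each flow - append the latest reading to a persisted array
let history = global.get('pit1History') || [];
history.push({ ts: Date.now(), value: msg.payload });
global.set('pit1History', history);
return msg;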

Initially the system works fine, but both my Windows system and the Linux one crash after a period of time. Once they crash they will not restart unless I get rid of the persistent context.

On the linux system the global.json file was fairly small:
-rw-r--r-- 1 nradmin nradmin 74274 Apr 17 08:58 global.json

and this is the log output before the crash:
17 Apr 09:11:15 - [info]

Welcome to Node-RED

17 Apr 09:11:15 - [info] Node-RED version: v3.1.9
17 Apr 09:11:15 - [info] Node.js version: v18.20.2
17 Apr 09:11:15 - [info] Linux 5.15.0-102-generic x64 LE
17 Apr 09:11:15 - [info] Loading palette nodes
17 Apr 09:11:16 - [info] Dashboard version 1.0.2 started at /ui
17 Apr 09:11:16 - [info] Settings file : /home/nradmin/.node-red/settings.js
17 Apr 09:11:16 - [info] Context store : 'default' [module=localfilesystem]
17 Apr 09:11:16 - [info] User directory : /home/nradmin/.node-red
17 Apr 09:11:16 - [warn] Projects disabled : editorTheme.projects.enabled=false
17 Apr 09:11:16 - [info] Flows file : /home/nradmin/.node-red/flows.json
17 Apr 09:11:16 - [info] Server now running at http://127.0.0.1:1880/
17 Apr 09:11:16 - [info] Starting flows
17 Apr 09:11:16 - [info] [ui-base:Process Data] Node-RED Dashboard 2.0 (v1.7.1) started at /dashboard
17 Apr 09:11:16 - [info] [ui-base:Process Data] Created socket.io server bound to Node-RED port at path /dashboard/socket.io
17 Apr 09:11:16 - [warn] [modbus-client:Pit 1] Client -> fsm init state after new TCP@192.168.140.14:502 default Unit-Id: 1
17 Apr 09:11:16 - [warn] [modbus-client:Pit 1] Client -> first fsm init in 500 ms TCP@192.168.140.14:502 default Unit-Id: 1
17 Apr 09:11:16 - [info] Started flows
17 Apr 09:11:17 - [warn] [modbus-flex-sequencer:788d99cf52d5b1bc] Flex-Sequencer -> Inject while node is not ready for input.

<--- Last few GCs --->

[1382:0x6caa850] 26226 ms: Scavenge 2039.6 (2081.5) -> 2036.6 (2081.5) MB, 3.8 / 0.0 ms (average mu = 0.218, current mu = 0.193) allocation failure;
[1382:0x6caa850] 26234 ms: Scavenge 2040.1 (2081.5) -> 2037.4 (2082.5) MB, 4.0 / 0.0 ms (average mu = 0.218, current mu = 0.193) allocation failure;
[1382:0x6caa850] 26241 ms: Scavenge 2040.9 (2082.5) -> 2038.1 (2083.5) MB, 3.9 / 0.0 ms (average mu = 0.218, current mu = 0.193) allocation failure;

<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
1: 0xb9a330 node::Abort() [node-red]
2: 0xaa07ee [node-red]
3: 0xd71ed0 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node-red]
4: 0xd72277 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node-red]
5: 0xf4f635 [node-red]
6: 0xf50538 v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [node-red]
7: 0xf60a33 [node-red]
8: 0xf618a8 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node-red]
9: 0xf64a75 v8::internal::Heap::HandleGCRequest() [node-red]
10: 0xee2bdf v8::internal::StackGuard::HandleInterrupts() [node-red]
11: 0x12e349f v8::internal::Runtime_StackGuardWithGap(int, unsigned long*, v8::internal::Isolate*) [node-red]
12: 0x1710739 [node-red]
Aborted (core dumped)

My question is: am I using persistent context improperly? Is there a size limitation? A rate limitation? I could possibly use my SQL system for this, but in some cases I do not actually need to log the data to SQL, I just need the visual history.

Thanks in advance for any help.

I suspect that one or more of your persisted variables has grown rather too large for the VM (or possibly it is related to how you've configured Node.js to run). Bear in mind that each context variable has to be read fully into memory.

Couple of things to check. 1st, have you restricted Node.js's heap space by specifying a --max-old-space-size parameter? If so, try running without it.

2nd, how much memory have you given to the VMs?

3rd - can you estimate the size of your variables?
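If you're not sure how, a quick rough estimate from inside a function node is to serialise the variable the same way the file store does and measure the result. A sketch, with a placeholder key name:

// Function node - rough size estimate of one context variable
const history = global.get('someHistoryArray') || [];
const bytes = Buffer.byteLength(JSON.stringify(history), 'utf8');
node.warn(`someHistoryArray is roughly ${(bytes / 1024).toFixed(1)} kB across ${history.length} entries`);
return msg;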

Since you have a database, it seems to make sense to use it for this chart data, even if you only retain it for a short time.

Bit of a novice, so forgive my ignorance:

I'm debugging on my pilot Linux system as I don't want to mess up production for now.

I looked in the NR settings.js and do not see that line active. It might have been commented out and I missed it. Is it somewhere else? I did the default install for a Linux system (my IT dept did the Windows VM one, so I'm leaving it alone for now).

The Linux box is an Intel i5 with 32 GB of RAM running Linux Mint. Nothing else is on it.

I'm not sure how to estimate the size. I watched the global.json file and it grew roughly 1.5 kB/min; its size when it crashed was 74 kB.

Re the use of the database, I was thinking that keeping the data local was more efficient network-traffic-wise and put less load on the SQL server, but it's definitely doable if the context avenue doesn't work. I'm just trying to be as efficient and elegant as possible.

--max-old-space-size is not a Node-RED thing, it is a Node.js command-line parameter - see Command-line API | Node.js v21.7.3 Documentation.

On your Linux machine, you would have to check the systemd startup script for Node-RED. You'll have to forgive me for not pointing you to it directly - that's because I always have to look up the locations myself (on my own server I linked the script to a more convenient location so I don't have to try and remember the arcane locations that systemd uses :grinning: ).
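If you want to see what heap limit the process actually ends up with, whatever the startup script passes, you can ask V8 directly. A sketch, to be run with the same node binary Node-RED uses:

// check-heap.js - print the V8 heap limit for this node binary
const v8 = require('v8');
const limitMiB = v8.getHeapStatistics().heap_size_limit / (1024 * 1024);
console.log(`V8 heap limit: ${limitMiB.toFixed(0)} MiB`);

Run it plain and then as node --max-old-space-size=256 check-heap.js to see the effect of the flag.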

Not a problem, that's what forums are for!

Yes, very sensible.

Well, that feels like it should be big enough!

OK, so maybe my guess wasn't correct then because that is tiny. On my home automation server for example, my global.json file is 2.5MB! I also have a 123k flow context file as well as plenty of others ranging from 2 bytes to 32k.

And that is on a laptop running as a Debian Linux headless server with just 8GB of RAM and running many other services as well.

So, the next thing to look at: given the timeline of your output, it seems like NR is crashing shortly after starting the flows. So now you need to trace through your flows looking for things that run shortly after startup. That could be an inject node set to run at or just after start, or it could be whatever you have in the 1st tab in the Editor (which runs first).

Go through those and look for anything that might be doing something heavy. If you can, turn them off one by one, or turn off everything you can and turn things back on one by one.

You can also turn up the Node-RED logging. By default it is set to info and audit is off. You can set it to trace and turn on audit (see the settings.js excerpt below). This will give you a much better view of exactly where it is crashing.
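In settings.js that is the logging section, something along these lines (trace is very chatty, so drop back to info once you have found the culprit):

// settings.js - raise console logging from the defaults
logging: {
    console: {
        level: "trace",   // default is "info"
        metrics: false,
        audit: true       // default is false
    }
},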

Thanks. Unfortunately I have to get the system running ASAP, so after spending a good part of the day diagnosing, I went down the path of using my SQL server as a buffer, and it's working.

I did find out how to increase the Node.js memory limit, and I made a nice little memory monitor for Node-RED as well as a failsafe trigger that runs garbage collection when process memory exceeds a certain value (sketched below; the full flow is attached). I would welcome critique/thoughts on it.
flows(2).json (8.4 KB)
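In case it's easier to critique inline, this is roughly what the monitor boils down to. It assumes process and global.gc are exposed to function nodes via functionGlobalContext in settings.js and that Node-RED is started with --expose-gc; the 1024 MB threshold is just the value I picked:

// settings.js (excerpt) - make process and gc visible inside function nodes
// functionGlobalContext: {
//     process: process,
//     gc: global.gc   // only defined when node runs with --expose-gc
// },

// Function node, driven by an inject node every minute
const proc = global.get('process');
const gc = global.get('gc');
const LIMIT_MB = 1024;   // failsafe threshold

const usage = proc.memoryUsage();
msg.payload = {
    rssMb: Math.round(usage.rss / 1048576),
    heapUsedMb: Math.round(usage.heapUsed / 1048576),
    heapTotalMb: Math.round(usage.heapTotal / 1048576)
};

// Failsafe: force a garbage collection if the process is getting too big
if (msg.payload.rssMb > LIMIT_MB && typeof gc === 'function') {
    gc();
    node.warn(`RSS ${msg.payload.rssMb} MB exceeded ${LIMIT_MB} MB, forced GC`);
}

return msg;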

Overall I think my lack of experience with the platform is leading to a memory leak somewhere. It seems like using the on-disk context store made it worse, but it still happened when I kept everything in memory, just not as fast.

Context can be very large. I use both memory-only and file-based stores, with context array objects ranging from 20 MB to 300 MB, and have never had any issue with it. The large ones live only in memory (arrays with 250k+ JSON objects), and filter + sort takes mere milliseconds, which is truly impressive.
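For what it's worth, the pattern is nothing fancy. A minimal sketch of how one of those in-memory arrays gets filled and queried; the key and the memoryOnly store name are illustrative, and the store has to be defined under contextStorage in settings.js:

// Function node - keep a memory-only history array and query it
let readings = flow.get('readings', 'memoryOnly') || [];

// append the latest sample
readings.push({ ts: Date.now(), sensor: msg.topic, value: msg.payload });
flow.set('readings', readings, 'memoryOnly');

// filter + sort stays fast even on large arrays
msg.payload = readings
    .filter(r => r.sensor === 'pit1' && r.ts > Date.now() - 24 * 3600 * 1000)
    .sort((a, b) => a.ts - b.ts);
return msg;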

I think OP's memory issue is somewhere else.

Thanks. Is there a thread or reference area I can refer to that may help educate me on pitfalls associated with memory issues?