Memory leak or what? Restarting flow with tf coco ssd node increases memory consumption

I have NR 2.0.6 and the full version of Buster running on an RPi4 with 4 GB. The flow is pretty complex but runs very stably.

A fresh start of NR with the flow occupies roughly 15% of the available memory according to top and ps aux. I'm using Projects, and when switching to a previous version of the project, memory occupation increases, in my case by around 5% on the RPi4. Switching back adds another 5%, for a total of 10% more. Further switching shows the same increase each time.

This memory occupation is "confirmed stable". After running the system for several hours after switching, even overnight, there is no change. I expected garbage collection to kick in, but it seems the memory is not released correctly.

Any ideas how I can debug this? I suspect something in the flow is causing this, since the same test with other flows does not behave like this. Also worth mentioning: the flow shows the same behavior on an RPi3.

EDIT: It is actually enough to just restart the flow; each restart increases the memory consumption in the same way.

In the flow we are using, the tf coco ssd node is placed in several subflows. This node is causing the memory occupation. Trying with a simple flow with just that node added shows the same behavior

From the Deploy button, choose to restart the flows several times and check the memory occupied by the node process using top. You will see that it increases after each restart and is never released; the only way to release the memory is to completely restart NR or reboot.


What values in top are you using to determine that the total memory in use is going up?

If you keep doing it does node-red eventually crash?
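
One way to answer this from inside Node.js rather than from top is to log `process.memoryUsage()` before and after each restart. A minimal sketch, assuming plain Node.js; the helper names `memorySnapshot` and `memoryDelta` are made up for illustration. `rss` is the figure closest to what top reports as resident memory, while `heapUsed` covers only the JavaScript heap:

```javascript
// Summarise a process.memoryUsage() reading in MiB (one decimal place).
function memorySnapshot(usage = process.memoryUsage()) {
    const toMiB = (bytes) => Math.round((bytes / 1024 / 1024) * 10) / 10;
    return {
        rss: toMiB(usage.rss),             // resident set size of the whole process
        heapTotal: toMiB(usage.heapTotal), // JavaScript heap reserved by V8
        heapUsed: toMiB(usage.heapUsed),   // JavaScript heap actually in use
        external: toMiB(usage.external),   // C++ objects bound to JavaScript objects
    };
}

// Difference between two snapshots, field by field (after minus before).
function memoryDelta(before, after) {
    const delta = {};
    for (const key of Object.keys(before)) {
        delta[key] = Math.round((after[key] - before[key]) * 10) / 10;
    }
    return delta;
}

// Take one reading now; repeat after each flow restart and compare.
console.log(memorySnapshot());
```

If `rss` climbs while `heapUsed` stays flat, the growth is probably outside the JavaScript heap, which the V8 garbage collector does not manage.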

I think that you will find that GC doesn't kick in unless it needs to so depending on whether you started Node-RED with a parameter to guide the GC process (--max-old-space-size?) you may find that Node.js doesn't think it needs to do anything. Maybe try temporarily setting the old space size to something small like 128 to see if GC starts kicking in.
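
To check whether GC is the limiting factor, a rough sketch along these lines may help. It assumes Node.js is started with the V8 flag `--expose-gc` (otherwise `global.gc` is undefined); `heapLimitMiB` and `forceGcIfAvailable` are hypothetical helper names. `v8.getHeapStatistics().heap_size_limit` reports the heap ceiling that `--max-old-space-size` influences:

```javascript
const v8 = require('v8');

// Current V8 heap size limit in MiB; changes when --max-old-space-size is set.
function heapLimitMiB() {
    return Math.round(v8.getHeapStatistics().heap_size_limit / 1024 / 1024);
}

// Force a full collection if node was started with --expose-gc,
// and report how much JavaScript heap it freed.
function forceGcIfAvailable() {
    if (typeof global.gc !== 'function') {
        return { ran: false }; // flag not set, nothing we can do from here
    }
    const before = process.memoryUsage().heapUsed;
    global.gc();
    const after = process.memoryUsage().heapUsed;
    return { ran: true, freedBytes: before - after };
}

console.log('heap limit (MiB):', heapLimitMiB());
console.log('forced GC:', forceGcIfAvailable());
```

If a forced collection frees next to nothing while the process RSS stays high, the memory is likely held by something the collector cannot reclaim.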

Also, check what nodes you have installed. Nodes are "live" for all projects not just the current one. That means that the core code of every node is always executed on startup. It is possible that you have a misbehaving node installed.

If I get time, I'll try to see if this is happening on my dev system which is the only place I use projects.

I can see that it continues to increase after each flow restart if I just have that coco node on the flow. If the flow is empty, with no nodes added at all, there is no increase. On my RPi4 I did not stress it that far, but on my RPi3 I switched projects until memory was completely exhausted and it did not respond anymore. I waited a very long time but no GC seemed to happen, so I simply had to power-cycle it.

If I add no nodes at all to the flow, there is no increase at all after flow restarts. Add several tf coco ssd nodes to the flow and the % increase will be larger after each restart

That's your answer then, almost certainly.

What resources does that node create/allocate for each instance? You can check whether those are cleared out in the on close function.
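
For reference, a node normally releases per-instance resources in its `close` handler. A minimal sketch of that pattern follows; `createResource`, `ExampleNode` and `example-node` are placeholders, not the actual code of the tf coco ssd node:

```javascript
// Hypothetical stand-in for whatever an instance allocates (a model, a font, a timer...).
function createResource() {
    return {
        disposed: false,
        dispose() { this.disposed = true; },
    };
}

// In a real node file this function would be assigned to module.exports.
function registerExampleNode(RED) {
    function ExampleNode(config) {
        RED.nodes.createNode(this, config);
        const node = this;
        node.resource = createResource();

        // Called when the flow is stopped, restarted or redeployed.
        node.on('close', function (removed, done) {
            if (node.resource) {
                node.resource.dispose(); // release what this instance allocated
                node.resource = null;    // drop the reference so it can be collected
            }
            done();
        });
    }
    RED.nodes.registerType('example-node', ExampleNode);
}
```

Things to look for in the real node are anything created per instance that is not mirrored by a matching release in `close`, and anything stored at module level that outlives the instance.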

It's reading a model for AI object detection. I have no clue what else it is doing or whether it is releasing things correctly; I only discovered this behavior yesterday. Luckily the author is well known, @dceejay

If I check with ps, the result is as below. I have 5 copies of the node in the flow, nothing else. Each time I restart the flow, the memory increases. If I delete all nodes, having then an empty flow, and deploy, the memory is not released


It does look as if the node is not releasing its resources on close. Maybe add an issue on the node's github page.


OK - I have pushed 0.5.7 that explicitly clears the model (and font) on close... but the way node.js works the GC still may not kick in until it feels like it...
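
Assuming the node is built on TensorFlow.js, `tf.memory().numTensors` gives a quick way to see whether tensors survive a load/dispose cycle. The sketch below takes the `tf` object as a parameter so it can be tried without any particular backend; `countLeakedTensors` is a made-up helper name and this is not taken from the node's source:

```javascript
// Run `action` (sync or async) and report how many tensors were left behind.
// `tf` is expected to be a TensorFlow.js module, e.g. the result of require()-ing it.
async function countLeakedTensors(tf, action) {
    const before = tf.memory().numTensors;
    await action();
    const after = tf.memory().numTensors;
    return after - before; // 0 means everything allocated was disposed again
}
```

With a real TensorFlow.js module, `action` could load the model and then dispose it again. A non-zero result would hint that tensors are left behind, although `tf.memory()` only counts tensors and says nothing about other native allocations.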

Unfortunately, it did not help

I put 12 copies of the node on a flow (nothing else). After each flow restart, memory consumption increases by around 5% on an RPi4 with 4 GB. As memory consumption increases, it takes longer and longer for the models to be loaded into all 12 nodes. When it reaches around 69%, it is not able to load the models into all nodes. After another flow restart, NR is no longer able to load the model into any of the 12 nodes. Node is now occupying 72% of the memory.
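
To put the 5% figure in perspective, here is a back-of-the-envelope calculation. It assumes top's percentage is relative to roughly 4096 MiB of RAM (the usable total on a 4 GB Pi is somewhat lower) and that the growth is spread evenly over the 12 node instances; `estimateLeak` is just an illustrative helper:

```javascript
// Convert a per-restart increase in %MEM into MiB per restart and per node instance.
function estimateLeak(totalMiB, percentPerRestart, instances) {
    const round1 = (x) => Math.round(x * 10) / 10;
    const perRestart = (totalMiB * percentPerRestart) / 100;
    return {
        perRestartMiB: round1(perRestart),
        perInstanceMiB: round1(perRestart / instances),
    };
}

// Assumed inputs: 4096 MiB total, 5% growth per restart, 12 node instances.
console.log(estimateLeak(4096, 5, 12));
```

Under these assumptions that is roughly 205 MiB per restart, or about 17 MiB per node instance.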

On an RPi3 it has difficulty loading the models into all 12 nodes right from the start, and it takes a long time. I think there is not enough memory, so continuing the test on an RPi3 is meaningless.


I made a comparison test using another node for SSD object detection, just to have a baseline.

When I do the same tests, repeatedly restarting the flow, the memory consumption does not build up. So it must be something other than loading/unloading the model.

I would like to keep using the tfjs-coco-ssd node since it is also much faster at analyzing.