Docker is running on a relatively low-end Linux system with a dual-core Intel Celeron N3350 processor and 4GB of memory. About 15 microservice containers run together, communicating with each other and with the outside world.
When I add a fresh nodered/node-red:latest container with no custom options or properties, it causes almost-regular CPU spikes, up to 95% of the core it uses, roughly every 30 minutes. I repeat that this container is fresh/empty: there are no plugin nodes installed and no active flows at all. Because the system I'm using is not very high-end, it sometimes crashes if the other processes are also under CPU stress.
I am sure that this is a Node-RED issue, because I had already set up monitoring for CPU usage per container on the system. Below is a graph of Node-RED's CPU usage (I have filtered out the other containers for clarity).
Several other Node.js containers run on this system without similar effects. I briefly set up memory monitoring, but it stays at roughly 2% (80MB) at all times, so I think memory is unlikely to have an influence.
I have already looked at other threads on this forum and tried adding the NODE_OPTIONS environment variable to the container with --max-old-space-size=256, to no avail. I tried running strace, but I can't install it in the Alpine Linux container Node-RED runs in. I have also tried installing Node-RED on another system that is similar but more powerful, and the pattern persists (minus the crashes).
When I run the container on my local Windows system, it shows similar spikes, but they happen more often and are far less pronounced (it is a higher-end quad-core system):
@Colin there may be an underlying hardware problem on this particular system, but the software problem that triggers it is clearly the as-yet-undetermined random CPU events in Node-RED. I have no problem with a container restarting now and then, but the prospect of some or all containers on a moderately stressed system rebooting every 30 minutes is a dealbreaker if I want to use Node-RED. As I said, I'll monitor more closely and hope to come back with better diagnostics regarding the crashes.
@janvda I'm relieved that other people are seeing similar behaviour. I am using the current nodered/node-red:latest image on Linux AMD64, which reports version 2.1.3, digest sha256:c6c82e6404c88f766e18034148597e59ff593e291622b965ac9c4c7342bb9469 and ID d92760a79499.
Monitoring the CPU per container happens in a different container, written in TypeScript, which basically polls docker stats on the host system every 5 seconds and forwards that data to a Prometheus instance.
Some of the other containers on the same system (including the monitoring container) also run on Node.js, and while the most heavily stressed one does show some CPU spike events, they are much less intense (up to 15% CPU, on the weaker system) and much less frequent (every 4-6 hours).
Forgot to mention: from the memory monitoring I concluded that either garbage collection is not the cause, or Node.js is doing a very bad job of it, because memory usage would drop by at most 1% of the container's used memory - less than 1MB per (assumed) GC cycle - if it dropped at all. The graph below is the best I've got and it's not granular enough to show a drop, which by itself shows that very little happens on the memory front.
Just checked, and it is exactly the same version as I am using.
I think my charts show exactly the same thing as yours. Your spikes are higher because you are monitoring at a higher frequency; I am using Telegraf to monitor with an interval of 1 minute.
... but I don't have those crashes, so I really doubt those spikes are responsible for them.
On average those spikes consume only 1% of CPU so I didn't bother much about it, but I agree that it is at least interesting to understand what is causing it.
You could also consider running docker without the node-red service to see if you still get those crashes.
Did you also check whether there is a memory issue (lack of RAM) or a disk issue (lack of disk space)?
You could test this by adding a setInterval that calls global.gc() to the top of the settings.js file - and then adding the --expose-gc flag to the node command line.
Note: it is definitely not recommended to run this way normally, as calling the GC every 30 secs will impact performance - but in this instance it would help show whether that is what is happening every 20 mins or so. Obviously feel free to change the timings yourself.
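For reference, the kind of snippet being suggested looks roughly like this (a reconstruction - the exact code from the earlier post isn't quoted here; it assumes node is started with --expose-gc, otherwise global.gc does not exist):

```javascript
// Reconstruction of the suggested test, to go at the top of settings.js.
// Requires node to be started with the --expose-gc flag; without it,
// global.gc is undefined and this does nothing.
function forceGcAndReport() {
  if (typeof global.gc !== 'function') {
    return null; // --expose-gc not set
  }
  const before = process.memoryUsage().heapUsed;
  global.gc();
  const after = process.memoryUsage().heapUsed;
  const freedKb = Math.round((before - after) / 1024);
  console.log('manual GC freed ' + freedKb + ' KB');
  return freedKb;
}

// Force a collection every 30 seconds; if the mystery spikes line up
// with these calls, GC becomes the prime suspect. Adjust the interval
// to taste. unref() means this timer alone won't keep the process alive.
setInterval(forceGcAndReport, 30 * 1000).unref();
```

In the container this is easiest to enable via the NODE_OPTIONS environment variable (e.g. NODE_OPTIONS=--expose-gc), the same mechanism used earlier for --max-old-space-size.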
This system and another one had both been running for weeks/months, and both showed crashes/restarts within hours of adding the node-red container to the docker-compose file (and docker-compose down/up). It is definitely not a disk space issue, but memory is possible - it is not monitored as diligently; I'll have to ask my co-workers why that is and/or fix it permanently.
Thanks for your suggestions; I'll add the setInterval and report back with the results. Performance is not an issue on this system, because it is used specifically for exploratory research and testing.
To be clear: what crashed, exactly?
Did only the node-red container get restarted?
I would definitely check all the logs.
If more than just the node-red container got restarted, then I would certainly look at other root causes. Docker should normally isolate the container from the rest, so problems inside the container shouldn't impact anything else. Is the node-red container running in privileged mode?
Besides this, in second place, you can also consider:
limiting the max CPU utilization of your node-red container with a docker option (e.g. --cpus)
testing with an older node-red image from Docker Hub to see if it shows the same CPU spikes.
As promised, I'm coming back with better diagnostics and more information.
First of all, as you suggested, the crashes are not strictly related to Node-RED. I saw the same scenario as I did before, only without Node-RED running, and it turned out that an internal software package, which we use for testing, has a memory leak. Ouch.
Second, I have done the following, upon your suggestions:
I changed the version of the node-red container to 1.3.7. This had no effect on the CPU spikes.
I tried to induce CPU stress (flooding the most CPU-intensive container with requests for heavy calculations), which made Node-RED's CPU spikes more frequent, up to 6 times per hour. As before, the actually used memory seems unaffected. The light blue peaks are Node-RED.
I let settings.js run garbage collection every 60 seconds, which had no effect on the frequency or peak intensity of the CPU spikes. They still occur every 20-30 minutes (if anything, they look slightly more frequent?)
Here is a simple graph of the stats printed after each global.gc() call. If the raw numbers are relevant to your thought process I can send them, but I think the printed garbage collection stats are not very informative either.
That is good news. You have found the root cause of the crashes and it is not node-red.
So I guess that those node-red CPU spikes are a minor issue now.
All my node-red docker containers show those spikes (with different interval periods), but I also noticed that other containers have them:
below, on the left you see a samba container and on the right a node-red container.
Do I understand correctly that this issue has been determined to be relatively low priority and/or harmless? If you do want to keep looking into it and need further assistance, I'm happy to help and provide whatever details I can.
I don't think anyone has said anything to that effect. It does need someone to investigate it - but as with all issues, it depends on who is able to spend time on any particular item, and where it sits in the big scheme of things.
Given that a few people are able to reproduce it, the best thing is to raise an issue - otherwise this forum thread will get lost and forgotten about.
The only periodic timer I can think of inside Node-RED is the context file store that flushes to disk after changes are made. But if you haven't enabled persistent context in your settings file (and I seem to recall you saying this is a default configuration) then that would be ruled out.
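For context on that last point: persistent context only becomes active when settings.js opts into it with a contextStorage entry along these lines (illustration only, not the poster's config - the stock default keeps context in memory, so no flush-to-disk timer runs):

```javascript
// settings.js fragment (illustration - not the configuration in use here).
// With the stock settings file, context is memory-only and the periodic
// flush timer mentioned above never starts.
module.exports = {
  // ... other settings ...
  contextStorage: {
    default: {
      module: 'localfilesystem',
      config: {
        flushInterval: 30 // seconds between flushes to disk (the default)
      }
    }
  }
};
```

Since the poster is running a default configuration, this block is absent and the context flush timer can indeed be ruled out as the source of the spikes.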