Node-RED on docker causes regular CPU spikes

Docker is running on a relatively low-end Linux system with a dual-core Intel Celeron N3350 processor and 4GB of memory. About 15 microservice containers run together, communicating with each other and with the outside world.

When I add a fresh nodered/node-red:latest container with no custom options or properties, it causes almost-regular CPU spikes, up to 95% of the core it uses, roughly every 30 minutes. I repeat that this container is fresh/empty: there are no plugin nodes installed, and there are no active flows at all. Given that the system I'm using is not very high-end, it sometimes crashes if the other processes are also under some CPU stress.

I am sure that this is a Node-RED issue, because I had already set up monitoring for CPU usage per container on the system. Below is a graph of Node-RED's CPU usage (I have filtered out the other containers for clarity).

Several other Node.js containers run on this system without similar effects. I briefly set up memory monitoring, but memory usage stays at roughly 2% (80MB) at all times, so I think that is unlikely to have an influence.

I have already looked at other threads on this forum. I have tried adding the NODE_OPTIONS environment variable to the container with --max-old-space-size=256, to no avail. I have tried running strace, but I can't install it in the Alpine Linux container Node-RED runs in. I have also tried installing Node-RED on another system that is similar but more powerful, and the pattern persists (minus the crashes).

When I run the container on my local Windows system, it shows similar spikes, but they happen more often and are far less pronounced (that is a higher-end quad-core system):

If it crashes then there is something wrong. What do you see in the logs when it crashes?

Thanks for your answer. To be clear, I meant that the entire docker process crashes (or even, in one instance, the system, if I remember correctly), not just the Node-RED container.

I currently have no logs of this event anymore (the system cycles through them quite quickly), I'll see about reproducing the circumstances that cause a crash.

There is something horribly wrong in that case. If it crashed the whole system once then I would suspect hardware. PSU perhaps.

Here is the output of my docker node-red service (nodered/node-red:2.1.3-14), which I think is more or less idle, running on a Raspberry Pi 3:


Here is another example (running on an Intel NUC with a quad-core Intel Celeron J3455 @ 1.50GHz):

I also see the spikes in the CPU, but I didn't give them much attention, as the average CPU consumption is still low.

Can you specify the actual Docker Hub image you are using?

Can you specify how you are monitoring the CPU?

Maybe it is the Node.js garbage collector that is causing these periodic CPU spikes.
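If you want to confirm or rule that out, Node.js can report GC activity through the built-in perf_hooks module. A minimal sketch, assuming Node 14 (which the 2.1.3-14 image uses), where gc performance entries still expose entry.kind directly - you could drop it at the top of settings.js and check whether the logged GC passes line up with the CPU spikes:

const { PerformanceObserver } = require("perf_hooks");

// Log each garbage-collection pass with its type and duration so the
// timestamps can be compared against the per-container CPU graph.
const gcObserver = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
        // kind: 1 = scavenge (minor GC), 2 = mark-sweep-compact (major GC)
        console.log(new Date().toISOString(),
            "GC kind=" + entry.kind,
            "duration=" + entry.duration.toFixed(1) + "ms");
    }
});
gcObserver.observe({ entryTypes: ["gc"] });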

@Colin there may be an underlying hardware problem for this particular system, but the software problem that triggers it is clearly the as-yet-undetermined random CPU events in Node-RED. I have no problem with a container restarting now and then, but the prospect of some or all containers on a moderately stressed system rebooting every 30 minutes is a dealbreaker for me if I want to use Node-RED. But as I said, I'll monitor more closely and hope to come back with better diagnostics regarding the crashes.

@janvda I'm relieved that other people are seeing similar stuff. I am using the current nodered/node-red:latest image on Linux AMD64, which specifies that it uses version 2.1.3. Digest sha256:c6c82e6404c88f766e18034148597e59ff593e291622b965ac9c4c7342bb9469 and ID d92760a79499

Monitoring the CPU per container happens in a different container, written in TypeScript, which basically polls docker stats on the host system every 5 seconds and forwards that data to a Prometheus instance.
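For context, the polling side is nothing fancy - roughly equivalent to the sketch below (illustrative only; the real container is written in TypeScript and pushes the samples to Prometheus rather than logging them):

const { execFile } = require("child_process");

// Ask the Docker CLI for a one-shot CPU sample of every running container.
function sampleCpu() {
    execFile("docker",
        ["stats", "--no-stream", "--format", "{{.Name}}\t{{.CPUPerc}}"],
        (err, stdout) => {
            if (err) {
                console.error("docker stats failed:", err.message);
                return;
            }
            for (const line of stdout.trim().split("\n")) {
                const [name, cpu] = line.split("\t");
                // e.g. name = "nodered", cpu = "0.07%"
                console.log(name, parseFloat(cpu));
                // ...the real monitor records this as a Prometheus metric...
            }
        });
}

setInterval(sampleCpu, 5000); // same 5-second interval as the real monitor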

Some of the other containers on the same system (including the monitoring container) also run on Node.js, and while the most heavily stressed one does have some CPU spike events, they are much less intense (up to 15% CPU, on the weaker system) and much less frequent (every 4-6 hours).

Forgot to mention: when I did the memory monitoring, I figured that either garbage collection is not the cause, or Node.js is doing a very bad job, because memory usage would go down by at most 1% of the container's used memory, so less than 1MB per (assumed) GC cycle, if it went down at all. The graph below is the best I've got, and it's not granular enough to show a drop, which by itself shows that very little happens on the memory front.

Just checked it, and it is exactly the same version as I am using.

I think my charts show exactly the same as your charts. Your spikes are higher because you are monitoring at a higher frequency. I am using Telegraf to monitor it with an interval of 1 minute.

... but I don't have those crashes so I really doubt that those spikes are responsible for that.

On average those spikes consume only 1% of CPU, so I didn't worry much about them, but I agree that it is at least interesting to understand what is causing them.

You could also consider running Docker without this node-red service and seeing whether you still get those crashes.
Did you also check whether there is a memory issue (lack of RAM) or a disk issue (lack of disk space)?

One way to test the GC is to call it manually... - you can do this by adding something like

// Diagnostic only: force a garbage-collection pass every 30 seconds and log
// memory usage just before doing so. global.gc is only defined when node is
// started with the --expose-gc flag, otherwise this does nothing.
setInterval(function() {
    try {
        if (global.gc) {
            console.log("Calling GC:",process.memoryUsage());
            global.gc();
        }
    }
    catch (e) {
        // ignore - this is a temporary diagnostic
    }
},30000);

to the top of the settings.js file - and then adding the --expose-gc flag to the node command line.

Note: it is definitely not recommended to run this way normally, as calling the GC every 30 seconds will impact performance - but in this instance it would help show whether that is what is happening every 20 minutes or so... obviously feel free to change the timings yourself etc.

This system and another one had both been running for weeks/months, and they both showed crashes/restarts within hours of adding the node-red container to the docker-compose file (and a docker-compose down/up). It is definitely not a disk space issue, but memory is possible - it is not monitored as diligently; I'll have to ask my co-workers why that is and/or fix it permanently.

Thanks for your suggestions, I'll add the setInterval and report back with the results. Performance is not an issue on this system, because it is specifically used for exploratory research and testing.

To be clear: what crashed exactly?
Did only the node-red container get restarted?
I would definitely check all the logs.

If it was not only the node-red container that got restarted, then I would certainly look at other root causes. Docker should normally isolate the container from the rest, so that any problems inside the container won't impact the rest. Is the node-red container running in privileged mode?

Besides this (in second place), you can also consider:

  • limiting the maximum CPU utilization of your node-red container with a Docker option
  • testing with an older node-red image from Docker Hub and seeing whether it has the same problem/same CPU spikes.

As promised, I'm coming back with better diagnostics and more information.

First of all, as you suggested, the crashes are not strictly related to Node-RED. I saw the same scenario as I did before, only without Node-RED running, and it turned out that an internal software package, which we use for testing, has a memory leak. Ouch.

Second, I have done the following, upon your suggestions:

  1. I changed the version of the node-red container to 1.3.7. This had no effect on the CPU spikes.
  2. I tried to induce CPU stress (I flooded the most CPU-intensive container with requests for heavy calculations), which had the effect that Node-RED's CPU spikes became more frequent, up to 6 times per hour. As before, the actual memory used seems to be unaffected. The light blue peaks are Node-RED.
  3. I let settings.js run garbage collection every 60 seconds, which had no effect on the frequency or peak intensity of the CPU spikes. They still occur every 20-30 minutes (if anything, it looks like they are more frequent?).

Here is a simple graph of the memory stats logged on each global.gc() call. If they're relevant to your thought process I can send the raw numbers, but I think the printed garbage-collection stats are also not very informative.
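(For reference, the numbers being graphed are the process.memoryUsage() values logged by the snippet above; on Node 14 that object looks roughly like the following, all values in bytes - the figures here are made up:)

console.log(process.memoryUsage());
// {
//   rss: 83427328,        // resident set size: total memory held by the process
//   heapTotal: 35921920,  // memory V8 has allocated for the JS heap
//   heapUsed: 28734512,   // JS heap actually in use - this is what a GC pass shrinks
//   external: 1523412,    // memory of C++ objects tied to JS objects (Buffers etc.)
//   arrayBuffers: 215601  // ArrayBuffers/Buffers, also counted in external
// }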


I'm not running on Docker but my live node-red does show regular spikes. Here is a 15 minute chart of the node-red service:

The peak is 2% CPU. This is on an Intel i5 with 8GB RAM.

That is good news. You have found the root cause of the crashes and it is not node-red.

So I guess that those node-red CPU spikes are a minor issue now.

All my node-red docker containers also show those spikes (with different interval periods), but I noticed that other containers have them too:
here below, on the left you see a Samba container and on the right a node-red container.

So I think docker is responsible for these spikes.

I use a Raspberry Pi 4 (4GB) and have had the same spikes for ages:

Do I understand correctly that this issue has been determined to be relatively low priority and/or harmless? If you do want to continue looking into it and further assistance is required, I'm happy to assist and provide more details where I can.

I don't think anyone has said anything to that effect. It does need someone to investigate it - but as with all issues, it depends on who is able to spend time on any particular item, and where it sits in the big scheme of things.

Given a few people are able to reproduce it, the best thing is to raise an issue - otherwise this forum thread will get lost and forgotten about.

Please raise an issue here: GitHub - node-red/node-red-docker (Repository for all things Node-RED and Docker related) - provide the details of the issue, but also include a link back to this thread for wider context.

The only periodic timer I can think of inside Node-RED is the context file store that flushes to disk after changes are made. But if you haven't enabled persistent context in your settings file (and I seem to recall you saying this is a default configuration) then that would be ruled out.
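For anyone who wants to double-check their own install: as I understand the docs, persistent context is only active when settings.js contains a contextStorage entry, something like the sketch below (flushInterval is in seconds and, I believe, defaults to 30). If that entry is absent, context stays in memory and there is no periodic flush to account for.

// settings.js (sketch) - persistent context is only enabled when this is present
module.exports = {
    // ...other settings...
    contextStorage: {
        default: {
            module: "localfilesystem",
            config: {
                flushInterval: 30   // seconds between flushing changed context to disk
            }
        }
    }
};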


This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.