Example of Node Red Going Wacko! (Or Node.js?)

A PC is a totally different architecture than a Pi. A vastly more efficient one end-to-end, especially when it comes to major I/O such as disk read/write, USB, etc. This is all well understood. The Pi gives great value but it is an educational tool by design. Low cost.

So it isn't surprising that it isn't showing the same problems. The Pi writes to an ultra-slow SD card via the USB2 bus. Your PC likely uses an SSD over the PCI bus.

My guess is that the high multi-core CPU % you are seeing is probably a result of heavy I/O to/from the SD card.

Yes, and yes. That would also suggest how the files might not have been saved as expected.

Are you suggesting that I did not break the flow when I was trying to break the flow? LOL... that is what I get for monkeying about with the flow.

Odd, the version I have running runs endlessly. I just enabled the flow, and it is generating node.warn messages over and over.

The count of 10 ends, but the inject should be firing again and again. Did I paste the wrong flow? Here it is again.

[{"id":"7251a30c.f8eb3c","type":"inject","z":"2e42ee86.478292","name":"Inject","props":[{"p":"payload"}],"repeat":"1","crontab":"","once":true,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":150,"y":80,"wires":[["b7f0c833.0d5738"]]},{"id":"b7f0c833.0d5738","type":"function","z":"2e42ee86.478292","name":"Endless","func":"\nfor (var theIndex = 0; theIndex < 10; theIndex++) {\n    \n    var theTime = new Date().getTime();\n    \n    //\n    node.warn(theIndex);\n    node.warn(theTime);\n}\n\nreturn msg;","outputs":1,"noerr":0,"initialize":"","finalize":"","x":310,"y":80,"wires":[["640f8aec.dd5634"]]},{"id":"640f8aec.dd5634","type":"debug","z":"2e42ee86.478292","name":"Message","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","targetType":"full","statusVal":"","statusType":"auto","x":480,"y":80,"wires":[]}]

This PC VM is really interesting; it scales really well unless I explicitly cripple it... Got to love VMware.

It does keep running, but it goes nowhere near 100% on a Pi 4, and so causes no problems with deploying or stopping node-red. There is a noticeable gap between each burst of 10 runs through the loop.

[Edit] I keep thinking of new things to ask:
When you stop node-red (presumably using sudo systemctl stop nodered or node-red-stop), does it stop within a few seconds, or does it take a minute or so?
Have you tried running node-red-log while it is failing, to see if there is anything there? I am wondering whether node-red is repeatedly crashing and restarting.

I put the loop count up to 1000 each second instead of 10 and it is still only loading the Pi at about 20%. The browser on my PC didn't like it much, with 1000 debug messages and 2000 warn messages arriving every second, but that is not surprising.

It would take a while, more than a few seconds, to stop. That is a great question; I had not thought of mentioning that... usually a stop command is fast, a second or so at most, on a Pi3 or Pi4. I did not see anything in the logging at all that suggested any type of fault or error.

I still have yet to get back to the Pi3 that started this scenario; a lot of life events are popping up at the moment, pulling me away from my planned testing. But I hope to try again this coming weekend.

If you have the node-red log going when you stop node-red, do you see messages saying that it has been killed rather than stopped? systemctl stop will try to stop node-red nicely, but if it doesn't shut down after a timeout then it kills it.
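
Just to illustrate what I mean by "nicely" (this is plain Node.js, nothing node-red specific): systemd sends SIGTERM first and only kills the process if it hasn't exited by the timeout, and a process that is killed never gets a chance to flush anything to disk:

// Minimal Node.js sketch, for illustration only (not node-red code).
// systemctl stop sends SIGTERM first; a well-behaved process cleans
// up and exits. SIGKILL cannot be caught, so a killed process gets
// no opportunity to finish pending writes.
process.on('SIGTERM', () => {
  console.log('SIGTERM received, shutting down cleanly');
  // flush/close resources here, then exit
  process.exit(0);
});

setInterval(() => console.log('still running...'), 1000);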

I thought the example you posted was generating the lockup on a Pi 4, but that seems odd since on mine it imposes a trivial load.

The very first 'file disappears' issue was on a Pi4. The recent issue/event was a Pi3. I think after all the testing and review, this breaks down to one reasonable (possible) explanation: I figured out how to abuse my Pi3 to such an extent that the file save sequence invoked by the deployment mechanism failed, not because of the NR code design, but because something below the NR code failed. That possible explanation would also cover why I saw nothing of note in the NR logs. The only thing I did not watch for, and in hindsight should have, was whether a file system validation was done or otherwise triggered on the next reboot of the said Pi3. If that happened, it would hint that the real issue occurred below NR. If this was in the direct NR scope, I have to believe it would be more frequent, given the user scale, the frequency of Pi SD cards doing odd things, etc.

It would still be good to have a vaguely repeatable case, so that it could be tested with the new Save mechanism to see if that fixes it. Similar occurrences have been reported a couple of times before I think.

Oh, I have not stopped working the issue; I am going to design a true hammer flow that really can push NR. I developed such a tool to stress virtual hypervisors years ago. It would bring VMs to their knees, and even the hypervisor itself if needed. We used it to validate low-memory and I/O scenarios in the lab before we qualified production designs.

So, I think I can come up with something that will do it. What would you say are the best attack vectors? I am thinking low memory and high file system I/O. Network I/O loading I don't see as important, but let me know if you think otherwise.

I even wrote a tool that would stress test disk fragmentation tools (back when such things were important), but I digress.

The other option is to find tools designed to stress Node.js itself rather than NR directly.
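
Roughly the kind of thing I have in mind as a first cut, a plain Node.js script rather than a flow, with placeholder sizes and a placeholder scratch path that I would tune on the target Pi:

// Rough stress sketch (placeholders throughout, tune for the target Pi).
// Attack vectors: memory pressure plus heavy file system I/O.
const fs = require('fs');

const hogs = [];                       // keep references so memory is never freed
const CHUNK = 16 * 1024 * 1024;        // 16 MB per allocation (placeholder)
const FILE = '/tmp/nr-hammer.bin';     // scratch file (placeholder path)

setInterval(() => {
  // memory pressure: keep allocating and holding filled buffers
  hogs.push(Buffer.alloc(CHUNK, 1));

  // file system pressure: synchronous writes keep the I/O path busy
  fs.writeFileSync(FILE, hogs[hogs.length - 1]);

  console.log(`allocated ${hogs.length * CHUNK / (1024 * 1024)} MB`);
}, 250);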

I have been trying to do that myself and have not made it misbehave, so there may be something specific about your flows. I can easily lock up node-red of course, but I have not made the files disappear. I think the fact that you have been seeing 250% utilisation may well be significant. If you put node-red in a simple loop, while (true) {} for example, then it only consumes 100%, as it is single threaded (so cannot use more than one core at a time); 100% means 100% of one core. The fact that you are seeing more than 200% means that it is also running stuff outside of node.js (I think).

I am not sure if this would be the case if you were doing lots of file I/O in node-red, for example, at the same time as running the loop. Perhaps if there were lots of card writes going on then these get buffered up, the card can't keep up, and so the Deploy writes get buffered in memory but don't get fully written out to the card for some reason when node-red is stopped.
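
For reference, the single-core ceiling is easy to see. Put something like this in a Function node (on a test system only, it blocks the whole runtime) and one core pegs at 100% while the others sit idle:

// Function node body: a busy loop that never yields.
// JavaScript in node-red runs on a single thread, so this pins
// exactly one core at ~100% and blocks the entire runtime.
while (true) {}
return msg;   // never reached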

I presume you were not ever doing a hard power off; that could certainly cause data loss.

I would start with the system that you know fails rather than starting from scratch trying to guess what the trigger might be. Maybe work out what is adding to the processor usage to take it up to 250%.

Right, no hard power off. If I had, I would never have posted the initial question, because that would have been the reasonable and logical scenario, of course. The only significant consumer of resources is pigpiod, which bounces around 8% at times. Even node.js can jump a few percentage points, as I have other flows running, but short of the bad loop I had at the time, node.js would typically run at 2 to 4% most of the time on the Pi3. The SD card in use at the time is still in the Pi3 and still functional. I have gone on to develop more flows on the same Pi device, with no issues thus far, other than typical flow debug efforts, if you will.

As a counterpoint, I have been able to bury a PiZero at times, but that is to be expected given its resource limits. Even the PiZero image I have has several flows running routinely, and other than memory consumption running pretty high, the processor load is not extreme.

For example...

Tasks:  70 total,   1 running,  69 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.5 us,  2.4 sy,  3.1 ni, 88.2 id,  0.0 wa,  0.0 hi,  0.7 si,  0.0 st
MiB Mem :    479.4 total,    151.5 free,    113.1 used,    214.8 buff/cache
MiB Swap:    100.0 total,    100.0 free,      0.0 used.    305.1 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  310 root      20   0   12680   1832   1528 S   8.4   0.4 959:48.05 pigpiod
  575 pi        25   5  192264 109912  29092 S   2.3  22.4 245:43.05 node
32598 root      20   0   10184   2956   2484 R   1.6   0.6   0:00.47 top
    7 root      20   0       0      0      0 S   0.3   0.0   5:19.61 ksoftirqd/0
  160 root      20   0       0      0      0 S   0.3   0.0   4:48.55 brcmf_wdog/mmc1
  459 Debian-+  20   0   24604   9172   7104 S   0.3   1.9  34:31.78 snmpd
21385 root      20   0       0      0      0 I   0.3   0.0   0:32.60 kworker/u2:1-brcmf_wq/mmc1:0001:1
32502 root      20   0       0      0      0 I   0.3   0.0   0:00.02 kworker/0:2-events
32538 root      20   0   12360   6364   5476 S   0.3   1.3   0:00.68 sshd

In that case it suggests that whatever is pushing the load above 100% must be invoked inside the loop. In the loop you posted you had node.warn statements. Since those write to the log, they write to the SD card, so it is things like that which might be involved.
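
One way to test that (just a sketch based on your function above, not something I have run here): make the warns switchable in the same loop and compare the CPU load with logging on and off:

// Diagnostic variant of the "Endless" Function node (sketch only).
// Toggle LOG to compare CPU load with and without the log/SD-card writes.
var LOG = false;   // set true to restore the original node.warn traffic

for (var theIndex = 0; theIndex < 10; theIndex++) {
    var theTime = new Date().getTime();
    if (LOG) {
        node.warn(theIndex);
        node.warn(theTime);
    }
}
return msg;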

Ah, that is a good point: the very act of observing the issue adds to the impact of the issue (indirectly). Nice. A classic example of the 'Observer Effect', potentially. Did I mention I was a very good Physics student, when I was still in school? A very long time ago! LOL.

Just a footnote... This issue may finally come down to one specific Pi3 device. Although I was able to drive CPU loading on a different Pi3 and even a Pi4, I was never able to get beyond a few percent above 100, no matter how I tried to abuse the node process via various test flows.

Here is the interesting part: this specific Pi3 that triggered this entire thread is now behaving oddly, specifically in the node process, which is running at 100 percent and well above, 110 or more, with no active flows, even after a full restart/reboot, not just a NR service restart.

The NR log shows nothing thus far that would suggest this behavior either. The next step is that I will remove and reinstall NR; then, if needed, replace the image on the SD; and even then, if needed, replace the SD card itself. Really interesting issue at the moment.
