Node-RED on Raspberry Pi OS x64 WORKS!

I just installed the new Raspberry Pi OS x64 on my Raspberry Pi 3B+, and I'm surprised how well it already works and how fast it is. How cool is that? You do have to install Node-RED manually, but you can use the same installation bash script that you'd use on the 32-bit version.

It's this one to be exact:

bash <(curl -sL https://raw.githubusercontent.com/node-red/linux-installers/master/deb/update-nodejs-and-nodered)

The new Raspberry Pi OS x64 doesn't have Node-RED installed by default, so just use the bash script to install it and everything works. I had no warnings and saw nothing abnormal, except that the script finishes much faster.
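
For anyone who wants to double-check that the system they are comparing really is the 64-bit variant, here is a minimal Python sketch using only the standard library. The function name `userland_bits` is just an illustrative choice. Note that the kernel architecture and the userland can differ, so the sketch prints both.

```python
import platform
import struct


def userland_bits() -> int:
    """Pointer size of the running Python interpreter, in bits."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    # platform.machine() reports the kernel architecture, e.g. "aarch64" or "armv7l"
    print(f"kernel architecture: {platform.machine()}")
    print(f"userland (Python) is {userland_bits()}-bit")
```

On a fully 64-bit install I would expect the second line to say 64-bit, but treat the output as a hint rather than proof.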

Another nice new thing is that you now get a recommendation to create the settings file immediately after the setup is done. I highly recommend doing so.

It's now super easy to get Node-RED up and running within 10 minutes at most, complete with a settings file and the service starting automatically at boot.

I just wanted to share that this works even on my old Raspberry Pi 3B+, so I would like to encourage everyone to update the OS to the 64-bit version and get Node-RED running.

I'm very excited that it just worked the first time, and I can't wait to start on a new project. Please share if you've already got Node-RED running on the x64 OS; I'm curious whether anyone else has tried it.

3 Likes

And how much faster is it, would you say?

1 Like

That is interesting, I would not have expected it to make any significant difference. Are you sure you are not using a faster SD card than you did previously?

1 Like

I find speed statements about 64-bit OSes interesting. Everyone always says that all you get is better memory handling, but that's not all by a long shot. Someone in another thread posted several links to explanations about it, and it is a huge difference. Moving from a 32-bit bus to a 64-bit bus, I would expect a doubling of speed, and those articles pretty well confirmed that. Add better caching and all sorts of under-the-hood optimization, and 64-bit should make a huge difference. About a year ago I tried 64-bit and nothing ran; it was too early in the OS development. I'm going to give it another go now though, and I can't wait to see for myself.

1 Like

Can you provide a link to that please? I can't seem to find a link that suggests that.

1 Like

x64 also feels really nice on an RPi4. I can't claim a professional comparison, but I migrated a solution running on an RPi3 to an RPi4 with x64 and now it is really fast. I noticed it especially when loading the flow in the browser (mainly due to the move from RPi3 to RPi4, of course). Anyway, I noticed no problems when running Node-RED (2.2.2), and installation using the script worked without problems.

1 Like

I haven't tried this before, so I hope the link works. Look at the very bottom.

1 Like

I can't see anything in those links that suggests that there should be anything like a doubling in speed.

1 Like

Try this one. You have to scroll a few pages, but it shows a 48% speed increase.
[Edit] This one specifically says 64-bit is the way to go.

1 Like

Far from double speed

1 Like

Also, that is related to processor limited benchmarks, which ties up with what is said in the original post, that the improvement is mainly seen in benchmarks. In practice, in the sort of real world applications we usually deal with here, it is rare for processes to be CPU bound. So there may well be only a very small improvement. The one application that comes to mind where it may make a significant difference might be in video or image processing. If, on a 32 bit system handling video, it is seen that the processor regularly runs close to 100% running such a task then certainly the 64 bit OS may make a useful difference. Even so, I would only expect up to about 30% improvement I think.

1 Like

This is something I think I will be able to check. Running the YOLO video analyzer for object detection on an RPi4 (32-bit) is a very tough task. I will try to run the same on the other RPi4 (64-bit) and see what the difference is.

1 Like

If the files are coming from SD card make sure it is the same speed of card.

1 Like

I will send the image data from a separate computer via MQTT to the Python YOLO processes running on the respective RPi4.
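
For a comparison like this, it helps to time the analysis step the same way on both machines. Below is a minimal sketch of a timing harness using only the Python standard library. The names `time_call` and `dummy_workload` are hypothetical, and the dummy workload is just a stand-in for the real analysis call, which is not shown in this thread.

```python
import statistics
import time


def time_call(func, repeats=5):
    """Return the median wall-clock time in seconds over `repeats` calls of func()."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def dummy_workload():
    # Hypothetical stand-in for the real analysis step (e.g. one detection pass)
    return sum(i * i for i in range(200_000))


if __name__ == "__main__":
    print(f"median: {time_call(dummy_workload):.6f} s")
```

Taking the median of several runs makes a single slow outlier (for example, a first run that loads files) less likely to distort the comparison.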

1 Like

Yes, far from double, I saw 50% and got all excited. Mea Culpa. However, 48% is still a very noticeable improvement. I'm sure 64 bit would bring real world improvements just like moving from 8 bit to 16 bit to 32 bit. If you don't want to use 64 bit it's up to each individual but I'm going to start moving all my stuff over.

1 Like

I was not in any way suggesting that going to 64-bit is a bad idea. I was just saying that we should not expect huge performance benefits. I won't be moving many of my existing stable systems across, as they are working perfectly well, so there is no point. There is one that streams video using Motion and ffmpeg and at times shows heavy processor utilisation, so I may well have a go at that one (all the time keeping the old SD card in case I have problems). I will not be at all surprised to find that some stuff has problems, which may take some sorting out.

1 Like

So I have done some tests, and yes, there is a nice improvement on x64! The analysis is some 30% faster on x64. This uses CPUs only, no GPUs involved.

Both are RPi4 4GB versions.
32 bit

[INFO] YOLO took 5.527635 seconds
2022-03-20 16:53:48 [INFO] Person: 33.90% 6424.00 22

64 bit

[INFO] YOLO took 3.926258 seconds
2022-03-20 16:53:47 [INFO] Person: 33.90% 6424.00 22
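
As a small worked example of what those two timings mean, here is a minimal Python sketch. The function names are my own; the two numbers are copied from the log lines above. It separates "time saved per run" from "speedup factor", since the two are easy to mix up.

```python
def speedup(old_seconds, new_seconds):
    """How many times faster the new run is (old time / new time)."""
    return old_seconds / new_seconds


def time_saved_fraction(old_seconds, new_seconds):
    """Fraction of the old run time that was saved."""
    return (old_seconds - new_seconds) / old_seconds


if __name__ == "__main__":
    t32 = 5.527635  # seconds, 32-bit run from the log above
    t64 = 3.926258  # seconds, 64-bit run from the log above
    print(f"speedup: {speedup(t32, t64):.2f}x")
    print(f"time saved: {time_saved_fraction(t32, t64):.1%}")
```

With these two timings that works out to roughly 29% less time per analysis, or about 1.41 times the throughput. This is a single pair of measurements, so treat it as indicative only.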
2 Likes

Thanks for testing that.
So certainly worth changing over on any systems that do have significant heavy duty processing to do.

1 Like

Sorry for the late answer. I have been benchmarking my Raspberry Pi 3B+ with the 64-bit OS, and I see an overall improvement of about 25% running a stock config. After overclocking I get even more performance out of it, maybe up to about 50%. I use exactly the same overclock settings as I did with the 32-bit OS, since I know they run stable continuously. The CPU temperature never gets above 55 degrees Celsius, even when benchmarking for 30 minutes.

I'm keeping my Raspberry Pi cool with custom-made heatsinks. On the memory chip on the bottom I have a copper plate mounted. On the top side I have a tiny heatsink directly on the network controller chip and another one on the chip next to it, plus an as-big-as-possible heatsink on top of the CPU, which also covers the wireless connection part. I made sure that I didn't short anything with the oversized heatsink. I also modified the plastic case: I cut a big hole out of the top part and mounted a 12V computer fan on top, running at half power and blowing directly onto the PCB. That works like a charm.

No need for expensive cooling towers or water cooling, just a simple pc fan is enough with the custom heatsinks to keep it below 55 degrees.
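
For anyone who wants to watch the temperature during their own runs, here is a minimal Python sketch. It assumes the CPU temperature is exposed at `/sys/class/thermal/thermal_zone0/temp` in millidegrees Celsius, which is the usual location on Raspberry Pi OS but may differ on other systems. The function names are illustrative.

```python
from pathlib import Path

# Assumed sysfs path; usual on Raspberry Pi OS, may differ elsewhere
THERMAL_FILE = Path("/sys/class/thermal/thermal_zone0/temp")


def parse_millidegrees(raw: str) -> float:
    """Convert a raw sysfs reading in millidegrees Celsius to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_cpu_temp(path: Path = THERMAL_FILE) -> float:
    """Read the current CPU temperature in degrees Celsius from sysfs."""
    return parse_millidegrees(path.read_text())


if __name__ == "__main__":
    if THERMAL_FILE.exists():
        print(f"CPU temperature: {read_cpu_temp():.1f} °C")
    else:
        print("thermal sensor file not found on this system")
```

Running this in a loop while a benchmark is active gives a rough picture of how close the board gets to its limit.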

This is a long post with benchmark results for those who are interested.

Benchmarks are done with the latest version of sysbench on my Raspberry Pi 3B+ overclocked to 1500MHz.

Just to show what's possible, I copied the output of my benchmark tests below. I also included the sysbench commands that I used, which may be useful because sysbench has changed quite significantly.

The first benchmark test I did is testing threads and limited it to 10,000 events as you can see below:

sysbench --threads=4 --time=0 --events=10000 --debug=on --validate=on threads run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 4
Debug mode enabled.

Validation checks: on.

Initializing random number generator from current time

Doing thread subsystem performance test
Thread yields per test: 1000 Locks used: 8
Initializing worker threads...

DEBUG: Worker thread (#0) started
DEBUG: Worker thread (#0) initialized
DEBUG: Worker thread (#1) started
DEBUG: Worker thread (#1) initialized
DEBUG: Worker thread (#3) started
DEBUG: Worker thread (#3) initialized
DEBUG: Worker thread (#2) started
DEBUG: Worker thread (#2) initialized
Threads started!

Event limit exceeded, exiting...
(last message repeated 3 times)
Done.

General statistics:
    total time:                           15.9623s
    total number of events:      10000

Latency (ms):
         min:                                    5.78
         avg:                                    6.38
         max:                                   68.81
         95th percentile:                  7.70
         sum:                                   63807.90

Threads fairness:
    events (avg/stddev):                2500.0000/33.59
    execution time (avg/stddev):   15.9520/0.00

DEBUG: Verbose per-thread statistics:

DEBUG:     thread #  0: min: 0.0058s  avg: 0.0063s  max: 0.0603s  events: 2528
DEBUG:                  total time taken by event execution: 15.9572s
DEBUG:     thread #  1: min: 0.0058s  avg: 0.0064s  max: 0.0537s  events: 2504
DEBUG:                  total time taken by event execution: 15.9539s
DEBUG:     thread #  2: min: 0.0058s  avg: 0.0063s  max: 0.0688s  events: 2524
DEBUG:                  total time taken by event execution: 15.9514s
DEBUG:     thread #  3: min: 0.0058s  avg: 0.0065s  max: 0.0509s  events: 2444
DEBUG:                  total time taken by event execution: 15.9454s

Then I did a benchmark for the CPU with the limit at 100,000 events, results below:

sysbench --threads=4 --time=0 --events=100000 --debug=on --validate=on cpu run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 4
Debug mode enabled.

Validation checks: on.

Initializing random number generator from current time

Doing CPU performance benchmark

Prime numbers limit: 10000

Initializing worker threads...

DEBUG: Worker thread (#0) started
DEBUG: Worker thread (#0) initialized
DEBUG: Worker thread (#1) started
DEBUG: Worker thread (#1) initialized
DEBUG: Worker thread (#2) started
DEBUG: Worker thread (#2) initialized
DEBUG: Worker thread (#3) started
DEBUG: Worker thread (#3) initialized
Threads started!

Event limit exceeded, exiting...
(last message repeated 3 times)
Done.

CPU speed:
    events per second:  2761.26

General statistics:
    total time:                                   36.2100s
    total number of events:              100000

Latency (ms):
         min:                                    1.33
         avg:                                    1.45
         max:                                   31.55
         95th percentile:                  1.52
         sum:                                   144739.97

Threads fairness:
    events (avg/stddev):               25000.0000/149.42
    execution time (avg/stddev):   36.1850/0.01

DEBUG: Verbose per-thread statistics:

DEBUG:     thread #  0: min: 0.0013s  avg: 0.0014s  max: 0.0270s  events: 25033
DEBUG:                  total time taken by event execution: 36.1891s
DEBUG:     thread #  1: min: 0.0013s  avg: 0.0014s  max: 0.0316s  events: 25111
DEBUG:                  total time taken by event execution: 36.1722s
DEBUG:     thread #  2: min: 0.0013s  avg: 0.0015s  max: 0.0247s  events: 24747
DEBUG:                  total time taken by event execution: 36.1901s
DEBUG:     thread #  3: min: 0.0013s  avg: 0.0014s  max: 0.0254s  events: 25109
DEBUG:                  total time taken by event execution: 36.1886s

Because the tests above finish so quickly, I ran two more tests that run a bit longer to get the temperature up to its highest point. That is 47 degrees Celsius when I run the threads test; when I run the normal CPU performance test, I don't see the temperature go higher than 42 degrees.

2nd threads test with the limit set to 40,000 events, results below:

sysbench --threads=4 --time=0 --events=40000 --debug=on --validate=on threads run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 4
Debug mode enabled.

Validation checks: on.

Initializing random number generator from current time

Doing thread subsystem performance test
Thread yields per test: 1000 Locks used: 8
Initializing worker threads...

DEBUG: Worker thread (#0) started
DEBUG: Worker thread (#0) initialized
DEBUG: Worker thread (#1) started
DEBUG: Worker thread (#1) initialized
DEBUG: Worker thread (#2) started
DEBUG: Worker thread (#2) initialized
DEBUG: Worker thread (#3) started
DEBUG: Worker thread (#3) initialized
Threads started!

Event limit exceeded, exiting...
(last message repeated 3 times)
Done.

General statistics:
    total time:                                   63.8808s
    total number of events:              40000

Latency (ms):
         min:                                    5.72
         avg:                                    6.39
         max:                                  151.23
         95th percentile:                  8.74
         sum:                                   255445.60

Threads fairness:
    events (avg/stddev):               10000.0000/68.94
    execution time (avg/stddev):   63.8614/0.00

DEBUG: Verbose per-thread statistics:

DEBUG:     thread #  0: min: 0.0057s  avg: 0.0063s  max: 0.0567s  events: 10096
DEBUG:                  total time taken by event execution: 63.8596s
DEBUG:     thread #  1: min: 0.0057s  avg: 0.0064s  max: 0.0829s  events: 9919
DEBUG:                  total time taken by event execution: 63.8650s
DEBUG:     thread #  2: min: 0.0057s  avg: 0.0064s  max: 0.0694s  events: 9953
DEBUG:                  total time taken by event execution: 63.8559s
DEBUG:     thread #  3: min: 0.0057s  avg: 0.0064s  max: 0.1512s  events: 10032
DEBUG:                  total time taken by event execution: 63.8650s

2nd CPU performance test with the time limit set at 300 seconds (5 minutes), results below:

sysbench --threads=4 --time=300 --debug=on --validate=on cpu run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 4
Debug mode enabled.

Validation checks: on.

Initializing random number generator from current time

Doing CPU performance benchmark

Prime numbers limit: 10000

Initializing worker threads...

DEBUG: Worker thread (#0) started
DEBUG: Worker thread (#0) initialized
DEBUG: Worker thread (#1) started
DEBUG: Worker thread (#1) initialized
DEBUG: Worker thread (#2) started
DEBUG: Worker thread (#2) initialized
DEBUG: Worker thread (#3) started
DEBUG: Worker thread (#3) initialized
Threads started!

Time limit exceeded, exiting...
(last message repeated 3 times)
Done.

CPU speed:
    events per second:  2752.74

General statistics:
    total time:                          300.0011s
    total number of events:              825837

Latency (ms):
         min:                                    1.33
         avg:                                    1.45
         max:                                   34.85
         95th percentile:                        1.61
         sum:                              1199254.28

Threads fairness:
    events (avg/stddev):           206459.2500/346.86
    execution time (avg/stddev):   299.8136/0.01

DEBUG: Verbose per-thread statistics:

DEBUG:     thread #  0: min: 0.0013s  avg: 0.0015s  max: 0.0336s  events: 205927
DEBUG:                  total time taken by event execution: 299.8217s
DEBUG:     thread #  1: min: 0.0013s  avg: 0.0015s  max: 0.0296s  events: 206476
DEBUG:                  total time taken by event execution: 299.8171s
DEBUG:     thread #  2: min: 0.0013s  avg: 0.0015s  max: 0.0348s  events: 206537
DEBUG:                  total time taken by event execution: 299.8139s
DEBUG:     thread #  3: min: 0.0013s  avg: 0.0014s  max: 0.0307s  events: 206897
DEBUG:                  total time taken by event execution: 299.8017s
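
To compare runs without reading the full reports by eye, here is a minimal Python sketch that pulls the "events per second" figure out of sysbench cpu output and compares two runs. The function names are my own, and the regular expression only assumes the line format shown in the output above. The two sample strings are excerpts of the short and the 300-second CPU runs.

```python
import re


def events_per_second(report: str) -> float:
    """Extract the 'events per second' figure from sysbench cpu output text."""
    match = re.search(r"events per second:\s*([0-9.]+)", report)
    if match is None:
        raise ValueError("no 'events per second' line found")
    return float(match.group(1))


def relative_change(baseline: float, other: float) -> float:
    """Change of `other` relative to `baseline`, as a fraction."""
    return (other - baseline) / baseline


if __name__ == "__main__":
    short_run = "CPU speed:\n    events per second:  2761.26\n"
    long_run = "CPU speed:\n    events per second:  2752.74\n"
    a = events_per_second(short_run)
    b = events_per_second(long_run)
    print(f"{a} -> {b}: {relative_change(a, b):+.2%}")
```

For the two CPU runs above, the 300-second run comes out about 0.3% lower than the short one, which would be consistent with the board holding its speed over the longer run.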

2 Likes