I/O switching performance on a Raspberry Pi 4

I am trying to benchmark I/O performance on a number of embedded platforms (Arduino, Pi, BeagleBone). Currently Node-RED on the Pi 4 is giving me very low numbers (under 2,000 transitions per second) and I am not sure if it is my flow or whether Node-RED is simply not built to do this sort of thing well. In comparison, doing the same thing in Python gives about half a million transitions per second.

Hardware:
It is basically set up as an LED blink but with an additional Digital input wire to confirm the LED state is correct. Let me know if you want to know why we do this.

Software:
The structure is

  1. Initialise and record the start time in milliseconds
  2. Loop, toggling the LED and confirming the pin state on each cycle
  3. Record the end time and calculate the number of transitions per second
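As a sketch, the structure above looks roughly like the following in plain JavaScript (runnable under Node.js; `togglePin()` here is a hypothetical stand-in for the real GPIO write plus read-back, since the actual pin access happens in the flow's GPIO nodes):

```javascript
// Sketch of the benchmark structure. togglePin() is a hypothetical stand-in
// for a GPIO write followed by a read-back on the confirmation input; here it
// just flips an in-memory value so the loop skeleton can run anywhere.
let ledState = 0;

function togglePin() {
  ledState ^= 1;     // write: toggle the "LED"
  return ledState;   // read: confirm the new state
}

const cycles = 5000;
const startTime = Date.now();               // 1. record start time in ms

for (let i = 0; i < cycles; i++) {          // 2. toggle and confirm
  const expected = (i % 2 === 0) ? 1 : 0;
  if (togglePin() !== expected) {
    throw new Error(`read-back mismatch on cycle ${i}`);
  }
}

// Guard against a 0 ms duration, which can happen with an in-memory stand-in.
const duration = Math.max(1, Date.now() - startTime);
const transitionsPerSecond = (cycles / duration) * 1000;  // 3. compute rate
console.log(`${transitionsPerSecond} transitions/s`);
```

With the real GPIO calls substituted in, the same skeleton produces the kind of number the flow's Measure Test node reports.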

I was expecting Node-RED to be slower than Python, but the gap makes me think I am misunderstanding something. I have turned off "Show node status", thinking that feature might add load between the editor and the Pi.

Any help would be much appreciated.
David

JSON for my flow is:
[{"id":"1ae67599.25720a","type":"function","z":"82f1d13f.bdbbc","name":"Measure Test","func":"var MTEndTime = new Date().getTime();\nflow.set(\"EndTime\",MTEndTime);\nvar Duration = MTEndTime - flow.get(\"StartTime\");\nmsg.payload = Duration;\nmsg.speed = 5000 / Duration * 1000;\nflow.set(\"Duration\",Duration);\nreturn msg;","outputs":1,"noerr":0,"x":560,"y":340,"wires":[["7542c685.c89c28","d06d23bb.e2c868"]]}]
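Unpacked from the minified `func` string, the node's logic is just an elapsed-time calculation. A commented version that runs outside Node-RED (the plain-object `flow` below is a stand-in for Node-RED's flow context, an assumption made purely so the snippet is self-contained):

```javascript
// Stand-in for Node-RED's flow context so this runs as a plain script.
const flow = {
  ctx: {},
  set(key, value) { this.ctx[key] = value; },
  get(key) { return this.ctx[key]; },
};

// Pretend an earlier node recorded the start time 2000 ms ago.
flow.set("StartTime", Date.now() - 2000);

// The Measure Test function node's body:
const msg = {};
const MTEndTime = new Date().getTime();
flow.set("EndTime", MTEndTime);
const Duration = MTEndTime - flow.get("StartTime");  // elapsed ms
msg.payload = Duration;
msg.speed = (5000 / Duration) * 1000;  // cycles per second, for 5000 cycles
flow.set("Duration", Duration);
```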

Picture of my flow

Please read How to share code or flow json first, then select all the nodes in your flow that you would like to share and export them. The code you've posted here is only the function node named 'Measure Test'.

When you say you are trying to measure I/O performance, what you are actually measuring is the overhead of the code used to read the I/O, not the I/O itself. Node-RED will always be vastly slower than Python code reading the pin directly, so I am not sure exactly what you are trying to measure with your tests.

My mistake. The actual flow is attached.
Node-RED-Pi4-IO-Benchmark.json (2.9 KB)

Hi Colin,
I am trying to determine the fastest possible response time under Node-RED so I can understand what it can be used for and what it cannot. I plan to run the same flow on a number of BeagleBones as well.

Most specifically I want to understand how quickly I can expect Node-RED to query a sensor (the faster the better).
From there I can either measure or estimate max load on messages, etc.
I get that Node-RED can be great for rapid development, but I need to understand what its strengths and weaknesses are to understand how I can use it. Unfortunately I know nothing about it and this is my first attempt, so I would like to know if I have messed up the implementation.

If the eventual answer is that Digital I/O doesn't work well but IP messaging is really good then I have accomplished my goal.

That isn't what the linked post said to do to share your post.

I see you are using a non-standard node (the counter loop node); is that a good idea if you are benchmarking? What is it? node-red-contrib-something, probably.

@Colin it is node-red-contrib-loop-processing

I suspect that @Colin is right, but it could be worth trying to make a more direct comparison. Your Python program may well be using the same library to communicate with the hardware as the NR GPIO nodes. In that case, looking directly at the GPIO pin with an oscilloscope or frequency counter while toggling it in a tight loop, first in NR and then in Python, might give you an idea of the difference in overhead for each method of calling the library.
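Without a scope, the per-call overhead can also be estimated in software by timing a large number of calls. A sketch (`writePin` below is a hypothetical no-op stand-in; to measure a real stack you would substitute the actual GPIO library call):

```javascript
// Estimate per-call overhead by timing many repetitions. writePin() is a
// hypothetical no-op stand-in; substituting the real GPIO call measures the
// actual library overhead rather than just the loop.
function writePin(value) {
  return value;
}

const iterations = 1_000_000;
const start = process.hrtime.bigint();  // nanosecond-resolution timer
for (let i = 0; i < iterations; i++) {
  writePin(i & 1);
}
const elapsedNs = Number(process.hrtime.bigint() - start);
const nsPerCall = elapsedNs / iterations;
console.log(`~${nsPerCall.toFixed(1)} ns per call`);
```

Comparing that figure for the Python library call against the NR GPIO path would show how much of the gap is library overhead versus flow overhead.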

I followed the link, read the thread, and went with the 2nd post in that thread by Paul-Reed. I now see you wanted only the first option, so sorry about that.

Hi @drmibell. I was hoping for something a bit more real-world, hence the approach. My goal is to see what the usable rate is: some coding, but where the I/O is the predominant factor. As I mentioned initially, I have done this on Arduino, BeagleBone and Pi, all with the same basic structure.
The average number of write/read pairs per second is:
90,000 Arduino Uno
225 Galileo (I wonder why it was never popular). Caveat: using fast I/O methods you could get closer to 20K
510,000 Digilent Uno32
30,000 Pi A (could get 35K if you overclocked). Pi B was similar but overclocking got you almost 55K
218,000 Pi 3B+
530,000 Pi 4B
30,000 Pi Zero wireless
10,000 BeagleBone Black Wireless (haven't done the PRUs yet). I should have AI and Pocket results soon.

So, as I have a reasonable result set already, changing the methodology is not all that desirable. Plus, at the end of the day, I just want to know what I can use each device for.

So a Pi Zero and a temp sensor is great; using SPI to get results from a moderately fast DAC or ADC is not so good.
The Pi 3 and 4 are getting into the territory of being able to acquire data or drive something sub-100 kHz, depending on how complex your code is.
Once you move to the MHz range you have pretty much left all of these behind (think FPGA), except maybe the BeagleBone (using the on-board PRUs).

In comparison, I did use LabVIEW LINX (4 or so years ago) on the Arduino Uno, Digilent Uno32 and Max32, and the results were 115, 30 and 195 respectively.

Hi Colin, @zenofmud is correct, that is the package. My thinking was to have something more Node-RED and GUI based, rather than just coding the loop in a function node. If that package is poorly implemented and causes a major performance hit then that could be a problem, but it does look like the major issue is I/O performance (I am seeing more than 2 orders of magnitude difference from the Python case at the moment).
And that is the point of my question: I just want to make sure I haven't done something stupid and made the results way worse than they should be. If the results look vaguely correct then I have a reasonable answer.
One question, though. When I trigger the test it takes a couple of seconds to do 5,000 cycles, but the web interface takes maybe another 5 seconds before it can be used (i.e. to query flow variables). Is this expected?

Many thanks for all your help so far.

I guess I misunderstood. Your original post seemed to be asking why NR performed so much more poorly (a factor of 200) than Python on a Pi 4 in your application. That struck me as a large difference that could possibly be reduced. If what you really want to do is compare hardware platforms, that's fine too, but I would be a bit cautious about drawing conclusions.

Forgive me if this is obvious, but with any benchmark written in a high-level language, differences between platforms in how the software stack is implemented can matter a lot, and a single benchmark can be misleading, even if it simulates a task you want to perform.

What is important here: high I/O performance, or the use of NR?

If high I/O performance is what you are looking for, why not C? Or, since you seem familiar with Python, use Cython.

@krambriw I want to understand what Node-RED can do well.
I did think going into this that it would probably perform worse than the Python, but the size of the difference was surprising.
My next step is moving the same flow to a BeagleBone to see if it is an I/O implementation thing, as the BB uses a different library.
After this I'll do something with messaging.

I am not convinced that much of the overhead is in the I/O. I suspect that if you replaced the I/O with an inject node to input the values then you would get a similar result. I suspect it is more to do with the overheads of Node.js, JavaScript and the Node-RED model. That is always going to be orders of magnitude slower than tightly written Python performing a dedicated task.

I hope you are not running the browser on the Pi, otherwise you may well be measuring the processor loading imposed by the browser rather than what is going on in Node-RED.

Hi @drmibell,
The performance difference was exactly my question.
In terms of differences between platforms and languages, that is exactly why I am doing this: I want to know, with some simple testing, which platforms and languages are usable for (in this case) digital I/O. This is not an academic analysis. Looking at the numbers I have, it is pretty obvious that once the Pi went multicore its I/O performance improved massively. The BeagleBones are still mostly single core (except the AI) and they are not great performers, but they have many more pins, plus two pretty fast microcontrollers on die (the PRUs).
So sure, C may be faster than Node-RED, but if it was only 50% faster then I would much prefer to use Node-RED. The performance question is both a hardware and a software choice, and the cost of the hardware plus the time to write, debug and expand the code over time is what will drive those sorts of decisions. If I really wanted performance I would be looking at C code on a BeagleBone PRU, and beyond that SystemC or Verilog on the FPGA portion of a Zynq-based board.
My contention is that Node-RED code would be easier to understand, maintain and expand than the corresponding C and even Python code (but that may be just me).

It might be that one bottleneck for these types of heavy "things" is that NR is single threaded. When you run Python directly, do you know if the library "under the hood" might use multithreading or multiprocessing to increase performance?

I would also assume the web server in NR represents some kind of load, even if the browser is requesting data from another machine?