Floating point calculation speed

I thought Node.js, and hence Node-RED, was single-threaded, so, like me, it can only do one thing at once!

So the platform is limited to a Raspberry Pi, possibly the new 4.
@Paul-Reed I think this will be the downfall, as the loop is single-threaded. I think it will run into issues.
I'll finish the code, give it a go and see how it performs.

Are you saying you can do this on a pi 3 when it is written in C? What percentage of the CPU is that using?

@Colin no, it's just an assumption that C would process faster. I have seen similar code programmed in C, but it was running on an Arduino Due.

OK, so you don't know whether your hardware will perform the job in C either, and I rather doubt whether a Pi will cope with that. The only way to be certain is to benchmark it. The floating point itself should not be an issue, as JavaScript under Node.js should use hardware floating point exactly as C will. It is the surrounding logic that may be a lot slower in JS.
Also of significance is whether you can tolerate variability in how quickly the calculation is performed. Being able to handle 400,000 calculations/sec on average is very different to being able to guarantee that each calculation is performed within 2.5 microseconds of the data arriving.
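To make the average-versus-guaranteed distinction concrete, here is a minimal benchmark sketch. It assumes a stand-in workload (the hypothetical `calc` function below, not the poster's real code) and reports both the average rate and the slowest single call. Note that the timer calls themselves add overhead, so treat the numbers as rough.

```javascript
// Hypothetical workload: one rotation-style calculation per call.
function calc(angle, x, y) {
    return x * Math.cos(angle) - y * Math.sin(angle);
}

// Time each call individually and report average rate and worst-case latency.
function benchmark(fn, iterations) {
    let worstNs = 0n;
    let sink = 0; // accumulate results so the calls are not optimised away
    const start = process.hrtime.bigint();
    for (let i = 0; i < iterations; i++) {
        const t0 = process.hrtime.bigint();
        sink += fn(i * 0.001, 1.5, 2.5);
        const dt = process.hrtime.bigint() - t0;
        if (dt > worstNs) worstNs = dt;
    }
    const totalNs = Number(process.hrtime.bigint() - start);
    return {
        callsPerSecond: iterations / (totalNs / 1e9),
        meanNs: totalNs / iterations,
        worstNs: Number(worstNs),
        sink
    };
}

const result = benchmark(calc, 100000);
console.log(result);
```

On a non-real-time OS the worst-case figure is usually far larger than the mean, which is exactly the variability being described.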

Unless you have a library that will offload the processing to the GPU, I suspect that a Pi might struggle. Though honestly, like others, I don't know exactly how much work is involved.

Don't forget the new Pi4 as well. That is a big step up from the Pi3.

You can now also get Node.js to be multi-threaded. There are certainly libraries to do this, and you can use worker threads as well.
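As a rough illustration of the worker threads option, the sketch below uses Node's built-in `worker_threads` module. The worker source is kept inline (via the `eval: true` option) only so that the example fits in one file; the `runInWorker` helper and the toy calculation are assumptions for illustration.

```javascript
const { Worker } = require("worker_threads");

// Worker source, inline so the sketch is a single file.
const workerSource = `
const { parentPort, workerData } = require("worker_threads");
let sum = 0;
for (const angle of workerData.angles) {
    // Toy calculation: cos^2 + sin^2, which is 1 for every angle.
    sum += Math.cos(angle) * Math.cos(angle) + Math.sin(angle) * Math.sin(angle);
}
parentPort.postMessage(sum);
`;

// Run the calculation on a separate thread and resolve with its result.
function runInWorker(angles) {
    return new Promise((resolve, reject) => {
        const worker = new Worker(workerSource, { eval: true, workerData: { angles } });
        worker.once("message", resolve);
        worker.once("error", reject);
    });
}

runInWorker([0, 0.5, 1.0, 1.5]).then((sum) => console.log("sum from worker:", sum));
```

A worker does not share the main thread's variables, so every input has to be passed in explicitly (here through `workerData`) and every result sent back as a message.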

Of course, you could also call out to your theoretical C programme as well - just using Node-RED to do the orchestration.

Certainly, JavaScript is not generally the best language for doing high-speed floating point calculations but as the others have said, sometimes you get a pleasant surprise. There might also be an npm package that would help, worth looking.

And you can of course write bindings for C/C++ into Node.js.



You can also compile the C/C++ to WebAssembly using Emscripten, and load the WebAssembly module I believe.

You could potentially do this but with Node.js it shouldn't be necessary since, as Dave says, you can already integrate C code.

You would think that would be the case, but no. In some instances using wasm can be faster. More importantly, the binding code for emscripten is less than what's needed to use C++ directly. Here's a good example: https://medium.com/netscape/javascript-c-modern-ways-to-use-c-in-javascript-projects-a19003c5a9ff

Thanks for all the input. I'm going to code it and bench test. After a little research I still don't fully understand, so I'll ask: is Node-RED (JavaScript) interpreted or compiled? This was the basis of my speed comparison between this and C.
I'm also going to hand off some of the code to a second processing unit. With Node-RED or Node.js being single-threaded, I'm guessing that is why it only utilises one core of the CPU?

You can read everything about JavaScript on Google (btw, the "interpreter" runs only once, when loading the code).

The big difference between an Arduino and a Raspberry Pi is the OS. If you have a time frame of only 2.5 µs, you also have to consider all the other processes running on the machine (keyword: real-time OS).

Why not just code it and try it out?


It's a bit of both, I suppose. The V8 engine has a sophisticated JIT compiler. In some tests, it outperforms even relatively low-level languages (it depends on many factors, of course).

As others have said, I think this is one you'll have to try out.

And there is always wasm to get near C speeds. I'm not certain how you'd implement that, but please report your findings and code for others to benefit :)


So, I have coded up what I need and been playing about. With 6 instances running at 0.2500 intervals, I was hitting around 20 to 24 percent of my CPU on a Raspberry Pi 2 B overclocked to 1000 MHz.
The problem I think I am getting is lag. Is it possible the messages are bottlenecking?
I tried using the multithread function but was unable to gain access to context items.
My coding skills are probably not the cleanest, so I may be able to get better results by cleaning the code up.

I am thinking of creating a lookup table for COS and SIN so that, instead of doing the calculation, I just look up the required figure. Would this be a faster option?
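For reference, a lookup table along those lines might look like the sketch below. The 0.1 degree resolution and the names `sinDeg` and `cosDeg` are assumptions for illustration. Whether it actually beats `Math.sin` and `Math.cos` on a given Pi is something only a benchmark can answer, and the table trades accuracy for whatever speed it gains.

```javascript
const STEPS_PER_DEGREE = 10; // assumed resolution: 0.1 degree
const TABLE_SIZE = 360 * STEPS_PER_DEGREE;
const SIN_TABLE = new Float64Array(TABLE_SIZE);
const COS_TABLE = new Float64Array(TABLE_SIZE);

// Fill both tables once, at start-up.
for (let i = 0; i < TABLE_SIZE; i++) {
    const rad = (i / STEPS_PER_DEGREE) * Math.PI / 180;
    SIN_TABLE[i] = Math.sin(rad);
    COS_TABLE[i] = Math.cos(rad);
}

// Map any angle in degrees to the nearest table index in [0, TABLE_SIZE).
function tableIndex(deg) {
    const i = Math.round(deg * STEPS_PER_DEGREE) % TABLE_SIZE;
    return i < 0 ? i + TABLE_SIZE : i;
}

function sinDeg(deg) {
    return SIN_TABLE[tableIndex(deg)];
}

function cosDeg(deg) {
    return COS_TABLE[tableIndex(deg)];
}

console.log(sinDeg(30), cosDeg(60));
```

Angles are rounded to the nearest 0.1 degree, so the result can be off by up to about 0.0009 compared with the exact value.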

Body Rotation Matrix

var l1x= global.get("X");
var l1y= global.get("Y");
var l1z= global.get("Z");
var Yrot= global.get("Yr")*Math.PI/180;
var Xrot= global.get("Xr")*Math.PI/180;
var Zrot= global.get("Zr")*Math.PI/180;
//var cos= Math.cos(angle);

// X Axis rotation

var Y1 = l1y*Math.cos(Xrot)-l1z*Math.sin(Xrot);
var Z1 = l1z*Math.cos(Xrot)+l1y*Math.sin(Xrot);

// Y Axis rotation

var X1 = l1x*Math.cos(Yrot)-l1z*Math.sin(Yrot);
var Z2 = l1z*Math.cos(Yrot)+l1x*Math.sin(Yrot);

// Z Axis rotation

var X2 = l1x*Math.cos(Zrot)-l1y*Math.sin(Zrot);
var Y2 = l1y*Math.cos(Zrot)+l1x*Math.sin(Zrot);

flow.set("z1",(Z1 + Z2 - (l1z + l1z)));
flow.set("x1",(X1 + X2 - (l1x + l1x)));
flow.set("y1",(Y1 + Y2 - (l1y + l1y)));

//var msg1 = { payload:X };
//var msg2 = { payload:Y };
//var msg3 = { payload:Z };

//return [msg1,msg2,msg3];

Inverse Kinematics

//Global Sets
var L1 = global.get("L1");
var L2 = global.get("L2");
var L3 = global.get("L3");
var Z0 = global.get("Z0");
var X0 = global.get("X0");
var Y0 = global.get("Y0");
var A = global.get("A");

//Inflow Sets
var L4s = flow.get("L4");
var L5s = flow.get("L5");
var cts = flow.get("ct");
var cts1 = flow.get("ct1");
var cts2 = flow.get("ct2");
var cts3 = flow.get("ct3");
var cts4 = flow.get("ct4");
var cts5 = flow.get("ct5");
var cts6 = flow.get("ct6");
var PHIs = flow.get("PHI");

//Gamma Rotation
var st = Math.atan(Z0/X0);
var GR = st*180/Math.PI;

//L4 calculation

//B Calculation
var st1 = (Math.pow(L4s,2)+Math.pow(X0,2)-Math.pow(Y0,2))/(2*L4s*X0);
var mt = Math.acos(st1);

//C Calculation
var st2 = (Math.pow(L4s,2)+Math.pow(Y0,2)-Math.pow(X0,2))/(2*L4s*Y0);
var mt1 = Math.acos(st2);

//J1 combination result.
var T1 = Math.round(cts2 + cts4 + cts1 - 90);

//L5 Calculation

//D Calculation
var st3 = (Math.pow(L5s,2)+Math.pow(L4s,2)-Math.pow(L3,2))/(2*L5s*L4s);
var mt2 = Math.acos(st3);

//E Calculation
var st4 = (Math.pow(L5s,2)+Math.pow(L3,2)-Math.pow(L4s,2))/(2*L5s*L3);
var mt3 = Math.acos(st4);

//PHI Calculation

//J2 combination result.
var T2 = Math.round(180 - cts6);

//F Calculation
var st5 = (Math.pow(L1,2)+Math.pow(L5s,2)-Math.pow(L2,2))/(2*L1*L5s);
var mt4 = Math.acos(st5);

//G Calculation
var st6 = (Math.pow(L2,2)+Math.pow(L5s,2)-Math.pow(L1,2))/(2*L2*L5s);
var mt5 = Math.acos(st6);

//H Calculation
var st7 = (Math.pow(L2,2)+Math.pow(L1,2)-Math.pow(L5s,2))/(2*L2*L1);
var mt6 = Math.acos(st7);

//J3 combination result
var T3 = Math.round(-180 + cts5 + cts3);

var msg1 = { payload:GR };
var msg2 = { payload:T1 };
var msg3 = { payload:T2 };
var msg4 = { payload:T3 };

return [msg1,msg2,msg3,msg4];

Also, I was using Math.round on some of my calcs, but this takes the number to the nearest whole number. How can I round to 2 decimal places instead?

Quick performance tips: a floating-point multiply is faster than a divide (so precalculate 1/180 or pi/180 once), and doing x*x is generally faster than Math.pow(x, 2).
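Applied to the cosine-rule lines in the inverse kinematics code, those two tips could look something like this. The helper names (`DEG_TO_RAD`, `RAD_TO_DEG`, `cosineRuleAngle`) are made up for the example.

```javascript
// Conversion factors computed once, so later code only multiplies.
const DEG_TO_RAD = Math.PI / 180;
const RAD_TO_DEG = 180 / Math.PI;

// Angle (in radians) between sides a and b of a triangle whose third side is c.
// Uses a * a instead of Math.pow(a, 2).
function cosineRuleAngle(a, b, c) {
    return Math.acos((a * a + b * b - c * c) / (2 * a * b));
}

// A 3-4-5 triangle has a right angle opposite the side of length 5.
const angleDeg = cosineRuleAngle(3, 4, 5) * RAD_TO_DEG;
console.log(angleDeg, 90 * DEG_TO_RAD);
```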


If you really need rounding, then probably the fastest way will be
Math.round(num * 100) / 100, but as you can see, it adds yet another set of calculations ...
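A small sketch of that rounding approach, generalised to any number of decimal places. The `roundTo` name is just for illustration.

```javascript
// Round num to the given number of decimal places using Math.round.
function roundTo(num, decimals) {
    const factor = Math.pow(10, decimals);
    return Math.round(num * factor) / factor;
}

console.log(roundTo(3.14159, 2));
console.log(roundTo(12.3456, 2));
```

Because the values are binary floating point, a number that looks like an exact half in decimal (1.005, for example) can be stored slightly below it and round down rather than up. For servo angles that is unlikely to matter.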

So scrap Math.pow and just use conventional A * A. And when you say precalc the radian-to-degree conversion, should that be put into a var PI = and then just used when needed?
I was considering just using radians throughout, as I think the servo control board can accept radians.
@hotNipi Yeah, to be fair it's probably not needed. The only reason I was going to convert it was to reduce the number length, but if it's going to have a negative effect on processing time then maybe I'll leave it. At least I know how to do it now.

Some other tips you may find useful:
You can reduce variable creation (e.g. GR in the inverse kinematics does not seem to be used in other calculations, so put it straight into the outgoing payload).
You can store and read multiple variables in context in one go.
Also there is https://flows.nodered.org/node/node-red-contrib-unsafe-function. It claims to be fast and furious. I don't know the details, but it may do the thing for you. Worth a try.
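A sketch of the "multiple variables in one go" tip: in recent Node-RED versions, the context `get` and `set` functions accept an array of keys. So that the example can run outside Node-RED, the function-node body is wrapped in a function and given a minimal stand-in context store. `makeContextStub` is purely an assumption for testing and is not part of Node-RED.

```javascript
// What the body of a function node might do, with the context stores passed in.
function readAndStore(global, flow) {
    // One call returns the values in the same order as the keys.
    const [x, y, z] = global.get(["X", "Y", "Z"]);
    // One call stores several values, matched to the keys by position.
    flow.set(["x1", "y1", "z1"], [x * 2, y * 2, z * 2]);
    return { x, y, z };
}

// Minimal stand-in for a context store (an assumption, for testing only).
function makeContextStub(initial) {
    const store = new Map(Object.entries(initial));
    return {
        get(key) {
            return Array.isArray(key) ? key.map((k) => store.get(k)) : store.get(key);
        },
        set(key, value) {
            if (Array.isArray(key)) key.forEach((k, i) => store.set(k, value[i]));
            else store.set(key, value);
        }
    };
}

const globalStub = makeContextStub({ X: 1, Y: 2, Z: 3 });
const flowStub = makeContextStub({});
console.log(readAndStore(globalStub, flowStub), flowStub.get(["x1", "y1", "z1"]));
```

Inside a real function node, only the two lines using `global.get([...])` and `flow.set([...], [...])` would be needed.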


@hotNipi I just tried the unsafe function and that reduced the CPU to below 10%, so it shaved off a massive amount. Moving to a Raspberry Pi 4 should decrease it further.

You probably should also avoid repeated calls to expensive math functions. For example, in the X-axis rotation matrix,

var cosx = Math.cos(Xrot);
var sinx = Math.sin(Xrot);
var Y1 = l1y*cosx-l1z*sinx;
var Z1 = l1z*cosx+l1y*sinx;

could be up to twice as fast as what you are doing now. It can be tempting (and more readable) to just code up the textbook equations, but if performance is an issue, think like a computer not a human.