Dedicated machine for node

Hello everyone,
I'm here today with a perhaps unusual request. The company I work for uses many Node-RED flows to retrieve data from various sources, mainly by executing Puppeteer scripts. All those flows currently sit on our server, but since their number is constantly increasing, we decided to move the workspace to a dedicated server/machine (still to be decided).
I am not a hardware expert, so I'm seeking help to figure out what we need. What we have on our current server is basically a CPU issue: since many flows run once every few minutes, the CPU sometimes spikes to 100%, causing trouble.

What should the specs be for a dedicated machine running Node-RED (which will run various Puppeteer scripts)?

Also, is it possible to make Node.js use the GPU instead of the CPU, to improve the machine's performance since the CPU is struggling?

I am sorry if I was not clear; again, I am not a hardware expert, so feel free to correct me or ask me for details.

Thanks a lot everyone in advance for your kind help

It sounds like you may need to start thinking about scaling out horizontally and distributing workloads. It doesn't have to be a daunting task. Since this is for your company, you might appreciate a professional solution built for enterprise use? The company I work for offers this - you might wish to check it out?


What are you currently running on?

Thanks I will give it a look right now!

You have a number of ways of dealing with a bottleneck like this and it is hard to give generic advice given little background.

I see that you are using the AWS cloud service and so you could scale vertically by increasing the virtual server size. Though with Node-RED being a node.js based system, that won't always help because node.js is mostly a single-threaded system. This is also why a GPU won't help.

Steve mentions horizontal scaling, which means spreading the load across multiple instances of Node-RED. You could do this by splitting flows, or by splitting by connection (e.g. the same flows but limited to different incoming data streams). You can even do that second option using a proxy to enforce a round-robin style split, though that might be rather complex for Node-RED.
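For illustration, the round-robin proxy split could look like this with nginx. This is only a sketch: the two instances, their ports (1880/1881), and the listen port are assumptions, not something from the thread.

```nginx
# Round-robin two Node-RED instances behind one entry point.
# nginx's default upstream balancing is round-robin.
upstream nodered_pool {
    server 127.0.0.1:1880;  # hypothetical instance 1
    server 127.0.0.1:1881;  # hypothetical instance 2
}

server {
    listen 80;
    location / {
        proxy_pass http://nodered_pool;
        # Node-RED's editor and dashboards use websockets:
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```

Note that this only balances incoming HTTP/websocket traffic; timer-driven flows inside each instance still run wherever they are deployed.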

Horizontal scaling also benefits from having tooling to scale out. So using AWS templates or running Node-RED in a container and using Kubernetes or similar.

You can also run FlowFuse on AWS (self hosted too) - all of the hard work is done for you.

It is available in the marketplace: AWS Marketplace: FlowFuse

@zackwasli can guide you if needed.

I don't think the problem is the workload on Node-RED specifically. What most of our flows do is simply execute a .js script running Puppeteer: I think the script itself is what's taxing our CPU, opening Chromium browsers and performing various operations. This is why we opted for a separate workstation dedicated to Node-RED flows only: the idea is to have a physical workstation in the office, to avoid unnecessary server-related costs. And this is why I'm asking what fitting hardware specs would be!

A better question might be - why do you use puppeteer?

Is the data you are accessing not available via APIs or DB access or some other more consumable format?

IMO, puppeteer is last chance saloon.

PS: Puppeteer is definitely one of the more demanding things to do in Node - you are literally firing up browsers (which are known to be an order of magnitude more resource-hungry than a simple DB query or HTTP call)


When I can I use APIs, and when I can't I retrieve the data in other ways (HTTP calls and such). Sometimes that wasn't possible (we're talking about 2 or 3 scripts), and Puppeteer was indeed my last chance saloon (love the expression!!).


Are you running multiple instances of puppeteer at the same time? If so, can you run them sequentially instead?

Yes, sometimes I run them at the same time. Some flows run every X minutes or at specific times, based on when the data is updated, so occasionally one or more flows running Puppeteer may run simultaneously.

Are you instantiating Puppeteer for each page you want to access? It might be much more efficient to spin up one instance and then use it to access each page sequentially as required.
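To sketch the idea of serialising access to one shared browser: a tiny promise queue guarantees that even if several flows fire at the same moment, the jobs run one after another. `createQueue` and the flow/scrape names in the usage comment are illustrative assumptions, not anything from the thread.

```javascript
// A tiny promise queue: each job is chained onto the previous one, so
// jobs submitted simultaneously still execute strictly one at a time.
// Useful for sharing a single Puppeteer browser across overlapping flows.
function createQueue() {
  let tail = Promise.resolve();
  return function runExclusive(job) {
    const result = tail.then(() => job());
    tail = result.catch(() => {}); // keep the chain alive if a job fails
    return result;
  };
}

// Hypothetical usage with one shared browser (names are assumptions):
//   const runExclusive = createQueue();
//   const browser = await puppeteer.launch();
//   // flow A: runExclusive(() => scrapePriceList(browser));
//   // flow B: runExclusive(() => scrapeOrderStatus(browser));
```

With this pattern only one Chromium instance ever exists, and CPU spikes from several browsers launching at once disappear at the cost of some added latency per flow.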

I am doing the latter! Navigating to a first page and then using the same instance for each operation.

There are far better tools to use when you need to programmatically capture a page from a server. Puppeteer is really designed for testing websites the way a person would use them.

Instead, use a node.js library designed to grab the content from web pages.

If necessary, run the grab process in a separate dedicated node.js microservice which you can easily spin up programmatically to deal with request spikes.

No need to use a pretend "person" to do this.
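For the simple cases, a plain HTTP fetch is enough. A minimal sketch using Node's built-in `fetch` (available since Node 18); `fetchTitle` and `extractTitle` are illustrative helper names, and a real scraper would use a proper HTML parser rather than a regex:

```javascript
// Grab a page with Node's built-in fetch -- no browser process, so it is
// far cheaper than launching Chromium.
async function fetchTitle(url) {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  const html = await res.text();
  return extractTitle(html);
}

// Naive <title> extraction, for illustration only.
function extractTitle(html) {
  const m = html.match(/<title[^>]*>([^<]*)<\/title>/i);
  return m ? m[1].trim() : null;
}
```

This will not get past captcha walls (the case mentioned below where a browser was genuinely needed), but for plain pages it replaces a whole Chromium launch with one HTTP round trip.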

I usually use simple HTTP requests to retrieve the data, but sometimes access seems to be blocked (captcha checks or similar), and the only way I could figure out to make it work was using Puppeteer instances. I'm kinda new to this stuff so I might be missing something! But also, some of the scripts were here before I was, so I took inspiration from those pre-existing ones.

OK, I think you might have taken yourself down a slight side-track.

Presumably, you are only running Puppeteer queries when you really have to?

But in any case, you can certainly continue to run those on a server. Personally, by the way, I'd choose to run Node-RED and any web-scraping scripts on a Linux server as you will certainly get more bang for your buck than using a Windows Server as you appear to be doing (though obviously, that somewhat depends on what support skills your organisation has).

We also don't know how time-critical the running of these queries is. If they aren't time-critical, you could consider spinning up a Node-RED instance when needed and then shutting it down again. This is likely to keep your AWS costs lower.

I'd still opt to run the queries in a separate microservice myself though - that would be really easy then to manage including faster startup times if you decided to only fire them up when needed.

Our (only) server is a Windows server, and we have little to no knowledge of Linux servers...

What do you mean with time-critical?
Some of the flows need to be run every few seconds (those use simple http calls)
The Puppeteer ones need to run once every 3-4 minutes, or once/twice a day.

By the way, we were considering buying a "physical machine" to keep in our office for all the "Node-related" stuff to run, also to keep it separate from the rest (our server is where Ignition is installed, so we can partition things).


Just to add to this.

If you need to run Puppeteer on FlowFuse, you will probably need to build a custom FlowFuse Stack Docker container that includes the browser.

Instructions for building custom containers are available here:

Does it matter if it takes 60 seconds to run the query?

In that case, you could get away with spinning up a VM for the second type; it's probably not worth it for the first.

If you are going to "do cloud", I'd strongly recommend sticking with it. Trust me, you will regret it if you put something in your dusty office (yes all offices are dusty no matter how well they are cleaned) and have to constantly maintain it.