Is it possible to use Cheerio by directly importing it into a function node?

Is it possible to use Cheerio by directly importing it into a function node? I tried adding the module and deploying the flow just to test it, but an error occurred:

function : (error) "Error: Cannot find module '/data/node_modules/cheerio/index.js'. Please verify that the package.json has a valid "main" entry"

The first question is, why?

The function node operates server side (i.e. in node) not client side (i.e. the browser) where cheerio does its work.

What are you trying to achieve?

Thanks @Steve-Mcl.

My intention is to get not only certain HTML elements from a URL, but also specific attributes of the HTML tags. E.g. data attributes, href, etc.

If I try to use the 'node-red-contrib-cheerio-function' instead of trying to use the function node by importing the module, it shows some errors as in the images below.

Would you know the reasons for the errors? Or what are the best alternatives?


image

image

I'm using node-red installed by CapRover on a Hetzner VPS.

Despite the errors, I've just tested it and the 'node-red-contrib-cheerio-function' is functioning properly. However, I need to constantly confirm the error whenever I deploy.

Therefore, I was considering an alternative that doesn't require a special node but can utilize a standard node like the function and import something there.

Yes. $ is not defined

Just declare it at the top of the function e.g. const $ = cheerio (or whatever $ is an alias for)

OR import it as $

To answer your original question and Steve's query.

Yes, it is certainly possible. And you might want to do so because the node does not cover all of the Cheerio functionality (or at least that's what I remember from when I last did it maybe a year ago).

I need to make my messages clearer.
There are 2 distinct strategies that I've tried:

  1. Cheerio imported as a module in a function node: I don't know how to import and use Cheerio in a function node, as it gives the error from my first message in the thread.
  2. Alternative: node-red-contrib-cheerio-function: When I use this node, I also get errors, as shown in my other messages above.

@Steve-Mcl, the $ problem you mentioned happened when I was using 'node-red-contrib-cheerio-function', not when importing Cheerio in a plain function node. I expected $ to be available, as in the example in the documentation, since I didn't import Cheerio myself in that case.
doc: This function block extends the functionality of the normal node-red function block with the ease of parsing the msg.payload if it is of type string directly into the $ selector.

@TotallyInformation, thanks. From what I understood, you are talking about strategy 1, which was my initial intention. Correct?

So, how would I need to configure and use it in order to avoid the errors that I mentioned in the initial message of the thread?

Correct, I'm in the fortunate position of knowing JavaScript so I don't need to be dependent on contributed nodes that might not quite do what I want.

This flow works. It is the basic example given on the cheerio GitHub README but wrapped up as a Node-RED flow using a function. Sorry, I seem to have deleted my original example, I think it was something I quickly did for work.

[{"id":"7a86743eb55fad12","type":"inject","z":"30fdd9a9702231b0","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":340,"y":1040,"wires":[["e93737021de3f32a"]]},{"id":"30d311ab0e936042","type":"debug","z":"30fdd9a9702231b0","name":"debug 125","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":730,"y":1040,"wires":[]},{"id":"e93737021de3f32a","type":"function","z":"30fdd9a9702231b0","name":"function 14","func":"const $ = cheerio.load('<h2 class=\"title\">Hello world</h2>');\n\n$('h2.title').text('Hello there!');\n$('h2').addClass('welcome');\n\nmsg.payload = $.html();\n\nreturn msg;","outputs":1,"timeout":0,"noerr":0,"initialize":"","finalize":"","libs":[{"var":"cheerio","module":"cheerio"}],"x":530,"y":1040,"wires":[["30d311ab0e936042"]]}]

cheeriojs/cheerio: The fast, flexible, and elegant library for parsing and manipulating HTML and XML. (github.com)

The readme has a link to a video which I think would show more realistic cheerio examples since you most likely want to load an external page rather than the trivial string example here.

Excepting my slightly odd looking colour scheme for the Editor, this is how you set up the import:
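
For anyone reading without the screenshot: the import is declared on the Setup tab of the function node, and in the exported flow above it appears as the libs entry with var cheerio and module cheerio. Loading modules this way also depends on one option in settings.js. The fragment below is only a sketch of that option, assuming an otherwise default settings.js; I believe it is enabled by default in recent Node-RED releases, but it is worth checking on a customised install.

```javascript
// settings.js (fragment only; a real file contains many other options)
module.exports = {
    // Allows function nodes to declare npm modules on their Setup tab.
    // Assumption: everything else in settings.js is left at its defaults.
    functionExternalModules: true,
};
```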

And this is the on-message code:

const $ = cheerio.load('<h2 class="title">Hello world</h2>');

$('h2.title').text('Hello there!');
$('h2').addClass('welcome');

msg.payload = $.html();

return msg;

Thank you for clarifying. Apparently, your example was the approach I tried in the first message of the thread.

I tried exactly what you mentioned again, and I'm still getting the same error that I shared in the first message of the topic:
function : (error) "Error: Cannot find module '/data/node_modules/cheerio/index.js'. Please verify that the package.json has a valid "main" entry"

Please note that the error occurs during the deployment of the flow, even before trying to run it. And I'm using node-red installed by CapRover on a Hetzner VPS.

OK, so at least we now know that it is something local to your device, which is progress of sorts.

Can you get to a command prompt on the device? If it is in Docker, there is a command (that I can never remember off the top of my head) that gets to a command line inside your docker container and that's where you need to end up.

Then, if you can get to the /data/ folder, you can check whether cheerio actually got installed. If not, you can install it manually with npm install cheerio, but that must be run from inside the /data/ folder.

docker exec -it container-name sh

Will start a terminal session inside the container.

Thanks.

I just checked inside the docker container.

It seems that cheerio is installed in /data/node_modules:
Screenshot 2023-09-01 05.35.04

And the package.json has the required settings:
Screenshot 2023-09-01 05.37.24

The "main" property specifies to look for index.js inside the lib directory, and I can confirm that the file exists there: /data/node_modules/cheerio/lib:

However, the error message indicates that there is no index.js file inside /data/node_modules/cheerio, which is true because it doesn't exist there.

But I'm not sure what tells it to read from this directory instead of the one configured in the main property of the package.json.
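
To compare what package.json declares with what is actually on disk, a small script can do the same check as the screenshots. This is a sketch, not a diagnosis: the function name inspectPackageEntry is made up, and it only looks at the "main" and "exports" fields. When "exports" is present, Node.js consults it before "main", so it is worth seeing both.

```javascript
const fs = require('fs');
const path = require('path');

// Read a package's package.json and report whether its "main" file exists.
// Also returns the "exports" field (or null), since Node.js gives it
// precedence over "main" when it is present.
function inspectPackageEntry(pkgDir) {
    const pkgFile = path.join(pkgDir, 'package.json');
    const pkg = JSON.parse(fs.readFileSync(pkgFile, 'utf8'));
    const main = pkg.main || 'index.js'; // Node falls back to index.js when "main" is absent
    const mainPath = path.resolve(pkgDir, main);
    return {
        version: pkg.version,
        main,
        mainPath,
        mainExists: fs.existsSync(mainPath),
        exports: pkg.exports ?? null,
    };
}
```

For the case in this thread it would be called as console.log(inspectPackageEntry('/data/node_modules/cheerio')); the call throws if package.json is missing or unreadable, which would itself be a useful finding.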

Ah, a thought. Can you please check the following commands inside your data folder?

node --version
npm --version

node --version
v16.20.1

npm --version
8.19.4

Hmm, not that then. Those are fine.

I'm really out of ideas I'm afraid. Everything seems OK to me.

The only thing left that I can see is

Not sure what that is but could it be that it is having an impact? Can't see that it should be, but ...

If you can't get it working directly, there are only a couple of workarounds I can think of. One would be quickly knocking up your own custom node to see if that worked. The other would be to do the request from the browser which you could do via uibuilder.

I appreciate your support and effort to help, @TotallyInformation. Thanks.
I think I'll take a break from this attempt for now.
