Cheerio with Function and HTML Nodes

Hello,

I need to extract just two values from the page at the RCP web site. A sample sub-string is shown below. For the keyword Approve, I need to retrieve the corresponding value 44.0:

        <div class="value">
            <span style="background: #000000;">44.0</span>
        </div>
        <div class="desc">Approve </div>

I understand that cheerio would be the best utility for this purpose. I've entered the basic "load" instruction in the Function node but don't quite understand the next steps. Is there some interactive helper tool (a la JSONata) to drill down into the object model?

I've wired up the HTML node in the flow for parallel evaluation (and self-learning). How could I use cheerio here?

All comments welcome. Tried RTxM for cheerio but obviously didn't get past the Selector control.

Kind regards.

[{"id":"d1e9b9ad.1c8d88","type":"comment","z":"e9169fe6.11ecc8","name":"Data Analysis Exercises","info":"","x":130,"y":80,"wires":[]},{"id":"e5fd9a77.835468","type":"inject","z":"e9169fe6.11ecc8","name":"POTUS Job Approval","topic":"JobApproval","payload":"https://www.realclearpolitics.com/epolls/other/president_trump_job_approval-6179.html","payloadType":"str","repeat":"60","crontab":"","once":false,"onceDelay":0.1,"x":150,"y":140,"wires":[["72d99cfb.0c2804"]]},{"id":"72d99cfb.0c2804","type":"http request","z":"e9169fe6.11ecc8","name":"RCP POTUS Poll","method":"GET","ret":"txt","paytoqs":false,"url":"https://www.realclearpolitics.com/epolls/other/president_trump_job_approval-6179.html","tls":"","proxy":"","authType":"","x":390,"y":140,"wires":[["abaf1307.7c2df8","20650b1a.78eaac"]]},{"id":"1e786658.63578a","type":"debug","z":"e9169fe6.11ecc8","name":"RCP POTUS debug","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","x":880,"y":140,"wires":[]},{"id":"20650b1a.78eaac","type":"html","z":"e9169fe6.11ecc8","name":"RCP POTUS filter","property":"payload","outproperty":"payload","tag":"candidate","ret":"html","as":"single","x":610,"y":220,"wires":[["69ac9252.e90754"]],"info":"https://www.realclearpolitics.com/epolls/other/president_trump_job_approval-6179.html\n\n<tbody>\n<tr>\n    <td class=\"candidate\">\n        <div class=\"value\">\n            <span style=\"background: #000000;\">44.0</span>\n        </div>\n        <div class=\"desc\">Approve </div>\n    </td>\n</tr>\n<tr>\n    <td class=\"candidate\">\n        <div class=\"value\">\n            <span style=\"background: #ff0000;\">53.4</span>\n        </div>\n        <div class=\"desc\">Disapprove \n            <span style=\"color: #ff0000;\">+9.4</span>\n        </div>\n    </td>\n</tr>\n</tbody>"},{"id":"abaf1307.7c2df8","type":"function","z":"e9169fe6.11ecc8","name":"RCP page scrape","func":"const cheerio = global.get('cheerio')\nconst $ = cheerio.load(msg.payload)\nreturn msg;","outputs":1,"noerr":0,"x":610,"y":140,"wires":[["1e786658.63578a"]],"info":"https://www.realclearpolitics.com/epolls/other/president_trump_job_approval-6179.html\n\n<tbody>\n<tr>\n    <td class=\"candidate\">\n        <div class=\"value\">\n            <span style=\"background: #000000;\">44.0</span>\n        </div>\n        <div class=\"desc\">Approve </div>\n    </td>\n</tr>\n<tr>\n    <td class=\"candidate\">\n        <div class=\"value\">\n            <span style=\"background: #ff0000;\">53.4</span>\n        </div>\n        <div class=\"desc\">Disapprove \n            <span style=\"color: #ff0000;\">+9.4</span>\n        </div>\n    </td>\n</tr>\n</tbody>"},{"id":"69ac9252.e90754","type":"debug","z":"e9169fe6.11ecc8","name":" RCP http request","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","x":870,"y":220,"wires":[]}]

Have you looked at the html node? It is a core node and uses cheerio under the skin.

@TotallyInformation, thanks for the heads up.

How does one invoke a cheerio function inside the HTML node? That guidance could help me to drill down.

Using the Function node and trying to explicitly load cheerio failed. I have also installed the contrib-cheerio node (and rebooted) but to no avail. I noticed others mentioning the intrinsic support via cheerio in the HTML node but so far I've been going in the wrong direction.

Kind regards.

I don't think you have any control over cheerio using the html node. It is a wrapper around cheerio but, like much of Node-RED core, is designed to be easy to use rather than totally comprehensive.
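To make that concrete for the flow above: the html node only takes a CSS selector, so the remaining work has to happen in a node after it. Note that a class selector needs its leading dot; a bare `candidate` would look for a `<candidate>` element. The following is a rough sketch, assuming the html node's Selector is set to `td.candidate`, its output is set to the text content of the elements, and the results are delivered as a single message containing an array. The helper name `parseCandidateCells` is made up for illustration, and the regular expression assumes each cell contains a number followed by a one-word label, as in the sample markup.

```javascript
// Sketch only: turn the text of each td.candidate cell into { label: value }.
// Assumes each cell's text looks like "44.0 ... Approve" (number first, then the label).
function parseCandidateCells(cells) {
    const result = {};
    for (const cellText of cells) {
        // First number in the cell is the value; the first word after it is the label.
        const match = /(-?\d+(?:\.\d+)?)\s+([A-Za-z]+)/.exec(cellText);
        if (match) {
            result[match[2]] = Number(match[1]);
        }
    }
    return result;
}

// Inside a Function node wired after the html node, this could be used as:
//   msg.payload = parseCandidateCells(msg.payload);
//   return msg;
```

With the two sample cells from the first post, this should give an object with an `Approve` and a `Disapprove` property, so the trailing `+9.4` inside the second cell is ignored.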

The following article on my blog shows you how to reference cheerio manually:

Beautiful Soup version 4?

? ? ? Sorry, I don't understand

Given your advanced understanding of the HTML node and cheerio, I was trying to understand whether I should re-orient myself to another package (e.g. Beautiful Soup as suggested by a colleague a few days ago) or should I just try to trudge along with cheerio. I guess the latter would be better for now since I may be able to bug you again in a few days. (just thinking aloud)

Kind regards.

I've never used it so I can't really comment I'm afraid. Generally speaking, you can use any JavaScript library the way I did in my post. So use whatever suits you best. Choice is part of the beauty of using a node.js based system.
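For readers who want to stay with cheerio in a Function node, here is a minimal sketch of one way to look up a value by its label. It assumes cheerio has been installed and exposed to Function nodes through `functionGlobalContext` in settings.js (for example `cheerio: require('cheerio')`), which is what the `global.get('cheerio')` call in the original flow relies on. The helper name `valueForLabel` is hypothetical, and the selectors are taken from the sample markup in the first post, not from the live page.

```javascript
// Sketch only: `$` is a cheerio instance, e.g. the result of cheerio.load(html).
// Returns the numeric value whose description matches `label`, or null if none does.
function valueForLabel($, label) {
    let result = null;
    $('td.candidate').each((index, cell) => {
        // Take only the div's own text, ignoring nested spans such as "+9.4".
        const desc = $(cell).find('div.desc').clone().children().remove().end().text().trim();
        if (desc === label) {
            result = parseFloat($(cell).find('div.value span').text());
            return false; // stop iterating once the label is found
        }
    });
    return result;
}

// Inside the Function node this could be used as:
//   const cheerio = global.get('cheerio');
//   const $ = cheerio.load(msg.payload);
//   msg.payload = {
//       approve: valueForLabel($, 'Approve'),
//       disapprove: valueForLabel($, 'Disapprove')
//   };
//   return msg;
```

Matching on the exact label text rather than on position means the lookup should keep working if the page changes the order of the two cells.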

Fair enough. Thx again.

For future readers: Beautiful Soup, whether version 3 or 4, is a Python library for parsing HTML/XML.
For (partial) HTML parsing in Node-RED, I nowadays put together a solution as follows. First, the html node gets a selection out using a CSS selector. Then I put a wrapper div around it to have a single outer element, using either a function node or a change node with JSONata (concatenation). Next comes the xml node, which uses xml2js inside, followed by a change node with a JSONata expression to create a specific object containing the information I need.

I’m sick with a fever so I don’t trust myself to work out the parser now, but I would suggest taking a look at the () syntax in JSONata and looping over the element blocks, with a ternary check for the keyword you’re looking for.
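The loop-and-check idea can also be sketched in plain JavaScript for a Function node placed after the xml node, as an alternative to JSONata. Everything about the input shape here is an assumption: the `parsed` object is hand-written to resemble what xml2js produces with its default keys (`$` for attributes, `_` for text) when the wrapped fragment still contains the `td.candidate` cells. The real output of the xml node should be checked in a debug node first. The function name `pickValue` is hypothetical.

```javascript
// Assumed shape of the xml node output for the wrapped sample fragment.
const parsed = {
    div: {
        td: [
            { $: { class: 'candidate' }, div: [
                { $: { class: 'value' }, span: [{ _: '44.0', $: { style: 'background: #000000;' } }] },
                { _: 'Approve ', $: { class: 'desc' } }
            ] },
            { $: { class: 'candidate' }, div: [
                { $: { class: 'value' }, span: [{ _: '53.4', $: { style: 'background: #ff0000;' } }] },
                { _: 'Disapprove ', $: { class: 'desc' }, span: [{ _: '+9.4', $: { style: 'color: #ff0000;' } }] }
            ] }
        ]
    }
};

// Loop over the element blocks and return the value whose label equals `keyword`.
function pickValue(parsedXml, keyword) {
    for (const cell of parsedXml.div.td) {
        const valueDiv = cell.div.find(d => d.$ && d.$.class === 'value');
        const descDiv = cell.div.find(d => d.$ && d.$.class === 'desc');
        if (!valueDiv || !descDiv) continue;
        const label = (descDiv._ || '').trim();
        if (label === keyword) {
            return Number(valueDiv.span[0]._);
        }
    }
    return undefined; // keyword not found
}

// In a Function node: msg.payload = pickValue(msg.payload, 'Approve'); return msg;
```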