Can't get link tag href data from Cheerio parser

Does anyone out there know how to work this Cheerio node? The descriptions don't cover how to use the bottom portion in detail.

I'm trying to extract the URL from this 'Cheerio Chain', but when I get to the link tag I can't get anything from it.

Maybe one of you is familiar with this. The link tag has no content between an opening and a closing tag; it is self-closing, like <link href=" ... " />, and it doesn't return any values for me.

Here is the example flow I'm trying to use.

[{"id":"bec55481.5739c8","type":"debug","z":"d04ea5a4.8bc1e8","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","targetType":"full","x":3270,"y":3560,"wires":[]},{"id":"f02b754d.013a98","type":"inject","z":"d04ea5a4.8bc1e8","name":"","topic":"","payload":"","payloadType":"str","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":2970,"y":3440,"wires":[["ed4e08a0.e764a8"]]},{"id":"cb06ff7e.7ff54","type":"cheerio","z":"d04ea5a4.8bc1e8","name":"","tag":"link","ret":"html","as":"multi","map":[{"search":"link","ret":"text","replace":"","attr":"href"}],"xmlMode":true,"decodeEntities":true,"lowerCaseTags":true,"lowerCaseAttributeNames":true,"recognizeCDATA":true,"recognizeSelfClosing":false,"x":3090,"y":3500,"wires":[["bec55481.5739c8"]]},{"id":"ed4e08a0.e764a8","type":"function","z":"d04ea5a4.8bc1e8","name":"","func":"msg.payload = `\n<id>tag:google.com,2013:googlealerts/feed:18237228527151599722</id> <title type=\"html\">&lt;b&gt;Delaware&lt;/b&gt; North plans temporary leave, pay cuts for thousands of workers</title> <link href=\"https://www.google.com/url?rct=j&amp;sa=t&amp;url=https://buffalonews.com/2020/03/25/delaware-north-plans-temporary-leave-pay-cuts-for-thousands-of-workers/&amp;ct=ga&amp;cd=CAIyGjJhZWJhMmJhNzAxZTJlOGI6Y29tOmVuOlVT&amp;usg=AFQjCNHFwYRi_qKCKf-mWjPONfrJMjJ1cQ\"/> <published>2020-03-25T15:02:03Z</published> <updated>2020-03-25T15:02:03Z</updated> <content type=\"html\">&lt;b&gt;Delaware&lt;/b&gt; North is placing most of its full-time workers on leave as the Buffalo-based hospitality giant grapples with the deep blow to its business&amp;nbsp;...</content> <author> <name/> </author>\n\n`\nreturn msg;","outputs":1,"noerr":0,"x":2970,"y":3480,"wires":[["cb06ff7e.7ff54"]]}]

Thanks for any help in advance.

The cheerio node works exactly the same as a function node. The only difference is that the cheerio library is exposed to the node so that you can use it directly.

I think that the example I wrote in this reply contains the cheerio node.

https://discourse.nodered.org/t/re-flow-corona-virus-map/23098/4

I thought there were some similarities :slight_smile: I ended up writing a function to parse the payload.

Thanks for the help @TotallyInformation!

"Oops Page does not exist" or is private.

Odd, I just copied it from my URL bar.

Try this instead:

https://discourse.nodered.org/t/re-flow-corona-virus-map/23098/4?u=totallyinformation

Hmm, also odd because that is virtually the same link but I used the "share a link to this post" button.

It doesn't work because you are linking to a private message exchange between you and the other participants in that conversation.

Oops! :blush:

Thanks Nick. I've lost track of my posts.

OK, here is the flow then:

This will get all of the confirmed cases data into an output msg. Don't forget to also pick up the deaths and recoveries data if you want to track current infections rather than just totals.

[{"id":"731e3a80.f158b4","type":"inject","z":"769ea95b.9e7518","name":"","topic":"","payload":"","payloadType":"str","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":135,"y":100,"wires":[["c14526cf.21f0c8"]],"l":false},{"id":"c14526cf.21f0c8","type":"http request","z":"769ea95b.9e7518","name":"GET Covid-19 Confirmed Cases (WHO/Johns Hopkins)","method":"GET","ret":"txt","paytoqs":false,"url":"https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv","tls":"","persist":false,"proxy":"","authType":"","x":390,"y":100,"wires":[["122de92d.d482e7"]]},{"id":"122de92d.d482e7","type":"csv","z":"769ea95b.9e7518","name":"","sep":",","hdrin":true,"hdrout":true,"multi":"mult","ret":"\\n","temp":"","skip":"0","strings":true,"x":635,"y":100,"wires":[["9286a886.653c58"]],"l":false},{"id":"3e8a1f4f.c1358","type":"debug","z":"769ea95b.9e7518","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":1030,"y":100,"wires":[]},{"id":"9286a886.653c58","type":"function","z":"769ea95b.9e7518","name":"","func":"const out = {}\n\nconst cdata = flow.get('populations.rows', 'file')\nconst nocmatch = []\n\nmsg.payload.forEach( (country) => {\n    const cname = country['Country/Region']\n\n    // We don't need these\n    delete country['Country/Region']\n    delete country['Province/State']\n    delete country.Lat\n    delete country.Long\n\n    const keys = Object.keys(country)\n    const values = Object.values(country)\n    \n    const dates = []\n    \n    keys.forEach( (key, i) => {\n        let k = key.split('/')\n        let newKey = `20${k[2]}-${k[0].padStart(2, '0')}-${k[1].padStart(2, '0')}`\n\n        dates.push({\n            'date': newKey,\n            'cases': values[i]\n        })\n    })\n    \n    var thiscdata = cdata.filter( c => c['Country (or dependency)'] === cname )\n\n    out[cname] = {\n        'country': cname,\n        //'population': 
thiscdata[0],\n        'confirmed': dates\n    }\n    \n    // Merge in population data if available\n    if ( thiscdata.length === 0 ) nocmatch[cname] = cname\n    else {\n        Object.keys(thiscdata[0]).forEach( (key, i) => {\n            // Ignore # & country\n            if ( i < 2 ) return\n            \n            out[cname][key] = thiscdata[0][key]\n        })\n    }\n})\n\nreturn [ \n    {topic: msg.topic, payload: out}, \n    {topic:'Countries With No Matching Population Entry', payload: Object.values(nocmatch)},\n]","outputs":2,"noerr":0,"x":730,"y":100,"wires":[["3e8a1f4f.c1358"],["4f8e8d.2ea27174"]]},{"id":"4f8e8d.2ea27174","type":"debug","z":"769ea95b.9e7518","name":"","active":false,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":1030,"y":140,"wires":[]}]

Just a note to say that the source files have changed, so it won't actually run, but it is the cheerio bit that you are interested in.

Hey Julian,

Thanks for the reply,

I checked the flow and I don't see any Cheerio node in your example, if that is what you were trying to point out. Please forgive me if I am looking in the wrong place or have misunderstood your example.

Regardless, tracking such a disease is an idea I had as well, though I'm busy with this other project :frowning:

My function works OK for a Google Alerts RSS feed. I'd be happy to share it, though it's sloppy and only works in the context of Google Alerts RSS; it's not really a universal 'one-size-fits-all' extractor.

I'm sure it can be improved drastically to handle more cases, or perhaps a better solution exists in Cheerio?

Here is my sloppy function that works for now. Give it the HTTP request response for a Google Alerts RSS URL in msg.payload:

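// Keep everything from the first <link href=" up to the first "/> in the payload
msg.payload = msg.payload.substring(msg.payload.indexOf('<link href="'), msg.payload.indexOf('"/>'));
// Drop everything up to and including 'url='
msg.payload = msg.payload.substring(msg.payload.indexOf('url=') + 4, msg.payload.length);
// Cut at the '/&amp' that follows the article URL (this also drops its trailing slash)
msg.payload = msg.payload.substring(0, msg.payload.indexOf('/&amp'));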

This works for now getting that URL from a Google Alert RSS feed... But only if there is one.
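
For a feed with more than one entry, a sketch along these lines might help. It is not from the thread: `extractAlertUrls` is a hypothetical helper name, and it assumes each entry carries a `<link href="..."/>` whose `href` wraps the article address in a `url` query parameter, as in the sample payload above. It also assumes the global `URL` class is available wherever the code runs.

```javascript
// Hypothetical helper (an assumption, not part of the original flow):
// collect the target URL of every <link ... href="..."> tag in a feed string.
function extractAlertUrls(feedXml) {
    const urls = [];
    // Capture the href value of each <link> tag, self-closing or not.
    const linkPattern = /<link\b[^>]*\bhref="([^"]*)"/g;
    let match;
    while ((match = linkPattern.exec(feedXml)) !== null) {
        // The feed is XML, so "&" inside the attribute is escaped as "&amp;".
        const href = match[1].replace(/&amp;/g, '&');
        try {
            // Google Alerts puts the article address in a "url" query parameter.
            const target = new URL(href).searchParams.get('url');
            // Fall back to the raw href when there is no "url" parameter.
            urls.push(target !== null ? target : href);
        } catch (err) {
            // href was not an absolute URL; keep it rather than dropping it.
            urls.push(href);
        }
    }
    return urls;
}
```

In a function node this could be called as `msg.payload = extractAlertUrls(msg.payload); return msg;`, which yields an array with one URL per link tag. Unlike the substring version, it keeps the trailing slash of the article URL.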

I understand there is a feed parser node that is supposed to extract this info (I can't remember which one without digging into a search), but I recall it wasn't getting the URL as I needed.
Nor was Cheerio. So a custom function it is for now.

Oh poo, sorry it's been a long, trying week. I think that ended up as the wrong post, I should have checked.

OK, 3rd time's the charm. This time, I actually double-checked. It was because they were both related to COVID data.

No worries! I've had a similar interesting week, to put it in a super nice optimistically cozy sarcastic way. One step at a time. One solution at a time, despite sometimes taking 3 steps back involuntarily.

Thanks for the diligence at least!

I see now what you are talking about.. I recall seeing this node when I installed Cheerio, but didn't really register it in my head to look into it...

From your useful example, I see how you can use jQuery selectors and write nifty code with them... Wow, thanks Julian for pointing this out with this example.

Will be looking into this more.... Thanks a bunch!

No problem, I did note that the documentation for that node is rather lacking. Fortunately, I'd once done things the "hard" way by loading the cheerio library into a global variable in settings.js. I have a flow in the flows library and an article on my blog. But the node is certainly convenient.
