HTML parse node issue

Hi Guys,

I've discovered that the HTML parsing node behaves differently depending on the Node-RED version (and maybe the Node.js version?).

The following flow just sets msg.payload to a "foo" table cell string (<td>foo</td>), which the HTML node should parse and output as an array in msg.payload.

This works well on Node-RED v1.2.6 (Node.js v12.18.0), but not on v2.1.3 (Node.js v17.1.0).

I've already seen this before, but didn't pay attention to it since I had implemented a workaround. This means the issue is not only related to the newest version...

I'm sure I'm not the only one facing this issue, but I cannot find any info about it on the web.

Can somebody help me fix this?

[{"id":"f15e247e303f869c","type":"inject","z":"8499110d.f4114","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":330,"y":1400,"wires":[["9fbc3115654d8932"]]},{"id":"9fbc3115654d8932","type":"function","z":"8499110d.f4114","name":"","func":"msg.payload = \"<td>foo</td>\";\n\nreturn msg;","outputs":1,"noerr":0,"initialize":"","finalize":"","libs":[],"x":570,"y":1420,"wires":[["3daf3e4160f3b523"]]},{"id":"3daf3e4160f3b523","type":"html","z":"8499110d.f4114","name":"Get row data","property":"payload","outproperty":"payload","tag":"td","ret":"html","as":"single","x":770,"y":1420,"wires":[["38a638931bab31ce"]]},{"id":"38a638931bab31ce","type":"debug","z":"8499110d.f4114","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","x":950,"y":1420,"wires":[]}]

Hello,

This was reported a while back by @bakman2 in this thread

The issue is with Cheerio.js, the HTML parser library that the HTML node uses.

It's due to your string being a fragment rather than a full HTML document.
(github page where Bakman opened the issue)


@knolleary
I played around with the HTML node code where cheerio is used. Changing the load call to
var $ = cheerio.load(value, null, false);
seems to fix the issue, but I didn't do much testing (just a cookbook example and Esalles' flow).

Yes, that fixes this issue, but at the expense of breaking the majority use case where you do pass in a valid document.

The change to the underlying module of the HTML node was highlighted in the 2.0 release notes as a potential area of breaking change - although at the time we weren't 100% sure of what edge cases the change would manifest.

We have an item on the backlog to add an option to the HTML node to put it into fragment mode.

Yeah, that's always a big problem with code changes :face_with_monocle:

In the case of the cookbook example, where an HTTP request is made to the Node-RED home page and .node-red-latest-version is selected, it still picks up the correct element even if we pass the whole document as a fragment. I didn't notice any side effects, but you are right, an option would be the safest.

I believe the “fragment” part they were referring to was more related to a td being part of a table/thead fragment (in the html spec)

<tr>
    <td>100</td>
    <td>200</td>
    <td>300</td>
</tr>

So because the above does not conform to the HTML spec (the rows are not inside a table), cheerio doesn't consider it valid and doesn't parse it?

Indeed, if you put table tags around it, it works.
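
Until the HTML node gets a fragment option, that wrapping can be done in a Function node placed in front of the HTML node. Here is a minimal sketch of the idea. Note that wrapTableFragment is a made-up helper name, not something from Node-RED or cheerio, and it only looks at the first tag of the string, so treat it as a starting point rather than a complete fix:

```javascript
// Hypothetical helper (not part of Node-RED or cheerio): wraps a bare table
// fragment in the parent elements it would normally sit inside, so that a
// parser working in full-document mode does not discard the cells.
// Assumption: the first tag in the string tells us what kind of fragment it is.
function wrapTableFragment(html) {
    const match = /^\s*<\s*([a-zA-Z0-9]+)/.exec(String(html));
    const tag = match ? match[1].toLowerCase() : "";

    if (tag === "td" || tag === "th") {
        // Bare cells need a row, a section and a table around them.
        return "<table><tbody><tr>" + html + "</tr></tbody></table>";
    }
    if (tag === "tr") {
        // Bare rows need a section and a table.
        return "<table><tbody>" + html + "</tbody></table>";
    }
    if (tag === "tbody" || tag === "thead" || tag === "tfoot") {
        // Bare sections only need the table itself.
        return "<table>" + html + "</table>";
    }
    // Anything else (including a full document) is passed through untouched.
    return html;
}

// In a Function node, after the definition above, the body would be roughly:
//   msg.payload = wrapTableFragment(msg.payload);
//   return msg;
```

With the flow from the first post, the Function node would then send <table><tbody><tr><td>foo</td></tr></tbody></table> to the HTML node, and the td selector has a real table to match against.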