Html node not extracting content

With this simple flow, should the html node be able to get the contents of the .name element?

[{"id":"c5073e7167dfd3b0","type":"inject","z":"07c0e534b17ff9a0","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"<td data-th=\"Fonds\" class=\"name\">Obligaties Wereldwijd</td>","payloadType":"str","x":482,"y":144,"wires":[["6ee0d520cb8a2368","a91557144a38bd23"]]},{"id":"a91557144a38bd23","type":"debug","z":"07c0e534b17ff9a0","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":790,"y":144,"wires":[]},{"id":"6ee0d520cb8a2368","type":"html","z":"07c0e534b17ff9a0","name":"","property":"payload","outproperty":"payload","tag":".name","ret":"text","as":"single","x":628,"y":96,"wires":[["a91557144a38bd23"]]}]

Try td.name or *.name for any html tag

I was wondering why this would work, it didn't :wink:

works for me using *.name

I am flabbergasted. What is going on with my Node-RED instance? I have more unexplainable issues.

Which versions are you running (including Node.js and npm)?

If I remember right, I think Node-RED is 1.29 and Node.js 12. I'm not sure about npm, I think it is 6.**

Thanks, I can replicate it. I just launched 2 LXCs with the different versions:

1.3.7 working

2.0.6 not working

@knolleary what could be the culprit ?

There was a major version change to the HTML parsing library we use in 2.0 - and it took a lot of work to get back to the same functionality their major version change had modified.

It is entirely possible you are hitting a case where behaviour has changed. Please raise an issue so it can be investigated.

Thanks, I opened https://github.com/node-red/node-red/issues/3137

Interesting note, when I change the payload from:

<td data-th="Fonds" class="name">Obligaties Wereldwijd</td>

to

<table><td data-th="Fonds" class="name">Obligaties Wereldwijd</td></table>

it works.
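A minimal sketch of that workaround, as it might look in a function node placed in front of the html node. The helper name `wrapTableFragment` and the list of tag names are assumptions made for illustration; they are not part of Node-RED or cheerio. It only wraps a payload whose first tag is table-related, so the `.name` selector in the html node can stay unchanged.

```javascript
// Hypothetical helper: wrap a bare table fragment in <table> so the
// HTML parser does not drop the table-related tags.
// The tag list is an assumption for illustration.
const TABLE_PARTS = ["caption", "colgroup", "col", "thead", "tbody", "tfoot", "tr", "td", "th"];

function wrapTableFragment(html) {
    // Find the name of the first opening tag, if there is one
    const match = /^\s*<([a-zA-Z][a-zA-Z0-9]*)/.exec(html);
    if (!match) {
        return html; // no leading tag, leave the input untouched
    }
    const tag = match[1].toLowerCase();
    return TABLE_PARTS.includes(tag) ? "<table>" + html + "</table>" : html;
}

// In a function node this could be used as:
//   msg.payload = wrapTableFragment(msg.payload);
//   return msg;
```

Input that already starts with `<table>`, or with any non-table tag, is passed through as-is.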

I think this is one of those edge cases we're going to have to say "sorry, yes the behaviour has changed" - which is what a major version change does allow us to do. The beta release notes did highlight this node as being one that had had a significant update under the covers.

The underlying library made significant changes to how it handles full HTML documents versus fragments. I cannot immediately see a way to get it working as it did before without breaking another scenario.

That is ok, it just took some time to understand what was happening.
I replicated it with the underlying cheerio library, so I can raise an issue there, if it is an actual issue. If it expects a full valid DOM element, there might be no way of working around it other than encapsulating it inside a table or something, although I would then expect it to work with a <tr> as well. Actually, with only a <tr> around it, it doesn't work either. Weird indeed.

Thanks - let us know what they say.

It is like inception, because cheerio uses a parser library, which uses another library lol

I noticed a comment:

parse5 seems to remove all table related tags since table tag itself is missing, if you add table all tags will be present. It is actually also how browser handles those tags.

They suggested using the xml:true parameter, which is not an option for Node-RED, I think. Instead I will handle it via an xml node.
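A rough sketch of what picking out the text could look like in a function node after the xml node. The object shape below is an assumption: it presumes the xml node's default property names (`$` for attributes, `_` for text content), and `textIfClass` is a hypothetical helper, not a Node-RED API.

```javascript
// Assumed shape of msg.payload after the xml node has parsed the <td>
// fragment, using "$" for attributes and "_" for text content.
const parsed = {
    td: {
        _: "Obligaties Wereldwijd",
        $: { "data-th": "Fonds", class: "name" }
    }
};

// Hypothetical helper: return the text of the root element when it
// carries the given class, otherwise undefined.
function textIfClass(obj, tagName, className) {
    const el = obj[tagName];
    if (!el || !el.$ || typeof el.$.class !== "string") {
        return undefined;
    }
    const classes = el.$.class.split(/\s+/);
    return classes.includes(className) ? el._ : undefined;
}

const fundName = textIfClass(parsed, "td", "name");
```

Unlike a CSS selector, this only looks at the root element, so nested markup would need its own traversal.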
