I'm not sure you are being fair here. The cheerio library is great at analysing and extracting HTML elements. The html node uses it, albeit in a somewhat limited way, hence the blog post I pointed to earlier in the thread.
The node-red-contrib-nbrowser node gives you access to a full headless browser environment, so even dynamic pages can be analysed and their data extracted.
There are lots of options in JavaScript for sanitising HTML.
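To give a flavour of the simplest end of that range, here is a rough, dependency-free sketch that escapes the HTML-significant characters so untrusted text is shown literally rather than parsed as markup. The names `escapeHtml` and `HTML_ESCAPES` are made up for this example and do not come from any library. Note that this escapes everything rather than allowing a safe subset of tags through, so if you need to keep some markup you would want a proper sanitising library instead.

```javascript
// Hypothetical helper (not from any library): map each HTML-significant
// character to its entity so the browser renders it as plain text.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

function escapeHtml(input) {
  // Single pass over the string, so already-inserted entities are not re-escaped.
  return String(input).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// prints: &lt;img src=x onerror=&quot;alert(1)&quot;&gt;
console.log(escapeHtml('<img src=x onerror="alert(1)">'));
```

In a Node-RED flow, something like this could sit in a function node and be applied to a string property before it is displayed, assuming plain-text output is all you need.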
Did you ever find a solution? I'm in the same situation. I need to monitor about 300 RSS feeds. I currently have them set up with feed parser, which is a pain. Whenever I need to add a feed, I have to redeploy...