Obtain value from HTML file

Complete noobie question, but here goes...

We have a measurement device (circa 2000) that is connected to our local network and (as I found through trial and error) can display the reading on a locally accessed webpage. I was able to inspect the underlying HTML file and put it through an HTTP request in Node-RED. See screenshots below (taken at slightly different times, hence the different values):

From other forum posts, I believe the solution is to use {{ }} in the URL field, but that's where I am stuck. I have tried several variations but the payload / output looks the same. For example:

I believe I am missing something fairly obvious. Any help is appreciated. All I want to do is obtain the one value (e.g. 2104, 2119, whatever) that is there when I send the request.

You can try returning the result as JSON instead of a string, then you can pick out the nearest property and, if needed, tidy it up with a change node.


There is also the html node which gives you more control over extracting just part of a page if you need it.

Hi @grant1

In this instance, that is not what you want to look at. You already have the URL you need, as we can see from the Node-RED screenshot you've shared, so you now have a message whose payload contains the full text of the HTML page.

Your challenge is to extract the number from the middle of the text. There are a few different ways of doing that.

In this instance, the HTML is not very well structured to help you pull out the number - the number is just sat in the middle of the content. As far as I can see, it's the only number in the page. So a fairly hacky approach would be to use a Change node to delete everything that isn't a number from the text.

[{"id":"3d6e5aa.3a0dea6","type":"change","z":"5615726f.3545ac","name":"","rules":[{"t":"change","p":"payload","pt":"msg","from":"[^\\d]+","fromt":"re","to":"","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":340,"y":420,"wires":[["dee0bf3d.884b7"]]}]

Hugely hacky, but may be enough given the structure of the page you've shared.
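
For anyone who would rather see the idea as code, here is a rough sketch of what that Change node rule does, written as plain JavaScript of the kind you could put in a Function node. The `digitsOnly` name and the sample page are made up for illustration; the real device's HTML will differ.

```javascript
// Sketch of the Change node rule: replace every run of non-digit
// characters with an empty string, leaving only the digits.
function digitsOnly(text) {
    return String(text).replace(/[^\d]+/g, "");
}

// Hypothetical page body (an assumption, not the device's actual HTML).
const samplePage = "<html><body><p>Current PGA Reading</p>\n2104\n</body></html>";

console.log(digitsOnly(samplePage)); // prints "2104"

// In a Function node this would be roughly:
//   msg.payload = digitsOnly(msg.payload);
//   return msg;
```

Note that this keeps every digit anywhere in the text, including any digits that appear inside tag names or attributes.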

Thank you! Hugely hacky almost did the trick, but it was picking up the '1' (twice) in the H1 tags and therefore was returning a six digit number always starting with 11 (e.g. 112141).

So I just went into the .htm file and removed the line indicated below by the red X. The output via web browser is still almost the same (no more big bold "Current PGA Reading", but who cares). Thanks again.

PGA 1

PGA 2
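
If editing the .htm file on the device is not an option, another way around the H1 problem might be to strip the tags first and only then look for the number. Below is a minimal sketch for a Function node, assuming the reading is the first run of digits in the visible text; `extractReading` and the sample page are hypothetical and the real page may need a different pattern.

```javascript
// Remove anything that looks like an HTML tag (so the "1" in <h1> and
// </h1> disappears), then take the first run of digits that is left.
function extractReading(html) {
    const text = String(html).replace(/<[^>]*>/g, " ");
    const match = text.match(/\d+/);
    return match ? Number(match[0]) : null; // null when no digits are found
}

// Hypothetical page body with an H1 heading (an assumption for illustration).
const samplePage = "<html><body><h1>Current PGA Reading</h1>\n2141\n</body></html>";

console.log(extractReading(samplePage)); // prints 2141

// In a Function node this would be roughly:
//   msg.payload = extractReading(msg.payload);
//   return msg;
```

Returning `null` when nothing matches makes it easier to spot a failed read downstream than an empty string would.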