Obtain value from HTML file

grant1 · 8 January 2020 14:16

Complete newbie question, but here goes...

We have a measurement device (circa 2000) that is connected to our local network and, as I found through trial and error, can display its reading on a locally accessed web page. I was able to inspect the underlying HTML file and fetch it with an HTTP request node in Node-RED. See the screenshots below (taken at slightly different times, hence the different values):

From other forum posts, I believe the solution is to use {{ }} in the URL field, but that's where I am stuck. I have tried several variations, but the payload/output looks the same each time. For example:

I believe I am missing something fairly obvious. Any help is appreciated. All I want to do is obtain the one value (e.g. 2104, 2119, whatever) that is there when I send the request.

TotallyInformation · 8 January 2020 16:48

You can try returning the result as JSON instead of a string; then you can pick out the relevant property and, if needed, tidy it up with a Change node.

There is also the html node, which gives you more control by extracting just the part of a page that matches a CSS selector, if you need it.
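For a rough idea of why extracting from the page's text (rather than its raw markup) helps, here is a minimal sketch outside Node-RED. It uses Python's standard-library `html.parser` as a stand-in for the html node, which instead takes a CSS selector. It collects the page's visible text while skipping chosen elements such as `<h1>`. `TextExtractor`, `page_text`, and the sample markup are hypothetical; the device's real page only appeared in screenshots.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of a page, skipping chosen elements (hypothetical helper)."""

    def __init__(self, skip_tags=("h1",)):
        super().__init__()
        self.skip_tags = set(skip_tags)
        self.depth = 0  # > 0 while inside an element we want to skip
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "H1" arrives here as "h1"
        if tag in self.skip_tags:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.skip_tags and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def page_text(html, skip_tags=("h1",)):
    """Return the visible text of `html` joined by spaces, minus skipped elements."""
    parser = TextExtractor(skip_tags)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page loosely modelled on the thread; the real markup is not shown.
SAMPLE = "<HTML><BODY><H1>Current PGA Reading</H1><P>2104</P></BODY></HTML>"

print(page_text(SAMPLE))  # 2104
```

Because this works on text nodes, digits that appear only inside tag names never reach the result.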

knolleary · 8 January 2020 16:55

Hi @grant1

In this instance, the {{ }} syntax in the URL field is not what you want to look at. You already have the URL you need, as we can see from the Node-RED screenshot you've shared, so you already have a message whose payload contains the full text of the HTML page.

Your challenge is to extract the number from the middle of the text. There are a few different ways of doing that.

In this instance, the HTML is not very well structured to help you pull out the number - the number is just sat in the middle of the content. As far as I can see, it's the only number in the page. So a fairly hacky approach would be to use a Change node to delete everything that isn't a digit from the text.

[{"id":"3d6e5aa.3a0dea6","type":"change","z":"5615726f.3545ac","name":"","rules":[{"t":"change","p":"payload","pt":"msg","from":"[^\\d]+","fromt":"re","to":"","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":340,"y":420,"wires":[["dee0bf3d.884b7"]]}]

Hugely hacky, but may be enough given the structure of the page you've shared.
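The Change node rule above is a regular-expression replace: every run of non-digit characters (`[^\d]+`) becomes an empty string. A minimal Python equivalent, assuming hypothetical markup in which the reading is the only digit anywhere in the page (`strip_non_digits` is an illustrative name, not part of Node-RED):

```python
import re

# Same rule as the Change node: replace each run of non-digits with "".
NON_DIGITS = re.compile(r"[^\d]+")


def strip_non_digits(text):
    """Delete every character that is not a decimal digit."""
    return NON_DIGITS.sub("", text)


# Hypothetical markup in which the reading is the only digit anywhere.
page = "<p>Current reading:</p><p>2104</p>"
print(strip_non_digits(page))  # 2104
```

Any other digit in the raw HTML would be glued onto the result, which is what makes this approach fragile.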

grant1 · 8 January 2020 17:32

Thank you! Hugely hacky almost did the trick, but it was picking up the '1' (twice) from the H1 tags and therefore always returned a six-digit number starting with 11 (e.g. 112141).

So I just went into the .htm file and removed the line indicated below by the red X. The page in a web browser still looks almost the same (no more big bold "Current PGA Reading", but who cares). Thanks again.

[Screenshots: PGA 1, PGA 2]
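Editing the device's page works; an alternative that leaves the device untouched is to match the reading as a standalone number instead of deleting non-digits. A minimal sketch, assuming hypothetical markup that reproduces the problem (the real page is not included here). `first_standalone_number` is a hypothetical helper; in Node-RED a function node could apply the same pattern, though that is untested in this thread.

```python
import re

# Hypothetical markup: "H1" in the tag names contains the digit 1.
PAGE = "<H1>Current PGA Reading</H1><P>2141</P>"

# Stripping all non-digits also keeps the 1s from <H1> and </H1>.
print(re.sub(r"[^\d]+", "", PAGE))  # 112141


def first_standalone_number(text):
    """Return the first run of digits bounded by word boundaries, or None.

    The "1" in "H1" is preceded by the word character "H", so there is no
    word boundary before it and it does not match.
    """
    match = re.search(r"\b\d+\b", text)
    return match.group() if match else None


print(first_standalone_number(PAGE))  # 2141
```

This returns the first standalone number, so it could still be fooled by a number elsewhere in the markup (for example, a `width="100"` attribute).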
