Extract different values from a multiline string

#1

I have a device that shows a webpage with some info (see picture).
I found two ways to get info from that webpage.
1:methode1
and
2:methode2
Because I am new to this and still learning (much to learn, I know), I am not sure which would be the best way to use (maybe there is even a better way). The second method gives me this in the debug messages:
methode2-debug-messages
Because I am only interested in the
value 11.95 of the Temperature in Celsius,
value 69.50 of the Humidity, and the
value 0 of Lx, I thought to use a split node, but that gives me a stream of (21!) messages: methode3-debug-messages.
After reading different pages, like Writing Functions and Working with messages from the Node-RED documentation site, I still don't know how to get only the values I am interested in.
Maybe an example or some pointers could help me out here.
Thanks.

0 Likes

#2

One small step further!
Trying it again with a function node like this


Almost.

0 Likes

#3

Indeed, when using Node-RED we usually have many different ways to achieve what we want. For this kind of use case I like to use the html node as it allows one to retrieve the data from the webpage very easily.

Are you the owner of the webpage? If so, could you modify it to add a class attribute to the relevant data, so that it can be extracted later on?

See below a flow that illustrates the use of the html node.

The flow will extract the version from https://nodered.org/ , load it to msg.payload and have it displayed in your debug panel.

[{"id":"e7fe187.3e0a6e8","type":"tab","label":"Data formats","disabled":false,"info":"# **Extracting data from an HTML page**\n\n## **Problem**\nYou want to connect to a web page and retrieve some data.\n\n## **Solution**\nUse the html node found in the function session of the node pallete.\n\n## **Example**\n\n\n## **Discussion**\nConfigure the html node with the CSS selection, in this use case it was used the class selector .node-red-latest-version\n"},{"id":"258a6e83.f928f2","type":"inject","z":"e7fe187.3e0a6e8","name":"make request","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":"","x":150,"y":120,"wires":[["d70cd407.a84468"]]},{"id":"d70cd407.a84468","type":"http request","z":"e7fe187.3e0a6e8","name":"https://nodered.org","method":"GET","ret":"txt","url":"https://nodered.org","tls":"","x":330,"y":120,"wires":[["2310c6b7.cdbe3a"]]},{"id":"c48f3a35.2c2958","type":"debug","z":"e7fe187.3e0a6e8","name":"","active":true,"tosidebar":true,"console":false,"complete":"false","x":750,"y":120,"wires":[]},{"id":"2310c6b7.cdbe3a","type":"html","z":"e7fe187.3e0a6e8","name":"Select latest version","property":"payload","outproperty":"payload","tag":".node-red-latest-version","ret":"text","as":"multi","x":560,"y":120,"wires":[["c48f3a35.2c2958"]]},{"id":"a275e53.18ac018","type":"comment","z":"e7fe187.3e0a6e8","name":"Extracting data from an HTML page","info":"","x":200,"y":60,"wires":[]}]
1 Like

#4

There is an HTML parsing library called "cheerio" that is designed to help find and read information from web pages... and although I've not used it, you may want to see if it's easier to install and configure a node-red-contrib-cheerio node to do this for you.

1 Like

#5

Looks like you are only a very small step away from a possible solution.

Your function code results in a set of strings that always have a sequence of digits at the beginning. This suggests that you could use the JavaScript function "parseFloat". It will probably work if you modify the line as follows. I have not tested it, though.

from: result[parts[0]] = parts[1];

to: result[parts[0]] = parseFloat(parts[1]);
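
To illustrate why parseFloat should cope with this kind of text, here is a minimal sketch. parseFloat skips leading whitespace and stops reading at the first character that cannot be part of a number. The helper name toNumber and the sample strings are assumptions made for illustration; they are not taken from the original function node.

```javascript
// Hypothetical helper: convert the text after the label into a number.
function toNumber(text) {
    // parseFloat ignores leading whitespace and stops at the first
    // character that cannot belong to a number (here "*", "%" or a letter).
    return parseFloat(text);
}

// Assumed sample values, shaped like the text on the device page.
const samples = ["  11.95*C", " 69.50%", "192Intergrated potentiometer"];
const numbers = samples.map(toNumber);
console.log(numbers); // [ 11.95, 69.5, 192 ]

// Text that does not start with a number yields NaN, so check for that.
console.log(Number.isNaN(toNumber("Lx"))); // true
```

If a value comes back as NaN, the text most likely did not start with digits, which is worth checking before passing the number on.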

1 Like

#6

The core html node also uses Cheerio and automatically parses the data. It works for most simple html and should work here.

I also did a flow quite a while back that calls Cheerio manually - it was before the cheerio node existed. There is a copy on my blog.

1 Like

#7

After a good night's sleep I found out that this is not good. I didn't see it last night, maybe because I was tired and had been looking at it for too long. It seems that I don't get the lux info, and that is logical, because in the function node I split on the colon (:) but there is none in the lux line.
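
The missing lux value can be reproduced in isolation. The sketch below is only an assumption about what such a function node does (splitOnColon is a made-up name, and the lines are sample values in the shape the page uses): it splits each line at the colon and stores the second part under the first. A line without a colon gives only one part, so nothing is stored for it.

```javascript
// Assumed reconstruction of a split-on-colon approach, for illustration only.
function splitOnColon(lines) {
    const result = {};
    for (const line of lines) {
        const parts = line.split(":");
        // A line without a colon gives a single part, so it is skipped.
        if (parts.length < 2) continue;
        result[parts[0].trim()] = parseFloat(parts[1]);
    }
    return result;
}

const parsed = splitOnColon([
    "Temperature in Celsius:  11.95*C",
    "Humidity:  69.50%",
    "0  Lx"
]);
// parsed has entries for temperature and humidity, but none for Lx.
console.log(parsed);
```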

0 Likes

#8

Thanks Andrei for your information. I am indeed the owner of the very simple webpage. After fiddling with this I also thought about changing that page. It's just that I didn't know at the beginning if it was possible. This device has been standing here for years now, and because I am looking, learning and trying to find a kind of home automation, I thought about trying it just as it is. I now see the possibilities of Node-RED and its possible power, but I still need to learn a whole lot more. Thank you for your time and kind words.

0 Likes

#9

Thanks shrickus.
As we speak I am trying to install it.

0 Likes

#10

Thanks TotallyInformation.
That's a nice blog you have there. So sorry that I didn't find it with Google. I am trying to install that library called "cheerio" as we speak. Thanks again for your information.

0 Likes

#11

The built in HTML node already uses cheerio so there should be no need to install it separately.
Maybe if you paste the output from your debug node as a string here (rather than an image) then people can have a go at trying some of these things for you.

2 Likes

#12

As Dave correctly points out, it would be better to use the built-in html node... sorry for adding to the confusion!

1 Like

#13

Thanks dceejay for your response.

The string I have without the function node looks like this:

<!DOCTYPE HTML><html><head></head><body><h1>GH 1</h1><h3>Temperature in Celsius:  53.75*C</h3><h3>Temperature in Fahrenheit: 128.75*F</h3><h3>Humidity:  70.60%</h3><h3>29245  Lx</h3><h3>Port A: 192Intergrated potentiometer</h3><h3>Port B: 192Intergrated photo resistor</h3><h3>Port C: 10Intergrated thermistor</h3><h3>Port D: 192Not connected</h3><h3></body></html>

This is the function node that tries to split it, but then I have no Lx info.


Like this:

{">Temperature in Celsius":" \r\n 56.63\r\nC</",">Temperature in Fahrenheit":" \r\n133.93\r\nF</",">Humidity":" \r\n 67.80\r\n%</",">Port A":" \r\n192\r\nIntergrated potentiometer</",">Port B":" \r\n192\r\nIntergrated photo resistor</",">Port C":" \r\n10\r\nIntergrated thermistor</",">Port D":" \r\n192\r\nNot connected</"}

I wonder if it is even possible because of the design of the webpage. Like Andrei said, it is possible to change the page (Arduino code) to make it easier to handle in Node-RED, I guess.

0 Likes

#14

No problem shrickus. I guess it is like the roads to Rome: more than one way leads there. Thanks anyway.

0 Likes

#15

A simple start would be


where the html node is set to just h3 to pull out all the h3 tags.
From that you will see that, sadly, your Lx value is the odd one out in that it doesn't read label: value,
so further parsing will be a pain rather than dead simple :slight_smile: As you own the web page, if it could be Lx: 29495 etc. then that would make life much easier.
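
If the page cannot be changed, one way to cope with the odd Lx line is to try a "label: value" pattern first and fall back to a "value unit" pattern. This is only a sketch: the function name parseReading and both regular expressions are assumptions made for illustration, and the input array stands in for what an html node set to h3 might deliver.

```javascript
// Hypothetical parser for one h3 text. Returns { name, value } or null.
function parseReading(text) {
    // Pattern 1: "Label: 12.34..." (label, colon, then a number).
    let m = text.match(/^\s*([^:]+):\s*(-?\d+(?:\.\d+)?)/);
    if (m) return { name: m[1].trim(), value: parseFloat(m[2]) };
    // Pattern 2: "12345  Unit" (number first, then a unit such as Lx).
    m = text.match(/^\s*(-?\d+(?:\.\d+)?)\s+(\S+)\s*$/);
    if (m) return { name: m[2], value: parseFloat(m[1]) };
    return null; // nothing recognisable, e.g. an empty h3
}

// Sample texts in the shape of the page source posted in this thread.
const h3Texts = [
    "Temperature in Celsius:  53.75*C",
    "Humidity:  70.60%",
    "29245  Lx",
    ""
];
const readings = h3Texts.map(parseReading).filter(r => r !== null);
console.log(readings);
```

Two patterns are more code than one, which is why changing the page to a uniform format is still the simpler route.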

1 Like

#16

Try this flow using a function instead of the (@dceejay) split node:

[{"id":"3e16b40c.4ae6ac","type":"inject","z":"cf6706ec.415a58","name":"","topic":"","payload":"<!DOCTYPE HTML><html><head></head><body><h1>GH 1</h1><h3>Temperature in Celsius:  53.75*C</h3><h3>Temperature in Fahrenheit: 128.75*F</h3><h3>Humidity:  70.60%</h3><h3>29245  Lx</h3><h3>Port A: 192Intergrated potentiometer</h3><h3>Port B: 192Intergrated photo resistor</h3><h3>Port C: 10Intergrated thermistor</h3><h3>Port D: 192Not connected</h3><h3></body></html>","payloadType":"str","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":130,"y":2460,"wires":[["f557dd49.10d0e"]]},{"id":"5f900aa0.9a8754","type":"debug","z":"cf6706ec.415a58","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":550,"y":2460,"wires":[]},{"id":"f557dd49.10d0e","type":"html","z":"cf6706ec.415a58","name":"","property":"payload","outproperty":"payload","tag":"h3","ret":"html","as":"single","x":250,"y":2460,"wires":[["e12d12dc.06669"]]},{"id":"e12d12dc.06669","type":"function","z":"cf6706ec.415a58","name":"get values","func":"\nmsg.payload = {\n    \"temperature\": parseFloat(msg.payload[0].split(\":\")[1]),\n    \"humidity\": parseFloat(msg.payload[2].split(\":\")[1]),\n    \"lux\": parseFloat(msg.payload[3])\n};\n\nreturn msg;","outputs":1,"noerr":0,"x":390,"y":2460,"wires":[["5f900aa0.9a8754"]]}]
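
For reference, here is what the "get values" function node in this flow does, written as plain JavaScript that can be run outside Node-RED. The wrapper name getValues and the sample array are assumptions for illustration; in the flow the array arrives as msg.payload from the html node, with one entry per h3 tag.

```javascript
// Stand-alone version of the "get values" logic; getValues is a
// hypothetical wrapper, in Node-RED the body works on msg.payload directly.
function getValues(h3) {
    return {
        // "Temperature in Celsius:  53.75*C" -> take the text after ":",
        // then let parseFloat read the leading number and ignore "*C".
        temperature: parseFloat(h3[0].split(":")[1]),
        // Index 2 is the humidity line (index 1 is Fahrenheit).
        humidity: parseFloat(h3[2].split(":")[1]),
        // The Lx line starts with the number, so no split is needed.
        lux: parseFloat(h3[3])
    };
}

const values = getValues([
    "Temperature in Celsius:  53.75*C",
    "Temperature in Fahrenheit: 128.75*F",
    "Humidity:  70.60%",
    "29245  Lx"
]);
console.log(JSON.stringify(values));
// {"temperature":53.75,"humidity":70.6,"lux":29245}
```

Note that this relies on the order of the h3 tags staying the same; if the page layout changes, the indexes have to change with it.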
2 Likes

#17

Wow.
Thanks cflurin.

{"temperature":53.75,"humidity":70.6,"lux":29245}

Looks like abracadabra to me for now, but I am going to study this because it just works.
There is only one small step to go, I think, because I am only interested in the values, but that looks to be easy. Thanks for the great example.

0 Likes

#18

You can use a function with 3 outputs:

[{"id":"3e16b40c.4ae6ac","type":"inject","z":"cf6706ec.415a58","name":"","topic":"","payload":"<!DOCTYPE HTML><html><head></head><body><h1>GH 1</h1><h3>Temperature in Celsius:  53.75*C</h3><h3>Temperature in Fahrenheit: 128.75*F</h3><h3>Humidity:  70.60%</h3><h3>29245  Lx</h3><h3>Port A: 192Intergrated potentiometer</h3><h3>Port B: 192Intergrated photo resistor</h3><h3>Port C: 10Intergrated thermistor</h3><h3>Port D: 192Not connected</h3><h3></body></html>","payloadType":"str","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":130,"y":2460,"wires":[["f557dd49.10d0e"]]},{"id":"5f900aa0.9a8754","type":"debug","z":"cf6706ec.415a58","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":590,"y":2420,"wires":[]},{"id":"f557dd49.10d0e","type":"html","z":"cf6706ec.415a58","name":"","property":"payload","outproperty":"payload","tag":"h3","ret":"html","as":"single","x":250,"y":2460,"wires":[["e12d12dc.06669"]]},{"id":"e12d12dc.06669","type":"function","z":"cf6706ec.415a58","name":"get values","func":"var msg1 = {};\nvar msg2 = {};\nvar msg3 = {};\n\nmsg1.payload = parseFloat(msg.payload[0].split(\":\")[1]);\nmsg2.payload = parseFloat(msg.payload[2].split(\":\")[1]);\nmsg3.payload = parseFloat(msg.payload[3]);\n\nreturn [msg1,msg2,msg3];","outputs":3,"noerr":0,"x":390,"y":2460,"wires":[["5f900aa0.9a8754"],["907e1f7.c9796e"],["a2d09eaf.be897"]]},{"id":"a2d09eaf.be897","type":"debug","z":"cf6706ec.415a58","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":590,"y":2500,"wires":[]},{"id":"907e1f7.c9796e","type":"debug","z":"cf6706ec.415a58","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":590,"y":2460,"wires":[]}]
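
The idea behind this flow is that a function node with several outputs returns an array, and each element of the array is sent to the output in the same position. The sketch below mimics that outside Node-RED; splitValues is a hypothetical wrapper name, and the sample message is an assumption in the shape the html node produces.

```javascript
// Hypothetical stand-alone version of the three-output function node.
function splitValues(msg) {
    const msg1 = { payload: parseFloat(msg.payload[0].split(":")[1]) };
    const msg2 = { payload: parseFloat(msg.payload[2].split(":")[1]) };
    const msg3 = { payload: parseFloat(msg.payload[3]) };
    // In Node-RED, element 0 goes to output 1, element 1 to output 2, etc.
    return [msg1, msg2, msg3];
}

const outputs = splitValues({
    payload: [
        "Temperature in Celsius:  53.75*C",
        "Temperature in Fahrenheit: 128.75*F",
        "Humidity:  70.60%",
        "29245  Lx"
    ]
});
outputs.forEach((m, i) => console.log("output " + (i + 1) + ":", m.payload));
```

Each output then carries a bare number in msg.payload, so it can be wired straight into a gauge, chart or comparison node.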
1 Like

#19

This is great and the solution for me.
Thanks cflurin.
And thanks to everybody who helped me. I really appreciate every tip, example or suggestion because I learn a lot this way.

1 Like

#20

It's times like these when I wish the flows library had at least each core node's info (+ examples) available for searching, and for sharing a link to its docs...

0 Likes