Web scraping issue

Hello all Node-RED maniacs!

I'm doing a little web scraping and I've found a rock in my shoe!

I have this...


I want to get the value from that table. This is how it looks in the UI:

So, in my Node-RED flow I have this:

But I only get an empty value. I need to get the same value I see in the UI.

This is how my node is configured to extract that element's information.

Thanks a lot!!

What is the source of the web page? Can you share a link?

I suspect the value is not present in the HTML when the page loads, but is populated later via JavaScript.

If you open DevTools (F12), go to the Network tab and refresh the page, you might be able to see where it is coming from.

This is how I helped the last person with a similar request...

The source is local, so I cannot share it here.

This is what I got from DevTools:

OK, first things first. The HTTP Request node grabs exactly what you ask for. In this case you request a web page, and that's what you get. BUT the HTTP Request node is NOT a browser, so it doesn't run JavaScript. Keep that in mind.

...

Using an example: if you look at the response (what the HTTP Request node would get), you can see this page has no prices in the HTML. They get populated by JavaScript later on.

When I select the HTML document (usually the first item in the network list), we can see there is nothing inside the DIV...



However, if I look at the Elements tab, the div actually has content...



So have a sift through the network transfers and see if you can spot the value you are interested in. Hopefully it is in the HTML; if not, see whether you can find it in one of the other network items.
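The same check can be done inside Node-RED. Below is a minimal sketch of logic for a Function node wired after an HTTP Request node (return type: UTF-8 string): it reports whether a value you can see in the browser is actually present in the raw HTML the node received. The function name, the sample HTML and the "19.99" price are hypothetical, chosen to mirror the empty-DIV example above.

```javascript
// Reports whether `expected` appears in the raw HTML string that the
// HTTP Request node returned. If it does not, the page most likely fills
// the value in with JavaScript after loading, which the node never runs.
function checkRawHtml(html, expected) {
  const index = html.indexOf(expected);
  if (index === -1) {
    return { found: false, context: null };
  }
  // Return a short snippet around the match to help choose a CSS selector
  // for the HTML node.
  const start = Math.max(0, index - 40);
  return { found: true, context: html.slice(start, index + expected.length + 40) };
}

// Hypothetical served page: the <div> is empty, like in the DevTools example.
const served = '<html><body><div id="price"></div><script src="app.js"></script></body></html>';
console.log(checkRawHtml(served, "19.99").found); // prints false
```

In a real Function node you would call it as `checkRawHtml(msg.payload, "...")` and send the result to a Debug node.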

OK, I see that the Network tab shows the content is empty, so maybe that is my issue and why I cannot get the value I'm looking for (the program version).

Could that be it?

Yes, that is why you cannot get it. The value is populated later (somehow), most likely by something in one of those scripts. Have you looked at them? Click each one, then click the "Response" tab and see if you can spot the info you are interested in.

Hmm, OK. Now I get it.

There is nothing inside the scripts that gives me the info I want.
So maybe it's a matter of waiting for the information to appear while the UI is loading?

Is there a node that can help me out?

Does this firmware have MQTT? Perhaps you can just have the value sent to Node-RED. If you don't know, search for "sonoff tasmota MQTT" (I assume it does, as Tasmota stands for Theo Arends Sonoff MQTT OTA).
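For the MQTT route, a minimal sketch of the topics involved, assuming Tasmota's default FullTopic layout (`%prefix%/%topic%/`, with the `cmnd` and `stat` prefixes) and a hypothetical device topic `tasmota_livingroom`. Publishing the payload `2` to the command topic should ask the device for its firmware info, and the reply is expected on the `STATUS2` topic; check your device's MQTT settings if its FullTopic has been changed.

```javascript
// Builds the command/reply topic pair for Tasmota's "Status 2" (firmware info),
// assuming the default FullTopic "%prefix%/%topic%/".
function statusTopics(deviceTopic) {
  return {
    // Publish payload "2" here, e.g. from an mqtt out node.
    command: `cmnd/${deviceTopic}/Status`,
    // Subscribe here, e.g. with an mqtt in node, to receive the JSON reply.
    reply: `stat/${deviceTopic}/STATUS2`,
  };
}

console.log(statusTopics("tasmota_livingroom"));
```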

Or perhaps it has an HTTP API you can ask for it. Search for "sonoff tasmota REST api".
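Tasmota accepts console commands as web requests on its `/cm?cmnd=` endpoint. Here is a small sketch that builds such a URL; the IP address is hypothetical, and it assumes no web password is set (if one is, Tasmota also accepts `user=` and `password=` query parameters). Setting the result as `msg.url` and leaving the HTTP Request node's URL field empty makes the node use it.

```javascript
// Builds a Tasmota web-request URL for a console command.
// encodeURIComponent turns the space in "Status 2" into %20.
function tasmotaCommandUrl(host, command) {
  return `http://${host}/cm?cmnd=${encodeURIComponent(command)}`;
}

// "Status 2" asks for firmware information.
console.log(tasmotaCommandUrl("192.168.1.50", "Status 2"));
// prints http://192.168.1.50/cm?cmnd=Status%202
```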

Hmm, OK. I should look into linking it with Home Assistant or MQTT so I can get the info.

So you mean parsing JSON so I can get that info?

If you get what you want in JSON, then you can just grab the value directly.

There’s a great page in the docs that will explain how to use the debug panel to find the right path to any data item.

https://nodered.org/docs/user-guide/messages

Thanks Steve...

Somehow I cannot find the information I need to get what I want.
I downloaded the Puppeteer node, but I don't know how to use it either.

I'm kind of stuck here; I'm still looking for information.

Regards!

Hi all!

I'm back with another web scraping issue...
I want to get a value stored inside a table, but I'm having no luck.

http://ota.tasmota.com/tasmota/release/

I have this flow.

Thanks a lot in advance.

[{"id":"6e1953b6.ceb55c","type":"html","z":"13e3e683.083059","name":"Check Web Site Tasmota Version","property":"","outproperty":"","tag":"<td>9.1.0</td>","ret":"text","as":"single","x":480,"y":960,"wires":[["3e9f13cb.37838c"]]},{"id":"c087b2bc.6d9e9","type":"http request","z":"13e3e683.083059","name":"Get thehackbox.org/tasmota/release/","method":"GET","ret":"txt","paytoqs":"ignore","url":"http://ota.tasmota.com/tasmota/release/","tls":"","persist":false,"proxy":"","authType":"","x":490,"y":920,"wires":[["6e1953b6.ceb55c"]]},{"id":"2fb8e118.b3cace","type":"comment","z":"13e3e683.083059","name":"Check Tasmota Web Version","info":"","x":160,"y":840,"wires":[]},{"id":"f222f5ec.174c98","type":"delay","z":"13e3e683.083059","name":"","pauseType":"delay","timeout":"2","timeoutUnits":"seconds","rate":"1","nbRateUnits":"1","rateUnits":"second","randomFirst":"1","randomLast":"5","randomUnits":"seconds","drop":false,"x":400,"y":880,"wires":[["c087b2bc.6d9e9"]]},{"id":"3de72ed6.f82a52","type":"inject","z":"13e3e683.083059","name":"Repeat every 15 days","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"1296000","crontab":"","once":false,"onceDelay":"","topic":"","payload":"2","payloadType":"num","x":170,"y":880,"wires":[["f222f5ec.174c98"]]},{"id":"3e9f13cb.37838c","type":"debug","z":"13e3e683.083059","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":750,"y":960,"wires":[]}]

The idea is to store that 9.0.1 (or whatever the value is) in a global variable so I can use it later in other nodes.
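Storing the value globally is one call in a Function node: `global.set(key, value)`, read back elsewhere with `global.get(key)`. The sketch below assumes `msg.payload` already holds the version string; the key name `tasmotaVersion` is hypothetical, and `globalContext` is only a stand-in so the snippet runs on plain Node.js. In a real Function node, `global` is provided for you and the stand-in should be dropped.

```javascript
// Stand-in for Node-RED's global context so this sketch runs outside Node-RED.
const globalContext = {
  store: {},
  set(key, value) { this.store[key] = value; },
  get(key) { return this.store[key]; },
};

// Function node body: expects msg.payload to already hold the version string.
function storeVersion(msg, global) {
  const version = String(msg.payload).trim();
  global.set("tasmotaVersion", version); // hypothetical key name
  return msg;
}

storeVersion({ payload: " 9.1.0 " }, globalContext);
console.log(globalContext.get("tasmotaVersion")); // prints 9.1.0
```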

This should get you most of the way there:

[{"id":"3b5babc1.a63754","type":"inject","z":"8bfe9025.7255","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":120,"y":1260,"wires":[["3f9fa0bd.3c963"]]},{"id":"3f9fa0bd.3c963","type":"http request","z":"8bfe9025.7255","name":"","method":"GET","ret":"txt","paytoqs":"ignore","url":"http://ota.tasmota.com/tasmota/release/","tls":"","persist":false,"proxy":"","authType":"","x":270,"y":1260,"wires":[["1e52ea36.0f7356"]]},{"id":"1e52ea36.0f7356","type":"html","z":"8bfe9025.7255","name":"","property":"payload","outproperty":"payload","tag":"tr","ret":"html","as":"multi","x":410,"y":1260,"wires":[["54759709.b0cf28"]]},{"id":"5ec524fe.b237fc","type":"debug","z":"8bfe9025.7255","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":610,"y":1340,"wires":[]},{"id":"54759709.b0cf28","type":"html","z":"8bfe9025.7255","name":"","property":"payload","outproperty":"payload","tag":"td","ret":"html","as":"single","x":170,"y":1340,"wires":[["bb6e79ef.6cb5b8"]]},{"id":"bb6e79ef.6cb5b8","type":"switch","z":"8bfe9025.7255","name":"","property":"payload[0]","propertyType":"msg","rules":[{"t":"cont","v":"tasmota.bin","vt":"str"}],"checkall":"true","repair":false,"outputs":1,"x":310,"y":1340,"wires":[["5ec524fe.b237fc"]]}]

Great approach!!

How can I get only this value?
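One way to get just that value is a Function node after the Switch node in the flow above. This sketch assumes the matching row arrives as an array of `<td>` inner-HTML strings (that is what the second HTML node produces) and that one cell holds a dotted version number. The sample row is hypothetical; the exact column layout of the ota.tasmota.com page is not confirmed here.

```javascript
// Picks the first cell that looks like a version number ("9.1.0", "v10.0.0").
function pickVersion(msg) {
  const cells = Array.isArray(msg.payload) ? msg.payload : [];
  for (const cell of cells) {
    // Strip any tags (e.g. an <a> link) and surrounding whitespace.
    const text = String(cell).replace(/<[^>]*>/g, "").trim();
    const match = text.match(/^v?(\d+(?:\.\d+)+)$/);
    if (match) {
      msg.payload = match[1];
      return msg;
    }
  }
  return null; // in a Function node, returning null sends nothing onward
}

// Hypothetical row as delivered by the HTML node.
const row = ['<a href="tasmota.bin">tasmota.bin</a>', "9.1.0", "2020-11-12"];
console.log(pickVersion({ payload: row }).payload); // prints 9.1.0
```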

There’s a great page in the docs that will explain how to use the debug panel to find the right path to any data item.

https://nodered.org/docs/user-guide/messages

To be clear, read the part about the "Copy path" button. You can then paste that path into a Change node to move or copy the value wherever you wish, or paste it into a UI node, or however you need to use it.
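To illustrate what a copied path such as `payload[1]` refers to, here is a minimal sketch that resolves it against a message object. Inside Node-RED you would not need this helper (paste the path into a Change node, or write `msg.payload[1]` in a Function node); `getByPath` is hypothetical and only handles dots and numeric `[n]` indices, not quoted keys.

```javascript
// Resolves a debug-panel style path ("payload[1]", "payload.StatusFWR.Version")
// against an object, returning undefined if any step is missing.
function getByPath(obj, path) {
  const keys = path.replace(/\[(\d+)\]/g, ".$1").split(".");
  return keys.reduce((value, key) => (value == null ? undefined : value[key]), obj);
}

const msg = { payload: ["<a>tasmota.bin</a>", "9.1.0"] };
console.log(getByPath(msg, "payload[1]")); // prints 9.1.0
```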

With Tasmota, you can easily request the status via MQTT or web requests. No need to scrape the web interface.

Try sending the Status command; you should get back JSON.
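Once the JSON comes back, the version can be read directly. This sketch assumes a `Status 2` reply shaped like `{"StatusFWR":{"Version":"9.1.0(tasmota)",...}}`; the sample string is illustrative, not captured from a real device. It accepts either a string (the HTTP Request node's default output) or an already-parsed object (if the node is set to return a parsed JSON object).

```javascript
// Extracts the firmware version from a Tasmota "Status 2" reply.
function firmwareVersion(payload) {
  const data = typeof payload === "string" ? JSON.parse(payload) : payload;
  const raw = data && data.StatusFWR && data.StatusFWR.Version;
  if (typeof raw !== "string") return null;
  // Drop the build-variant suffix: "9.1.0(tasmota)" -> "9.1.0".
  return raw.replace(/\(.*\)$/, "");
}

const sample = '{"StatusFWR":{"Version":"9.1.0(tasmota)","Hardware":"ESP8266EX"}}';
console.log(firmwareVersion(sample)); // prints 9.1.0
```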
https://tasmota.github.io/docs/Commands/#with-web-requests

The command list: Commands - Tasmota

I think this is about checking/grabbing the latest version of the code from the website rather than querying running devices... but I may be wrong.

Both, actually. At least as I understood it.

First, the Tasmota device's web interface, hence the suggestion to grab it via the provided API.

Second, the current version shown on the GitHub project page. Does GitHub offer a machine-readable API for that, maybe?
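GitHub's REST API does expose the newest release of a repository at `https://api.github.com/repos/arendst/Tasmota/releases/latest`, and the JSON reply carries the tag in its `tag_name` field (Tasmota tags look like `v9.1.0`). A minimal sketch, assuming that URL is set as `msg.url` and the HTTP Request node returns a parsed JSON object; the function name is hypothetical and the sample reply is trimmed to the one field used.

```javascript
// Turns a GitHub "latest release" object into a bare version string.
function latestTasmotaVersion(release) {
  const tag = release && release.tag_name;
  if (typeof tag !== "string") return null;
  return tag.replace(/^v/, ""); // "v9.1.0" -> "9.1.0"
}

// Illustrative reply, trimmed to the field used here.
console.log(latestTasmotaVersion({ tag_name: "v9.1.0" })); // prints 9.1.0
```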

This subflow is how I've been grabbing the latest Tasmota release in my 'ET Display Home' flow (a flow to help manage your ESPeasy, Sonoff/Tasmota and Homie flashed devices, now on GitHub) since I first published it back in 2018:

[{"id":"fc3e43d5.47c3e","type":"http request","z":"fa6ef639.63d4e8","name":"Get SonOff releases","method":"GET","ret":"txt","paytoqs":false,"url":"","tls":"","proxy":"","authType":"","x":600,"y":220,"wires":[["3a4c8bfe.cdc6ac"]]},{"id":"a404b866.174848","type":"change","z":"fa6ef639.63d4e8","name":"","rules":[{"t":"set","p":"url","pt":"msg","to":"https://github.com/arendst/Sonoff-Tasmota/releases","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":570,"y":180,"wires":[["fc3e43d5.47c3e"]]},{"id":"3aa39bda.588e14","type":"inject","z":"fa6ef639.63d4e8","name":"Check for new Tasmota release","topic":"","payload":"1","payloadType":"str","repeat":"21600","crontab":"","once":true,"onceDelay":0.1,"x":640,"y":100,"wires":[["a404b866.174848"]]},{"id":"3a4c8bfe.cdc6ac","type":"function","z":"fa6ef639.63d4e8","name":"extract latest version","func":"var n = msg.payload.indexOf('<a href=\"/arendst/Tasmota/tree/');\n//node.warn(\"n=\" + n);\nvar str = '<a href=\"/arendst/Tasmota/tree/';\nvar l = str.length;\nn += l;\n//node.warn(\"nl=\" + n);\n//str = msg.payload.substring(n, 13);\n//node.warn(\"str=\" + str);\nmsg.payload = msg.payload.substring(n+1, n+6);\nreturn msg;","outputs":1,"noerr":0,"x":600,"y":260,"wires":[["5b7daebe.09d8f8"]]},{"id":"954daab5.89b158","type":"sqlite","z":"fa6ef639.63d4e8","mydb":"47186c71.905e34","sqlquery":"fixed","sql":"delete from firmware where platform = \"tasmota\";","name":"delete tasmota row","x":590,"y":140,"wires":[["a404b866.174848"]]},{"id":"5b7daebe.09d8f8","type":"change","z":"fa6ef639.63d4e8","name":"store global","rules":[{"t":"set","p":"ETDH.tasmota","pt":"global","to":"payload","tot":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":810,"y":260,"wires":[[]]},{"id":"47186c71.905e34","type":"sqlitedb","z":"","db":"/home/pi/esp8266.db"}]
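The "extract latest version" function in the flow above takes a fixed 5 characters after the tag link (`substring(n+1, n+6)`), so a version of a different length, such as `10.0.0`, would come out truncated as `10.0.`. A more tolerant sketch of the same idea using a regular expression follows. Like the original, it assumes the releases page links tags as `/arendst/Tasmota/tree/v<version>` and that the first such link is the newest release; the function name and sample HTML are hypothetical.

```javascript
// Finds the first /arendst/Tasmota/tree/v<version> link and returns the version
// without the leading "v", whatever its length.
function extractLatestVersion(html) {
  const match = html.match(/\/arendst\/Tasmota\/tree\/v?(\d+(?:\.\d+)+)/);
  return match ? match[1] : null;
}

const page = '<a href="/arendst/Tasmota/tree/v10.0.0">v10.0.0</a>';
console.log(extractLatestVersion(page)); // prints 10.0.0
```

In the Function node itself, the body would become `msg.payload = extractLatestVersion(msg.payload); return msg;` with the function defined above it.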