HTTP GET - rendering problem

RaiRoo · 2 September 2020 20:01

I am trying to parse a dynamic web site (the monitoring page of my Fritz!Box), but HTTP GET does not give me what I need. Apparently the payload contains the page as it is BEFORE the web site has finished rendering. Any ideas?

TotallyInformation · 2 September 2020 21:45

That indicates that the page has a script that loads data after the main page has loaded. You will need to dig into that code and find out what it is doing. Typically, it will be requesting data from an API on the device and you should be able to call that API directly from Node-RED if you can work out what it is doing.

RaiRoo · 3 September 2020 06:23

... ah, good to know. Isn't there a Node-RED way to visit the web site again AFTER rendering?
... and yes, there are tons of scripts within.

Steve-Mcl · 3 September 2020 08:01

best answer == no.

A dynamic web page renders client side in a browser.
Node-RED is not a browser. Strictly speaking, it does not "open a page", does not render HTML, does not execute client scripts, and does not have a DOM to update.

As @TotallyInformation said, you need to figure out where the data comes from.

Often this is a simple matter of opening the "network" tab in the browser devtools & refreshing the page - then check the network transactions to see if the data you need is being pulled in via an API. (this is far simpler to do than explain)

Then you can simply copy the URL and get the data directly into Node-RED (completely bypassing all the scripts, DOM, rendering and overhead of a browser).
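
As a rough sketch of that last step: assuming the network tab shows the page fetching JSON from some URL, a function node placed in front of an http request node could set up the same call. The URL path and the Accept header below are placeholders, not the Fritz!Box's real endpoint. `msg.url`, `msg.method` and `msg.headers` are the properties the http request node reads when its own URL field is left empty and its method is set to "- set by msg.method -".

```javascript
// Hypothetical endpoint copied from the browser's network tab (placeholder path).
const API_URL = "http://fritz.box/example/data.json";

// Body of a function node wired in front of an http request node,
// wrapped in a plain function here so it can also be run outside Node-RED.
function buildApiRequest(msg) {
    msg.url = API_URL;                             // used when the node's URL field is empty
    msg.method = "GET";                            // used when Method is "- set by msg.method -"
    msg.headers = { Accept: "application/json" };  // assumed header, adjust to what the browser sent
    return msg;
}

console.log(buildApiRequest({ payload: "" }));
```

Inside Node-RED only the three assignments and the `return msg;` would go into the function node. If the browser sent cookies or a session ID with the request, those would need to be copied across as well.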

terrible answer == maybe (if you want to do a kludge).

There is (I seem to remember) a Puppeteer-like node that essentially adds a headless browser to Node-RED. But trust me when I say this is the least good way of approaching your problem, for many reasons I won't go into right now. And I say "maybe" as there is no guarantee the scripts will load or work correctly with a headless-browser type of solution. If I were forced to do this, it would be my very last avenue - even then, I'd likely leave the company instead.

TotallyInformation · 3 September 2020 08:51

As Steve says - not really. The Node-RED way is to pick out the API call(s) and run them directly, getting rid of the cruft that is the web page. Once done, it should work reliably pretty much forever unless the vendor makes a major change to the page - unlikely as they prefer to sell you a new, "improved" router.
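
For what it's worth, once the http request node hands back the API's response, picking out the values is only a few lines in a function node. A minimal sketch, assuming a made-up response shape: the field names `downstream` and `upstream` are invented for illustration and will differ on a real device.

```javascript
// Body of a function node placed after the http request node,
// wrapped in a plain function so it can also be run outside Node-RED.
function extractRates(msg) {
    // Depending on the node's "Return" setting, the payload is either
    // a JSON string or an already parsed object; handle both.
    const data = typeof msg.payload === "string"
        ? JSON.parse(msg.payload)
        : msg.payload;

    // Keep only the (hypothetical) fields of interest, as numbers.
    msg.payload = {
        downKbps: Number(data.downstream),
        upKbps: Number(data.upstream)
    };
    return msg;
}

// Example run with an invented response body.
const sample = { payload: '{"downstream": "51200", "upstream": "10240"}' };
console.log(extractRates(sample).payload);
```

No HTML parsing or waiting for rendering is involved, which is why this approach tends to keep working as long as the endpoint itself does not change.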

Well, there are "good" reasons for doing screen-scraping. There are plenty of legacy applications that you can't readily get rid of quickly, and that applies at home as well as in enterprises. We have 2 systems at work, for example, that service the whole of the NHS (1.3m people, £150bn pa); they are not easy to get rid of or even to replace.

But in general, I agree of course. Using Puppeteer in particular is a bit of a monster. In the enterprise, we use Robotic Process Automation tools.

RaiRoo · 3 September 2020 09:42

... thanks a lot, Steve-Mcl and TotallyInformation, for your messages. Highly appreciated!! I will try the Puppeteer way - maybe it will work! I already know that unfortunately there is no API to get the data I need.
Anyway, you saved me a lot of time!!! Thanks again

Steve-Mcl · 3 September 2020 09:50

That is doubtful - that data comes from "somewhere"

Did you look in the network tab of devtools? It might be staring you in the face.

RaiRoo · 3 September 2020 10:16

Well, I have been in contact with the maker of my router (Fritz!Box). They said that maybe it will be in a future release of their web interface. But I will try what you proposed!

TotallyInformation · 3 September 2020 15:42

And you never know, if it isn't too big, you could post the HTML code here and someone might be interested/bored enough to take a look and see if we can spot how it works.

Some people do crosswords, others decipher code

RaiRoo · 9 September 2020 20:20

Yippie!!! I got it!!!
@Steve-Mcl wrote: "Often this is a simple matter of opening the 'network' tab in the browser devtools & refreshing the page - then check the network transactions to see if the data you need is being pulled in via an API. (this is far simpler to do than explain)"
This was really the missing thing!!! Once I checked the network transactions, I had the solution.
Thank you so much!!!!

Steve-Mcl · 9 September 2020 20:28

You're welcome. As I said, easier to do than explain.

system · 23 September 2020 20:28

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
