HTTP GET - rendering problem

I am trying to parse a dynamic web page (the monitoring page of my Fritz!Box), but HTTP GET does not give me what I need. Apparently the payload contains the page as it is BEFORE rendering is complete. Any ideas?

That indicates that the page has a script that loads data after the main page has loaded. You will need to dig into that code and find out what it is doing. Typically, it will be requesting data from an API on the device and you should be able to call that API directly from Node-RED if you can work out what it is doing.

... ah, good to know. Isn't there a Node-RED way to visit the web site again AFTER rendering?
... and yes, there are tons of scripts within.

best answer == no.

A dynamic web page renders client side in a browser.
Node-RED is not a browser. Strictly speaking, it does not "open a page", does not render HTML, does not execute client scripts, and does not have a DOM to update.

As @TotallyInformation said, you need to figure out where the data comes from.

Often this is a simple matter of opening the "network" tab in the browser devtools & refreshing the page - then check the network transactions to see if the data you need is being pulled in via an API. (this is far simpler to do than explain)
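If the list in the network tab is too long to scan by eye, most browsers let you export it as a HAR file, which is plain JSON. The following is only a rough sketch of how you might filter such an export down to the requests that look like data rather than page assets. The file name `fritzbox.har`, the function name `find_data_requests` and the list of MIME types are my own assumptions, not anything specific to the Fritz!Box.

```python
import json
from urllib.parse import urlsplit

# MIME type fragments that usually indicate data rather than page assets.
# This list is an assumption; adjust it to whatever your device returns.
DATA_MIME_HINTS = ("application/json", "application/xml", "text/xml", "text/plain")


def find_data_requests(har: dict) -> list[dict]:
    """Return method, URL, path and MIME type of responses that look like data."""
    hits = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        content = entry.get("response", {}).get("content", {})
        mime = content.get("mimeType", "").lower()
        if any(hint in mime for hint in DATA_MIME_HINTS):
            url = request.get("url", "")
            hits.append({
                "method": request.get("method", ""),
                "url": url,
                "path": urlsplit(url).path,
                "mime": mime,
            })
    return hits


if __name__ == "__main__":
    # "fritzbox.har" is a hypothetical name for the devtools export.
    with open("fritzbox.har", encoding="utf-8") as fh:
        for hit in find_data_requests(json.load(fh)):
            print(hit["method"], hit["url"], hit["mime"])
```

Whatever shows up here is a candidate for the call that feeds the page. Check the response body of each one in devtools to see which carries the values you are after.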

Then you can simply copy the URL and get the data directly into node-red (completely bypassing all the scripts, DOM, rendering and overhead of a browser)
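To illustrate what "get the data directly" means outside a browser, here is a minimal sketch in Python. The URL and parameter names are placeholders (I do not know what your device actually exposes), and `build_request` and `fetch_json` are hypothetical helper names. It also assumes the endpoint answers with JSON. In Node-RED the same job is done by an http request node configured with the copied URL, method and parameters.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the URL copied from the network tab.
API_URL = "http://fritz.box/example/endpoint"


def build_request(url, params=None, method="GET"):
    """Build the same kind of request the page's script sends."""
    params = params or {}
    if method == "GET":
        # GET parameters go into the query string.
        full_url = url + ("?" + urlencode(params) if params else "")
        return urllib.request.Request(full_url, method="GET")
    # Other methods send the parameters as a form-encoded body.
    body = urlencode(params).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method=method,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def fetch_json(request, timeout=10.0):
    """Send the request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # "page" and "overview" are made-up parameters for illustration only.
    req = build_request(API_URL, {"page": "overview"})
    print(fetch_json(req))
```

If the device wants a login, the request will most likely also need whatever session value or cookie you see being sent in devtools. Copy the method, parameters and headers from the real request rather than guessing them.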

terrible answer == maybe (if you want to do a kludge).

There is (I seem to remember) a Puppeteer-like node that essentially adds a headless browser to Node-RED. But trust me when I say this is the worst way of approaching your problem, for many reasons I won't go into right now. And I say "maybe" because there is no guarantee the scripts will load or work correctly with a headless-browser type of solution. If I were forced to do this, it would be my very last avenue - even then, I'd likely leave the company instead :wink:

As Steve says - not really. The Node-RED way is to pick out the API call(s) and run them directly, getting rid of the cruft that is the web page. :mage: Once done, it should work reliably pretty much forever unless the vendor makes a major change to the page - unlikely, as they prefer to sell you a new, "improved" router.

Well, there are "good" reasons for doing screen-scraping. There are plenty of legacy applications that you can't readily get rid of quickly, and that applies at home as well as in enterprises. We have 2 systems at work, for example, that service the whole of the NHS (1.3m people, £150bn pa), not easy to get rid of or even to replace. :slight_smile:

But in general, I agree of course. Using Puppeteer in particular is a bit of a monster. In the enterprise, we use Robotic Process Automation tools.

... thanks a lot, Steve-Mcl and totallyinformation for your messages. Highly appreciated!! I will try the puppeteer way - maybe it will work! I know already that unfortunately there is no API to get the data I need.
Anyway, you saved me a lot of time!!! Thanks again

That is doubtful - that data comes from "somewhere"

Did you look in the network tab of devtools? It might be staring you in the face.

Well, I have been in contact with the maker of my router (fritzbox). They said, maybe it will be in a future release of their web interface. But I will try what you proposed!

And you never know, if it isn't too big, you could post the HTML code here and someone might be interested/bored enough to take a look and see if we can spot how it works.

Some people do crosswords, others decipher code :mage:

Yippie!!! I got it!!!
@Steve-Mcl: Often this is a simple matter of opening the "network" tab in the browser devtools & refreshing the page - then check the network transactions to see if the data you need is being pulled in via an API. (this is far simpler to do than explain)
This was really the missing thing!!! Once I checked the network transactions, I had the solution.
Thank you so much!!!!

You're welcome. As I said, easier to do than explain.
