Node Red HTML Node Can't Read Past FRAME or FRAMESET?

I noticed that curl has the same issue, actually.

I wrote a Python script some time ago that parses the HTML status page from my cable modem, so I can tell when my ISP is doing something to the modem. I wanted to convert this logic to NR, but apparently the HTML node can't see past a FRAME or FRAMESET boundary. Python and PHP can breeze past such boundaries and parse the entire web page at a given URL as a complete response.

While testing, I noticed that the page source view and curl also fail to gather the entire web page content, but the browser debugger (Firefox in this case) can see past the FRAME or FRAMESET boundary just fine.

Here is an example of what the various tools and the HTML node return instead of the entire HTML content:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0033)http://192.168.100.1/startup.html -->
<HTML>
<HEAD>
<META content="text/html; charset=windows-1252" http-equiv=Content-Type>
<META content="Microsoft FrontPage 4.0" name=GENERATOR>
</HEAD>

<FRAMESET border=0 cols=150,* frameBorder=0 frameSpacing=0>
<FRAME scrolling=no src="cmSide.htm">
<FRAME src="indexData.htm">
</FRAMESET>
</HTML>

If PHP and Python can get the entire content, I don't believe this is a case of JS code being called dynamically, since the entire page is rendered and sent back as part of the total response. Or could it be otherwise? Maybe I can figure out what the embedded FRAME URL is and just query that directly.

Otherwise, I guess I will just add MQTT to my Python web page scraper, and NR can request the data as needed. Unless someone has a better or more interesting suggestion?

It is a nasty hack, but querying the FRAME URL directly works on some of the web pages; it really depends on how the page was developed.

You are right, there is no dynamic JS here. In this case it is the browser making the additional requests to load the frame contents.

The frame contents are separate pages and will need to be loaded as such - with separate requests.
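As a rough illustration of what "separate requests" means here, the following Python sketch pulls the FRAME src values out of the outer page and resolves them against the page URL, then fetches each one. It is only a sketch under assumptions: the base URL is taken from the "saved from" comment in the sample above, the windows-1252 decoding comes from its META tag, and the names FrameSrcParser, frame_urls and fetch_frames are made up for this example. The same idea should carry over to NR by issuing a second http request for each resolved frame URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class FrameSrcParser(HTMLParser):
    """Collect the src attribute of every FRAME and IFRAME tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def frame_urls(base_url, html):
    """Return absolute URLs for all frames found in the given HTML."""
    parser = FrameSrcParser()
    parser.feed(html)
    # Frame src values are usually relative to the page that embeds them.
    return [urljoin(base_url, src) for src in parser.sources]


def fetch_frames(base_url, timeout=10):
    """Fetch the outer page, then fetch each frame as its own request."""
    # Assumption: the pages are windows-1252 encoded, as the META tag claims.
    with urlopen(base_url, timeout=timeout) as resp:
        outer = resp.read().decode("windows-1252", errors="replace")
    pages = {}
    for url in frame_urls(base_url, outer):
        with urlopen(url, timeout=timeout) as resp:
            pages[url] = resp.read().decode("windows-1252", errors="replace")
    return pages
```

Calling fetch_frames("http://192.168.100.1/startup.html") would return a dict mapping each frame URL to its HTML, which can then be parsed the same way as any single page.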

Yes, that is what I had to do: request the sub-page (the frame content) directly.
