Node Red HTML Node Can't Read Past FRAME or FRAMESET?

Nodi.Rubrum · 8 September 2020 04:28

Noticed that curl has the same issue, actually.

I wrote a Python script some time ago that parses the HTML status page from my cable modem, so I can tell when my ISP is doing something to the modem. I wanted to convert this logic to NR, but apparently the HTML node can't see past a FRAME or FRAMESET boundary. Python and PHP can breeze past such boundaries and parse the entire web page at a given URL as a complete response.

While testing, I noticed that the page source view and curl also fail to gather the entire web page content. But the browser debugger (Firefox in this case) can see past the FRAME or FRAMESET boundary just fine.

Example of how various tools and the HTML node fail to get the entire HTML content:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0033)http://192.168.100.1/startup.html -->
<HTML>
<HEAD>
<META content="text/html; charset=windows-1252" http-equiv=Content-Type>
<META content="Microsoft FrontPage 4.0" name=GENERATOR>
</HEAD>

<FRAMESET border=0 cols=150,* frameBorder=0 frameSpacing=0>
<FRAME scrolling=no src="cmSide.htm">
<FRAME src="indexData.htm">
</FRAMESET>
</HTML>

If PHP and Python can get the entire content, I don't believe this is a case of JS code being called dynamically, since the entire page is rendered and sent back as part of the total response. Or could it be otherwise? Maybe I can figure out what the embedded FRAME URL is and just query that directly.

Otherwise, I guess I will just add MQTT to my Python web page scraper so that NR can request the data as needed. Unless someone has a better or more interesting suggestion?

Nodi.Rubrum · 8 September 2020 04:42

It is a nasty hack, but querying the FRAME URL directly works on some of the web pages. It really depends on how the page was developed.

knolleary · 8 September 2020 07:32

You are right that there is no dynamic JS here, but in this case it is the browser making the additional requests to load the frame contents.

The frame contents are separate pages and will need to be loaded as such - with separate requests.
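For illustration, here is a minimal Python sketch of that idea, using only the standard library: read the outer page, collect the `src` of each FRAME, resolve it against the page URL, and request each one separately. The names `FrameSrcParser`, `extract_frame_urls`, and `fetch_frames` are made up for this example, and the `windows-1252` default is an assumption taken from the META tag in the sample above; a different modem may need something else.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class FrameSrcParser(HTMLParser):
    """Collect the src attribute of every FRAME/IFRAME start tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def extract_frame_urls(html, base_url):
    """Return absolute URLs for all frames found in the outer page."""
    parser = FrameSrcParser()
    parser.feed(html)
    return [urljoin(base_url, src) for src in parser.sources]


def fetch_frames(page_url, encoding="windows-1252"):
    """Fetch the outer page, then each frame with its own request.

    The encoding default is an assumption based on the sample page.
    """
    with urlopen(page_url, timeout=10) as resp:
        outer = resp.read().decode(encoding, errors="replace")
    pages = {}
    for url in extract_frame_urls(outer, page_url):
        with urlopen(url, timeout=10) as resp:
            pages[url] = resp.read().decode(encoding, errors="replace")
    return pages
```

Calling `fetch_frames("http://192.168.100.1/startup.html")` (the URL from the "saved from" comment in the sample) would return a dict mapping each frame URL to its HTML. In Node-RED the equivalent would presumably be an http request node pointed at each frame URL, with the html node applied to each response.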

Nodi.Rubrum · 8 September 2020 13:37

Yes, that is what I had to do: request the sub-page that holds the frame content.

system · 7 November 2020 13:37

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
