[Solved] screenshot node - wait for webpage to fully load before taking snapshot

Web pages are getting more and more complicated these days and, more often than not, take more than a few seconds to load fully.
One example I wanted for my Node-RED dashboard is a screenshot of the high-altitude balloon tracking website.

When I took a screenshot of it, this is what I saw:
[Screenshot: only the site's opening splash screen]

Pretty much just the opening splash screen and not much else.
Not exactly useful.
Digging into why, I found that the screenshot node uses Puppeteer (headless Chrome) to take the screenshot.
The problem is that the node does not wait for the website to load; it just pauses for a second or so and then takes the shot.
While that is OK for very simple websites, I have found that a lot of websites need different amounts of time to load fully, so hard-coding a fixed delay is not an ideal solution.

Over the past few weeks I dug into Stack Overflow and found several fragments of sample code that make Puppeteer wait for all scripts, images and so on to load before it takes the screenshot.
I have grafted them all together into the node's JavaScript and tested the result many times. Here is the code:

module.exports = function (RED) {
    function ScreenshotNode(config) {
        RED.nodes.createNode(this, config);
        const node = this;
        const path = config.path;
        const puppeteer = require('puppeteer');
        const launchOptions = {};

        // Poll the page once per second and treat it as rendered once the HTML
        // size has stayed the same for minStableSizeIterations consecutive checks.
        // Gives up (without an error) after `timeout` ms.
        const waitTillHTMLRendered = async (page, timeout = 120000) => {
            const checkDurationMsecs = 1000;
            const maxChecks = timeout / checkDurationMsecs;
            const minStableSizeIterations = 3;
            let lastHTMLSize = 0;
            let checkCounts = 1;
            let countStableSizeIterations = 0;

            while (checkCounts++ <= maxChecks) {
                const html = await page.content();
                const currentHTMLSize = html.length;

                if (lastHTMLSize !== 0 && currentHTMLSize === lastHTMLSize) {
                    countStableSizeIterations++;
                } else {
                    countStableSizeIterations = 0; // size changed: reset the counter
                }

                if (countStableSizeIterations >= minStableSizeIterations) {
                    break; // the page has stopped changing
                }

                lastHTMLSize = currentHTMLSize;
                // page.waitFor() is deprecated in newer Puppeteer releases;
                // a plain timer works with any version.
                await new Promise(resolve => setTimeout(resolve, checkDurationMsecs));
            }
        };

        if (path) {
            launchOptions.executablePath = path;
        }

        node.on('input', function (msg) {
            // msg.url takes precedence over the URL configured in the node.
            const url = msg.url || config.url || 'http://www.example.com/';

            puppeteer.launch(launchOptions).then(async browser => {
                const screenshotOptions = {
                    type: 'png',
                    fullPage: true,
                    encoding: 'base64'
                };
                const page = await browser.newPage();

                // Wait for the load event, then for the DOM to stop changing.
                await page.goto(url, { timeout: 100000, waitUntil: 'load' });
                await waitTillHTMLRendered(page);

                const base64String = await page.screenshot(screenshotOptions);
                await browser.close();

                msg.payload = base64String;
                node.send(msg);
            });
        });
    }
    RED.nodes.registerType("screenshot", ScreenshotNode);
}

Just so we are clear: I am not a programmer, and you should not use this code. I am only showing it as an example of what is possible.
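The core of waitTillHTMLRendered is a simple polling loop: read the page size once per interval, and stop once it has stayed the same for a few checks in a row or the timeout runs out. Here is a minimal sketch of that loop with Puppeteer replaced by a size function you pass in. The names `waitUntilStable`, `getSize`, `intervalMs`, `timeoutMs` and `minStable` are hypothetical, chosen for illustration. Unlike the code above, it reports whether the page actually settled, so a caller could tell a timeout apart from a finished page.

```javascript
// Sketch of the "wait until the page stops growing" idea, independent of Puppeteer.
// getSize is any async function returning a number (hypothetical; with Puppeteer
// it could be: async () => (await page.content()).length).
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function waitUntilStable(getSize, { intervalMs = 1000, timeoutMs = 120000, minStable = 3 } = {}) {
  const maxChecks = Math.floor(timeoutMs / intervalMs);
  let lastSize = 0;
  let stableCount = 0;

  for (let check = 1; check <= maxChecks; check++) {
    const size = await getSize();
    // Count consecutive checks where the size did not change; any change resets it.
    stableCount = lastSize !== 0 && size === lastSize ? stableCount + 1 : 0;
    if (stableCount >= minStable) {
      return { stable: true, checks: check };
    }
    lastSize = size;
    await sleep(intervalMs);
  }
  // Ran out of checks before the size settled.
  return { stable: false, checks: maxChecks };
}

module.exports = { waitUntilStable };
```

For a page whose size goes 100, 200, 300, 300, 300, 300, the loop needs six checks: three to reach 300 and three more identical readings to count as stable.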

Copy and paste that code into a text file and save it on your computer as screenshot.js.
BEFORE you copy it over to your Node-RED, IT IS CRITICAL that you first stop Node-RED.
Then overwrite the existing file with it. Its location depends on where and how you installed Node-RED, but look for a directory like .node-red/node_modules/node-red-contrib-web-page-screenshot and copy your screenshot.js over the one in there. (Be sure to use the web-page-screenshot directory, NOT the contrib-screenshot directory, if you also have that node installed.)
Then start Node-RED again the way you usually do.

Use the node exactly as you have been.
The difference now is that the node will wait however long the website needs before it takes the screenshot (up to 45 seconds).

So now we get an image like this:
[Screenshot: the fully loaded tracking page]

A much more helpful screenshot.

I don't know how to contact the node's author to ask whether they would consider looking at the issue. The .js file does not get changed when Node-RED is updated or restarted, so I have only needed to copy the file over once in a few months, which seems a small price to pay for a big improvement in output.


I have been using this modified code for about 7 trouble-free months.
Late last week I upgraded Node-RED from 1.3.2 to 1.3.5.
At that time Node-RED started to crash. A lot.
Here is a typical end of log:

6 Jun 14:21:39 - [red] Uncaught Exception:
6 Jun 14:21:39 - TimeoutError: Navigation Timeout Exceeded: 45000ms exceeded
    at C:\Users\tbg\.node-red\node_modules\puppeteer\lib\LifecycleWatcher.js:142:21
  -- ASYNC --
    at Frame.<anonymous> (C:\Users\tbg\.node-red\node_modules\puppeteer\lib\helper.js:111:15)
    at Page.goto (C:\Users\tbg\.node-red\node_modules\puppeteer\lib\Page.js:674:49)
    at Page.<anonymous> (C:\Users\tbg\.node-red\node_modules\puppeteer\lib\helper.js:112:23)
    at C:\Users\tbg\.node-red\node_modules\node-red-contrib-web-page-screenshot\screenshot.js:66:16
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)

The clue for me was the 45000: that was the value I had put into my modified screenshot code.
Since I use this node a lot, Node-RED was crashing a lot.
I run Node-RED on a Windows PC, so I installed an app (RestartOnCrash) to monitor Node-RED and restart it after a crash.
It was restarting Node-RED a lot. Too much, in fact. I have a lot of counters, tables and images that get cleared on a crash/restart, and my site was pretty much useless when it crashed and restarted every 15 to 45 minutes.

I don't know what the core issue is and am still digging into it, so for now I have just set the timeout in the two places it's used in the above code to 120000 (i.e. 2 minutes), and Node-RED has been up for 14 hours, a current record.
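Puppeteer's timeout values, such as the `timeout` option passed to `page.goto()`, are in milliseconds, so one extra or missing zero changes the wait by a factor of ten. A small sketch with hypothetical constant names that makes the conversion explicit:

```javascript
// Timeouts in Puppeteer options are milliseconds. Building them from named
// units makes the intended duration obvious and hard to mistype.
const SECOND_MS = 1000;
const MINUTE_MS = 60 * SECOND_MS;

const navigationTimeoutMs = 2 * MINUTE_MS; // 120000 ms = 2 minutes
const loggedTimeoutMs = 45 * SECOND_MS;    // 45000 ms, the value in the error log

// Sanity-check a raw number before pasting it into the code.
const msToMinutes = (ms) => ms / MINUTE_MS;

console.log(msToMinutes(120000));  // 2
console.log(msToMinutes(1200000)); // 20

module.exports = { navigationTimeoutMs, loggedTimeoutMs, msToMinutes };
```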

EDIT: I upgraded Node.js to 16.3 and am now getting some unusual warnings when I start Node-RED. I don't think they are related to the changes I made to the screenshot node, but I can't be sure. (The 49548 in "(node:49548)" is the process ID of the running Node.js process, not a Node-RED node number.)

> C:\Users\tbg>node-red
> (node:49548) [DEP0128] DeprecationWarning: Invalid 'main' field in 'C:\Users\tbg\AppData\Roaming\npm\node_modules\node-red\node_modules\@node-red\editor-client\package.json' of './lib/index.js'. Please either fix that or report it to the module author
> (Use `node --trace-deprecation ...` to show where the warning was created)
> (node:49548) [DEP0128] DeprecationWarning: Invalid 'main' field in 'C:\Users\tbg\.node-red\node_modules\node-red-dashboard\package.json' of 'none'. Please either fix that or report it to the module author
> 6 Jun 14:23:27 - (node:49548) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 connection listeners added to [Server]. Use emitter.setMaxListeners() to increase limit
> (node:49548) Warning: Setting the NODE_TLS_REJECT_UNAUTHORIZED environment variable to '0' makes TLS connections and HTTPS requests insecure by disabling certificate verification.

The takeaway from all this: I am not a programmer, and I have no idea what I am doing. I should not be trying to 'fix' stuff that I have no clue about.

Update:
With the code put back to the original, the new version of Node-RED crashes with [red] Uncaught Exception whenever the snapshot of the website times out. Every time. Very reliably.
@knolleary or @dceejay, is there any way I can ask Node-RED to stop trying to fetch the image from the website before it crashes?

It isn't Node-RED that is trying to get the image; it's your code and/or Puppeteer.

There must be some error handling missing around the puppeteer code you've got that would handle the timeout. I'm not very familiar with puppeteer, but if I get a chance, I'll take a look at their docs to see what it says about error handling.


Thanks Nick, that was really helpful.

I did some Google-fu, and it seems the screenshot.js code posted above should have a try/catch block; currently it has none. I believe that is the core problem. Looking at how to fix it.

The body of the async function (inside the then(...)) should be wrapped in a try/catch block, and the outer promise chain (puppeteer.launch(...).then(...)) should also have a .catch() handler at the end:

puppeteer.launch(option).then(async browser => {
    try {
        const screenshotOptions = {
            type: 'png',
            fullPage: true,
            encoding: 'base64'
        };
        const page = await browser.newPage();

        await page.goto(url, { timeout: 100000, waitUntil: 'load' });
        await waitTillHTMLRendered(page);

        msg.payload = await page.screenshot(screenshotOptions);
        node.send(msg);
    } catch (err) {
        // Navigation timeouts and other page errors are reported, not thrown
        node.error(err, msg);
    } finally {
        // Close the browser whether or not the screenshot succeeded;
        // otherwise every timeout leaves a Chromium process running
        await browser.close();
    }
})
.catch(err => {
    // puppeteer.launch() itself failed, or browser.close() threw
    node.error(err, msg);
});
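Why both handlers are needed can be shown without a real browser. In this sketch, `fakeLaunch()` is a hypothetical stand-in for `puppeteer.launch()` that can be told to fail at launch or at navigation, and `send`/`reportError` stand in for `node.send` and `node.error(err, msg)`. A `finally` block closes the browser on every path. Since Node.js 15, an unhandled promise rejection terminates the process by default, which may be why a missing handler that used to be harmless now brings Node-RED down.

```javascript
// Hypothetical stand-in for puppeteer.launch(), for illustration only.
function fakeLaunch({ failLaunch = false, failGoto = false } = {}) {
  if (failLaunch) return Promise.reject(new Error('launch failed'));
  const browser = {
    closed: false,
    async newPage() {
      return {
        async goto() { if (failGoto) throw new Error('Navigation timeout'); },
        async screenshot() { return 'iVBORw0KGgo='; }, // placeholder base64 string
      };
    },
    async close() { browser.closed = true; },
  };
  return Promise.resolve(browser);
}

// Mirrors the node's input handler. Resolves to the browser (if one was
// launched) so the caller can check that it was closed.
function takeShot(launchOpts, send, reportError) {
  let browser;
  return fakeLaunch(launchOpts)
    .then(async (b) => {
      browser = b;
      try {
        const page = await b.newPage();
        await page.goto('http://www.example.com/');
        send(await page.screenshot());
      } catch (err) {
        reportError(err); // navigation/screenshot errors land here
      } finally {
        await b.close(); // runs on success and on failure
      }
    })
    .catch(reportError) // launch errors, or anything escaping the try (e.g. close())
    .then(() => browser);
}

module.exports = { fakeLaunch, takeShot };
```

Without the inner try/catch, a navigation timeout would skip `close()`; without the outer `.catch()`, a launch failure would become an unhandled rejection.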

Thanks very much @knolleary and @Steve-Mcl
Node-RED has run continuously for the past three days. Bit of a record for the new version.
It seems knowing what to Google is half the battle, and knowing how to graft other people's code into your own is the other half.

Here is the updated code that seemed to solve the issue of crashing on a timeout.

module.exports = function (RED) {
    function ScreenshotNode(config) {
        RED.nodes.createNode(this, config);
        const node = this;
        const path = config.path;
        const puppeteer = require('puppeteer');
        const launchOptions = {};

        // Poll the page once per second and treat it as rendered once the HTML
        // size has stayed the same for minStableSizeIterations consecutive checks.
        // Gives up (without an error) after `timeout` ms.
        const waitTillHTMLRendered = async (page, timeout = 120000) => {
            const checkDurationMsecs = 1000;
            const maxChecks = timeout / checkDurationMsecs;
            const minStableSizeIterations = 3;
            let lastHTMLSize = 0;
            let checkCounts = 1;
            let countStableSizeIterations = 0;

            while (checkCounts++ <= maxChecks) {
                const html = await page.content();
                const currentHTMLSize = html.length;

                if (lastHTMLSize !== 0 && currentHTMLSize === lastHTMLSize) {
                    countStableSizeIterations++;
                } else {
                    countStableSizeIterations = 0; // size changed: reset the counter
                }

                if (countStableSizeIterations >= minStableSizeIterations) {
                    break; // the page has stopped changing
                }

                lastHTMLSize = currentHTMLSize;
                // page.waitFor() is deprecated in newer Puppeteer releases;
                // a plain timer works with any version.
                await new Promise(resolve => setTimeout(resolve, checkDurationMsecs));
            }
        };

        if (path) {
            launchOptions.executablePath = path;
        }

        node.on('input', function (msg) {
            // msg.url takes precedence over the URL configured in the node.
            const url = msg.url || config.url || 'http://www.example.com/';

            puppeteer.launch(launchOptions).then(async browser => {
                try {
                    const screenshotOptions = {
                        type: 'png',
                        fullPage: true,
                        encoding: 'base64'
                    };
                    const page = await browser.newPage();

                    // Wait for the load event, then for the DOM to stop changing.
                    await page.goto(url, { timeout: 100000, waitUntil: 'load' });
                    await waitTillHTMLRendered(page);

                    msg.payload = await page.screenshot(screenshotOptions);
                    node.send(msg);
                } catch (err) {
                    // Navigation timeouts etc. are reported instead of crashing Node-RED
                    node.error(err, msg);
                } finally {
                    // Always close the browser, even after an error
                    await browser.close();
                }
            }).catch(err => {
                // puppeteer.launch() failed, or closing the browser threw
                node.error(err, msg);
            });
        });
    }
    RED.nodes.registerType("screenshot", ScreenshotNode);
}
