How can I scrape Wunderground historical data?

I'm trying to scrape the day average temperature from here: https://www.wunderground.com/history/daily/us/pa/easton/KPAEASTO22/date/2020-3-28
Each day I'd like to get the previous day's Day Average temperature. I tried the http and https nodes and they don't get the whole page. I thought I could visit the page, turn it into JSON data and get the value, but that didn't work. I would also need to append the previous day's date to the end of the URL dynamically. Can anyone help, or tell me how I can do it?
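The date part of the URL in the question is written as year-month-day without leading zeros (`2020-3-28`). Below is a minimal sketch of building yesterday's URL in that format, assuming the Node-RED host's local time zone is the one you care about. The function names are made up for illustration; in a Node-RED function node the same code works as plain JavaScript once the type annotations are removed. If the XHR endpoint you end up using expects the date in a different format, only the template string needs to change.

```typescript
// History URL for the station in the question; the date is appended at the end.
const BASE_URL =
  "https://www.wunderground.com/history/daily/us/pa/easton/KPAEASTO22/date/";

// Returns the day before `now` (in the machine's local time zone),
// formatted as Y-M-D without leading zeros, e.g. "2020-3-28".
function previousDayString(now: Date): string {
  // Start from local midnight of `now`, then step back one day.
  const d = new Date(now.getFullYear(), now.getMonth(), now.getDate());
  d.setDate(d.getDate() - 1); // month and year boundaries roll over automatically
  return `${d.getFullYear()}-${d.getMonth() + 1}-${d.getDate()}`;
}

// Hypothetical helper: full URL for yesterday's history page.
function previousDayUrl(now: Date = new Date()): string {
  return BASE_URL + previousDayString(now);
}
```

For example, `previousDayUrl(new Date(2020, 2, 29))` gives the URL from the question, ending in `/date/2020-3-28`. Letting `Date` do the subtraction avoids hand-written rules for month lengths and leap years.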
Thanks in advance.
Dave

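On that page, open your browser's developer tools, go to the Network tab and filter on XHR :slight_smile:

Those requests are how the page fetches its data after it loads, which is also why the http request node doesn't get the whole page: it doesn't run the page's scripts. You might want to see whether you can call those endpoints directly from Node-RED.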

Oh, and while you are there, take a look at the number of advertising scripts they load, including ones that use the location you provide. Then remember that you are being tracked by a LOT of organisations, unless, like me, you have strong ad-blocking. Those tracking scripts have access to data from your browser.

Yes, it is filled with ads, which Pi-hole and DNS take care of. I will try what you suggested. I'm not much of a programmer, just a hobbyist. I was also trying parsehub.com, but I'd like it to be in Node-RED only.

Glad you are using Pi-hole :slight_smile: Very sensible.

Using Node-RED to grab website data is possibly slightly more programming-focused than ParseHub, depending on the source site. However, if those JSON endpoints work for you, that is much easier than scraping and probably more robust as well.

All you need is to trigger an http request node, say every hour (an inject node set to repeat can do that). Don't do it too often, otherwise your IP address will likely get banned. The forecast almost certainly won't change within an hour anyway. The data returned is JSON, which you can then use directly in Node-RED.
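For comparison, here is a rough stand-alone equivalent of "inject node, then http request node, then json node", assuming Node.js 18 or newer for the built-in `fetch`. `ENDPOINT_URL` is a hypothetical placeholder for whichever XHR URL you copy from the Network tab, and `isDue`, `fetchJson` and `pollOnce` are illustrative names, not Node-RED APIs. In Node-RED itself, the inject node's repeat interval does the scheduling for you.

```typescript
// Hypothetical placeholder: paste the real XHR URL from the Network tab here.
const ENDPOINT_URL = "https://example.com/replace-with-xhr-url";

// Minimum gap between requests: one hour, as suggested above.
const MIN_INTERVAL_MS = 60 * 60 * 1000;

let lastRequestMs: number | undefined;

// True if no request has been made yet, or enough time has passed since the last one.
function isDue(nowMs: number, lastMs: number | undefined, minIntervalMs: number): boolean {
  return lastMs === undefined || nowMs - lastMs >= minIntervalMs;
}

// Fetch a URL and parse the body as JSON (what the json node does in Node-RED).
async function fetchJson(url: string): Promise<unknown> {
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`HTTP ${res.status} for ${url}`);
  }
  return res.json();
}

// Make one request if allowed; returns undefined when it is too soon.
async function pollOnce(): Promise<unknown> {
  const now = Date.now();
  if (!isDue(now, lastRequestMs, MIN_INTERVAL_MS)) {
    return undefined; // skip, so the server isn't hit too often
  }
  lastRequestMs = now;
  return fetchJson(ENDPOINT_URL);
}
```

A usage example would be `setInterval(() => pollOnce().then(console.log, console.error), MIN_INTERVAL_MS);`. The `isDue` check is only a guard against accidental extra triggers; for a once-a-day job, raise `MIN_INTERVAL_MS` accordingly.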

Of course, it is possible that you will get banned anyway, though I have no sympathy at all for WU, as I think they have treated people quite badly.

There are plenty of other weather services, and a few of them still have decent, free APIs, so don't overlook those either.


I only need to access it once per day to get the "Day Average" temperature. How can I get the whole page to load?

Wow, thank you. Passing those XHR URLs through a json node got me much closer to my goal! I didn't know we could do that. One of them gave me the temperature observations for the whole day, so I may have to total them and divide by the number of readings to get the day average. Today, when I get time, I will try each of the XHR URLs and see what each returns. The next hurdle will be taking the current date, working out the previous day's date and putting it into the correct XHR URL.
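Totalling the readings and dividing by their count is an arithmetic mean. Below is a minimal sketch, assuming each observation in the endpoint's JSON has a numeric temperature field. `Observation` and its `temp` field are hypothetical names, so map the real field onto them after checking the json node's output in the debug sidebar. Missing readings are skipped rather than counted as zero, so they don't drag the average down.

```typescript
// Hypothetical shape of one observation; the real field name may differ.
interface Observation {
  temp: number | null;
}

// Arithmetic mean of the day's temperature readings.
// Null or non-finite readings are ignored and not counted in the divisor.
function dayAverageTemp(observations: Observation[]): number | undefined {
  let sum = 0;
  let count = 0;
  for (const obs of observations) {
    if (obs.temp !== null && Number.isFinite(obs.temp)) {
      sum += obs.temp;
      count += 1;
    }
  }
  // No valid readings means no average.
  return count > 0 ? sum / count : undefined;
}
```

For example, readings of 40, 50, a missing value and 60 average to 50. WU may compute its "Day Average" differently (for example, weighting by the time between readings), so the result may not match the website exactly.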

