I have been trying to retrieve data from a weather site to use on my Node-RED dashboard, but I have not been able to extract the data I want. Are there certain types of site where this is possible, and others where you have no chance with Node-RED?
I can extract (parse?) some material, such as the preamble text, but not the temperature and wind data. I have tried many permutations of the request and sort functions.
At least if I know which sorts of site cannot be accessed fully, I can try a different approach.
Not every web page actually contains the data you see. Many pages are just an empty page structure that scripts fill in after the original page has loaded, receiving the data over AJAX or websockets and rendering it in the browser. In many cases you can use the F12 dev console (the Network tab) to see which URLs are actually being used to get the raw data, and just call that URL instead.
In this particular case, all the readings are in HTML tables on the initial page, so you can scrape the data from them. The best node to try first is the core html node (which also uses the cheerio library). The trick is to find the right HTML elements and/or CSS class markers to locate the specific data you want.
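As a rough sketch of the "call that URL instead" route: if the URL found in the Network tab returns JSON, the http request node can be set to return a parsed JSON object, and a function node can then pick out the fields for the dashboard. The response shape and the field names (`observations`, `temp_c`, `wind_dir`, `wind_kmh`) are hypothetical; copy the real ones from the response you see. It is written as a plain function so it can be tried outside Node-RED; in a function node you would paste the body and give the node two outputs.

```javascript
// Sketch of a Node-RED function-node body, wrapped in a plain function.
// ASSUMPTION (hypothetical shape): msg.payload looks like
//   { "observations": [ { "temp_c": 18.4, "wind_dir": "W", "wind_kmh": 22 } ] }
function extractObservation(msg) {
    const obs = msg.payload && Array.isArray(msg.payload.observations)
        ? msg.payload.observations[0]
        : undefined;
    if (!obs) {
        return null; // in a function node, returning null sends no message
    }
    // Returning an array sends element 0 to output 1, element 1 to output 2,
    // e.g. one output wired to a temperature gauge and one to a text widget.
    return [
        { topic: "temperature", payload: obs.temp_c },
        { topic: "wind", payload: obs.wind_dir + " " + obs.wind_kmh + "km/h" }
    ];
}

// Usage with a made-up sample payload:
const sampleOut = extractObservation({
    payload: { observations: [{ temp_c: 18.4, wind_dir: "W", wind_kmh: 22 }] }
});
console.log(sampleOut[1].payload); // W 22km/h
```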
Thanks for the help, guys. I hope to try your suggestions out by the weekend. It also helps to know that the data can be extracted from the website I have in mind.
Hi, two things:
I had to try another site, as the original one would not let me log on. I suspect it identified me as trying to scrape data, which I was. So I switched to one I could access: www.weatherzone.com.au/wa/perth/perth
First off, I want to obtain the wind direction from the site, so I used the GET method in the http request node and then entered a selector in the following html node. This only yielded an empty bracket {}, and no change in how I set it up (different permutations of the output parameters) made it any better. Am I on the right track but missing something simple? I used hilite because that was the name of the class containing the wind direction:
W 22km/h
Secondly, I tried to install the cheerio function node, but it would not install. Error message:
Error: Command failed: npm WARN engine readable-stream@3.1.1: wanted: {"node":">= 6"} (current: {"node":"0.10.29","npm":"1.4.21"})
npm ERR! Error: ENOENT, open '/home/pi/.node-red/node_modules/node-red-contrib-cheerio-function/node_modules/cheerio/node_modules/lodash/_getValue.js'
npm ERR! If you need help, you may report this entire log,
npm ERR! including the npm and node versions, at:
npm ERR! <http://github.com/npm/npm/issues>
npm ERR! System Linux 4.9.28-v7+
npm ERR! command "/usr/bin/nodejs" "/usr/bin/npm" "install" "--production" "node-red-contrib-cheerio-function"
npm ERR! cwd /home/pi/.node-red
npm ERR! node -v v0.10.29
npm ERR! npm -v 1.4.21
npm ERR! path /home/pi/.node-red/node_modules/node-red-contrib-cheerio-function/node_modules/cheerio/node_modules/lodash/_getValue.js
npm ERR! code ENOENT
npm ERR! errno 34
npm ERR! Error: Method Not Allowed
npm ERR! at errorResponse (/usr/share/npm/lib/cache/add-named.js:260:10)
npm ERR! at /usr/share/npm/lib/cache/add-nam ....
I am running version 0.15.3 of Node-RED, accessing it via Chromium. I also tried using the Node-RED console, with a similar result. Any thoughts on this, please?
I see the forum did not display the selector I used properly because of the angle brackets. Here it is without them: hilite
The line I thought was the one to focus on, when I clicked on the wind title and chose Inspect: <td class="hilite">W 22km/h</td>
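The html node's Selector field takes a CSS selector rather than the tag as it appears in the inspector, so `<td class="hilite">` corresponds to `td.hilite`. The helper below is hypothetical and not part of Node-RED; it is only a small sketch of that mapping, and it handles just the tag name plus `id` and `class` attributes.

```javascript
// Hypothetical helper: turn an opening tag copied from the inspector into
// the equivalent CSS selector. Only the tag name, id and class are used;
// other attributes are ignored.
function tagToSelector(openingTag) {
    const nameMatch = openingTag.match(/^<\s*([a-zA-Z][\w-]*)/);
    if (!nameMatch) {
        throw new Error("Not an opening tag: " + openingTag);
    }
    let selector = nameMatch[1].toLowerCase();

    // id="main" becomes #main
    const idMatch = openingTag.match(/\sid\s*=\s*"([^"]*)"/);
    if (idMatch) {
        selector += "#" + idMatch[1];
    }

    // class="a b" becomes .a.b (each space-separated class is a suffix)
    const classMatch = openingTag.match(/\sclass\s*=\s*"([^"]*)"/);
    if (classMatch) {
        for (const cls of classMatch[1].trim().split(/\s+/)) {
            if (cls) selector += "." + cls;
        }
    }
    return selector;
}

console.log(tagToSelector('<td class="hilite">')); // td.hilite
```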
When I've used the HTML node, I try to work down the CSS to get to the bit I want.
So start by trying to capture the outer CSS element that contains the bit you want, and then work inwards.
So if I am looking at http://www.weatherzone.com.au/wa/perth/perth and want to access the wind direction, what would I start with as a CSS key to search for? I'm not sure how to identify the right selector among the elements I am presented with when I inspect or hit F12.
In Chrome, open the page and then use View > Developer > Developer Tools.
Then use the "inspect element" icon to select the bit you want,
which should open up the corresponding part of the HTML.
I think it's shown below the HTML code, but you should see the "nested path" of CSS classes leading to that element.
In Node-RED, start with the outermost class. Make sure you can use the html node to grab that bit, and then work your way down the classes.
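The "work inwards" approach amounts to testing a series of descendant selectors, each one step deeper than the last (a space between two selectors means "inside"). This minimal sketch only generates that series so each step can be pasted into the html node in turn; the intermediate `table` step is an assumption, so read the real path from the devtools breadcrumb.

```javascript
// Build the list of selectors to try, from the outermost element inwards.
// ASSUMPTION: the example path is illustrative, not read from the live page.
function inwardSelectors(path) {
    const steps = [];
    for (let i = 1; i <= path.length; i++) {
        // Join the first i parts with spaces: the CSS descendant combinator.
        steps.push(path.slice(0, i).join(" "));
    }
    return steps;
}

for (const step of inwardSelectors(["div.details_lhs", "table", "td.hilite"])) {
    console.log(step);
}
// div.details_lhs
// div.details_lhs table
// div.details_lhs table td.hilite
```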
A quick look suggests div.details_lhs td.hilite will get you an array of the values from that table. You can then use a Change node to pick out the right element from that array.
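If the position of the wind cell in that array might change, a function node is one alternative to the Change node. This sketch assumes the html node is set to output only the text content as a single message containing an array, so `msg.payload` is an array of strings; the sample values are made up, and only readings shaped like "W 22km/h" are recognised (a reading such as "Calm" would be skipped).

```javascript
// Matches a compass direction (1 to 3 letters) followed by a speed in km/h,
// e.g. "W 22km/h" or "WSW 15 km/h".
const WIND_PATTERN = /^([NESW]{1,3})\s+(\d+(?:\.\d+)?)\s*km\/h$/i;

// Sketch of a function-node body, wrapped in a plain function.
function pickWind(msg) {
    const cells = Array.isArray(msg.payload) ? msg.payload : [];
    for (const cell of cells) {
        const m = String(cell).trim().match(WIND_PATTERN);
        if (m) {
            msg.payload = { direction: m[1].toUpperCase(), speedKmh: Number(m[2]) };
            return msg;
        }
    }
    return null; // no wind reading found: send nothing
}

const windMsg = pickWind({ payload: ["18.4°C", "W 22km/h", "65%"] });
console.log(windMsg.payload); // { direction: 'W', speedKmh: 22 }
```

A Change node with a fixed index is simpler to set up; matching on the shape of the text is more tolerant of the site adding or reordering rows in the table.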
Good catch! Also, judging from the error message, you are using a very old version of Node.js:
[quote="shoots, post:5, topic:6769"]
"node":"0.10.29","npm":"1.4.21"
[/quote]
It is well worth reading the Raspberry Pi page in the docs and running the upgrade script; the log shows readable-stream wants Node.js 6 or later, while you have 0.10.29.
Well, I'm making some progress. One thing I had wrong was that I was including the <> at the beginning and end of the selector! I now have an array containing some data to work with.
Thank you for the help. Will explore further tomorrow.