Need some help scraping a website

I managed to scrape a simple website as a test and got the correct data back.

But I am trying to scrape the “out of stock” notice here:
https://www.nvidia.com/en-gb/shop/geforce/?page=1&limit=9&locale=en-gb

So I can make an automation for when it changes.

I can’t seem to find the correct CSS selector. I’ve tried for a few hours and it always comes back “empty” in Node-RED.

I would assume that when it says "out of stock" the product won't come back in stock any more, as there are also products with a "notify me" button. But the "out of stock" notice seems quite simple, as it's just text: grab the closest surrounding HTML element which has a class or id defined, then search for the text in its innerHTML.
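As a rough sketch of that idea (assuming the markup is server-rendered, and that an html node set to output the "html content" of some wrapper element feeds into a function node), the check could look something like this:

// Function node wired after an html node that selects a surrounding
// element (e.g. a product card) and outputs its html content.
// The wiring and property layout here are assumptions to adapt.
const matches = Array.isArray(msg.payload) ? msg.payload : [msg.payload];

// True if any selected element's markup contains the notice text.
msg.outOfStock = matches.some(
    el => typeof el === "string" && /out of stock/i.test(el)
);

return msg;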

This is just guesswork though, as you did not mention which node/tool/library you are using for the scraping.

Sorry, here is the flow I am using: an http request node, then an html node with .buy-col-xl div.buy as the selector, outputting to a debug node. It comes back as "empty".

[{"id":"4dc4a1b1.5df1e","type":"tab","label":"Flow 3","disabled":false,"info":""},{"id":"409600d1.4c79f","type":"inject","z":"4dc4a1b1.5df1e","name":"make request","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"x":110,"y":40,"wires":[["5feb553a.b2b6ac"]]},{"id":"5feb553a.b2b6ac","type":"http request","z":"4dc4a1b1.5df1e","name":"","method":"GET","ret":"txt","paytoqs":false,"url":"https://www.nvidia.com/en-gb/shop/geforce/?page=1&limit=9&locale=en-gb","tls":"","persist":false,"proxy":"","authType":"","x":270,"y":40,"wires":[["4ea876f3.e174c8"]]},{"id":"68db36ef.e3fe18","type":"debug","z":"4dc4a1b1.5df1e","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","x":650,"y":40,"wires":[]},{"id":"4ea876f3.e174c8","type":"html","z":"4dc4a1b1.5df1e","name":"","property":"","outproperty":"","tag":".buy-col-xl div.buy","ret":"attr","as":"single","x":470,"y":40,"wires":[["68db36ef.e3fe18"]]}]

Ah, this would only work if the page were rendered on the server, which cannot be assumed of websites these days. If you modify the selector to be "body" and set the output to "html contents of the element", you'll see that just a bunch of JavaScript imports is returned.
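One quick way to confirm this (just a sketch) is a function node placed directly after the http request node, checking whether the class you're selecting on even appears in the raw response:

// Function node: sanity-check the raw HTTP response before any html node.
// "buy-col-xl" is the class from the flow above; adjust as needed.
const raw = String(msg.payload || "");
msg.hasBuyMarkup = raw.includes("buy-col-xl");
// If the class never appears in the raw markup, the page is built client-side.
return msg;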

You'll need a node/library that will browse the site using a remote-controlled (often headless/invisible) browser. The remote-controlled browser will then execute the JavaScript and render the page as any modern browser would.
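For example, outside Node-RED the same check could be scripted with a headless-browser library such as Puppeteer. A minimal sketch, assuming Puppeteer is installed (npm install puppeteer) and simply searching the rendered page text rather than a specific selector:

// Sketch: render the page in headless Chromium and look for "out of stock".
const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    // Wait until network activity settles so the client-side app has rendered.
    await page.goto(
        'https://www.nvidia.com/en-gb/shop/geforce/?page=1&limit=9&locale=en-gb',
        { waitUntil: 'networkidle2' }
    );

    // Grab the rendered body text and search it for the notice.
    const bodyText = await page.evaluate(() => document.body.innerText);
    console.log('Out of stock notice present:', /out of stock/i.test(bodyText));

    await browser.close();
})();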

If you're running Node-RED on a PC or a Mac, you might be able to do it with some of the Node-RED contrib nodes I list in this topic: Reset modem through node red

On the other hand, if you're running Node-RED on a Raspberry Pi, I haven't yet come across a node that works out of the box, as these browser automation libraries depend on prebuilt browser binaries which tend not to be available for the ARM CPUs used by Raspberry Pis.
