Need some help scraping a website

I managed to scrape a simple website as a test and got the correct data back.

But I am trying to scrape the “out of stock” notice here:
https://www.nvidia.com/en-gb/shop/geforce/?page=1&limit=9&locale=en-gb

So I can make an automation for when it changes.

I can’t seem to find the correct CSS selector. I’ve tried for a few hours and it always comes back “empty” in Node-RED.

I would assume that when it says "out of stock" the product won't come back in stock any more, as there are also products with a "notify me" button. But the "out of stock" notice seems quite simple, as it's just text: grab the closest surrounding HTML element which has a class or id defined, then search for the text in its innerHTML.
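As a rough sketch of that idea (assuming the markup is server-rendered, and that an html node set to output the "html content" of some wrapper element feeds into a function node), the check could look something like this:

// Function node wired after an html node that selects a surrounding
// element (e.g. a product card) and outputs its html content.
// The wiring and property layout here are assumptions to adapt.
const matches = Array.isArray(msg.payload) ? msg.payload : [msg.payload];

// True if any selected element's markup contains the notice text.
msg.outOfStock = matches.some(
    el => typeof el === "string" && /out of stock/i.test(el)
);

return msg;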

This is just guesswork though, as you did not mention which node/tool/library you are using for the scraping.

Sorry, here is the flow I am using: an http request node, then an html node with .buy-col-xl div.buy as the selector, outputting to a debug node. It comes back as "empty".

[{"id":"4dc4a1b1.5df1e","type":"tab","label":"Flow 3","disabled":false,"info":""},{"id":"409600d1.4c79f","type":"inject","z":"4dc4a1b1.5df1e","name":"make request","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"x":110,"y":40,"wires":[["5feb553a.b2b6ac"]]},{"id":"5feb553a.b2b6ac","type":"http request","z":"4dc4a1b1.5df1e","name":"","method":"GET","ret":"txt","paytoqs":false,"url":"https://www.nvidia.com/en-gb/shop/geforce/?page=1&limit=9&locale=en-gb","tls":"","persist":false,"proxy":"","authType":"","x":270,"y":40,"wires":[["4ea876f3.e174c8"]]},{"id":"68db36ef.e3fe18","type":"debug","z":"4dc4a1b1.5df1e","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","x":650,"y":40,"wires":[]},{"id":"4ea876f3.e174c8","type":"html","z":"4dc4a1b1.5df1e","name":"","property":"","outproperty":"","tag":".buy-col-xl div.buy","ret":"attr","as":"single","x":470,"y":40,"wires":[["68db36ef.e3fe18"]]}]

Ah, this would only work if the page were rendered on the server, which cannot be assumed of websites these days. If you modify the selector to be "body" and set the output to "html contents of the element", you'll see that just a bunch of JavaScript imports is returned.
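One quick way to confirm this (just a sketch) is a function node placed directly after the http request node, checking whether the class you're selecting on even appears in the raw response:

// Function node: sanity-check the raw HTTP response before any html node.
// "buy-col-xl" is the class from the flow above; adjust as needed.
const raw = String(msg.payload || "");
msg.hasBuyMarkup = raw.includes("buy-col-xl");
// If the class never appears in the raw markup, the page is built client-side.
return msg;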

You'll need a node/library that will browse the site using a remote-controlled (often headless/invisible) browser. The remote-controlled browser will then execute the JavaScript and render the page as any modern browser would.
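For example, outside Node-RED the same check could be scripted with a headless-browser library such as Puppeteer. A minimal sketch, assuming Puppeteer is installed (npm install puppeteer) and simply searching the rendered page text rather than a specific selector:

// Sketch: render the page in headless Chromium and look for "out of stock".
const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    // Wait until network activity settles so the client-side app has rendered.
    await page.goto(
        'https://www.nvidia.com/en-gb/shop/geforce/?page=1&limit=9&locale=en-gb',
        { waitUntil: 'networkidle2' }
    );

    // Grab the rendered body text and search it for the notice.
    const bodyText = await page.evaluate(() => document.body.innerText);
    console.log('Out of stock notice present:', /out of stock/i.test(bodyText));

    await browser.close();
})();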

If you're running Node-RED on a PC or a Mac, you might be able to do it with some of the Node-RED contrib nodes I list in this topic: Reset modem through node red

On the other hand, if you're running Node-RED on a Raspberry Pi, I haven't yet come across a node that works out of the box, as these browser automation libraries depend on prebuilt browser binaries which tend not to be available for the ARM CPUs used by Raspberry Pis.
