Scraping a web page with data embedded in a script tag

Hello,

I am experimenting with scraping a web page to show stock levels of items, rather than constantly checking the website by hand. Example page to scrape: Link

I have tried the examples and looked at other similar questions, but this case appears a little different. I have tried using the selector #quantityAvailable, but I get the wrong total back.

When the page first loads in a browser, you briefly see the total that Node-RED is returning, before it changes to the figure for the specific size I have selected.

On inspecting the page, it appears that the totals for all sizes are stored in a script that is embedded in the page inside a script tag, and I'm guessing it is some form of XML.

How can I access this embedded data with Node-RED and display the totals? I can't figure it out.

Any help appreciated.

Well, interestingly, when reloading that page, I see that it starts with a stock value of 24240 and then immediately switches to 522.

Since the http-request node just does a single GET and doesn't run any scripts, you are rather stuffed. It is probably the vendor trying to stop this kind of scraping.

The only way around this is to automate a proper browser. There are nodes that will do this; have a search through the flows site.

I'm beginning to think the same. I will look at what nodes I can find to help.

you are rather stuffed.

OK, the only way I see is to parse out the JavaScript and target the var xxxx= assignment.

Then use eval(), which could be misused to execute arbitrary code, so do your checks.

Once you have the code and have run eval, you can move the variables to msg, e.g.:

[{"id":"7a616f96.00ad9","type":"function","z":"8d22ae29.7df6d","name":"","func":"msg.payload = msg.payload.split(\"var attributesCombinations=\")[1].split(\"/\\* ]]> \\*/</script>\")[0]\neval(\"var attributesCombinations=\"+msg.payload);\nmsg.quantity = quantityAvailable;\nmsg.attributeCombibnation = attributesCombinations;\nmsg.combinations =combinations;\nreturn msg;","outputs":1,"noerr":0,"initialize":"","finalize":"","x":410,"y":3860,"wires":[["a2bea5e6.17f7c8"]]},{"id":"861e6d4e.bd532","type":"http request","z":"8d22ae29.7df6d","name":"","method":"GET","ret":"txt","paytoqs":"ignore","url":"https://www.kidswholesaleclothing.co.uk/plain-raglan-pyjamas/1065-light-blue-long-raglan-sleeve-pyjama-set.html#/72-size-3_4yrs/143-quantity-pack_of_6","tls":"","persist":false,"proxy":"","authType":"","x":230,"y":3840,"wires":[["7a616f96.00ad9"]]},{"id":"a2bea5e6.17f7c8","type":"debug","z":"8d22ae29.7df6d","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","targetType":"full","statusVal":"","statusType":"auto","x":570,"y":3860,"wires":[]},{"id":"69646fb2.bba928","type":"inject","z":"8d22ae29.7df6d","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":110,"y":3760,"wires":[["861e6d4e.bd532"]]}]

I have targeted all the vars, but you could just target var quantityAvailable=

[edit] Just targeting quantity:

[{"id":"861e6d4e.bd532","type":"http request","z":"8d22ae29.7df6d","name":"","method":"GET","ret":"txt","paytoqs":"ignore","url":"https://www.kidswholesaleclothing.co.uk/plain-raglan-pyjamas/1065-light-blue-long-raglan-sleeve-pyjama-set.html#/72-size-3_4yrs/143-quantity-pack_of_6","tls":"","persist":false,"proxy":"","authType":"","x":230,"y":3840,"wires":[["7a616f96.00ad9"]]},{"id":"69646fb2.bba928","type":"inject","z":"8d22ae29.7df6d","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":110,"y":3760,"wires":[["861e6d4e.bd532"]]},{"id":"7a616f96.00ad9","type":"function","z":"8d22ae29.7df6d","name":"","func":"msg.payload = msg.payload.split(\"var quantityAvailable=\")[1].split(\"var quickView=\")[0]\neval(\"var quantityAvailable=\"+msg.payload);\nmsg.payload = quantityAvailable;\nreturn msg;","outputs":1,"noerr":0,"initialize":"","finalize":"","x":410,"y":3860,"wires":[["a2bea5e6.17f7c8"]]},{"id":"a2bea5e6.17f7c8","type":"debug","z":"8d22ae29.7df6d","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","targetType":"full","statusVal":"","statusType":"auto","x":570,"y":3860,"wires":[]}]
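
If the quantity is written in the page as a plain integer, a regular expression can pull it out without eval at all. This is only a minimal sketch under that assumption: the helper name readQuantityAvailable and the sample string are made up for illustration, and the real page source may be laid out differently.

```javascript
// Hypothetical helper (not from the flows above): reads the number assigned to
// "var quantityAvailable=" without executing any of the page's script.
// Assumes the value is a plain integer literal such as 24240.
function readQuantityAvailable(source) {
    const match = /var\s+quantityAvailable\s*=\s*(-?\d+)/.exec(source);
    // Return null rather than throwing when the variable is not present.
    return match ? Number(match[1]) : null;
}

// Invented sample text standing in for the http request output.
const sampleSource = "var foo=1;var quantityAvailable=24240;var quickView=true;";
console.log(readQuantityAvailable(sampleSource));
```

In a function node the same idea would be msg.payload = readQuantityAvailable(msg.payload); followed by return msg;. Nothing from the page is executed, so a changed or hostile script can at worst produce null.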

Hi, I took a similar approach to @E1cid ... with a few extra splits you can scrape the JSON out of the script, avoid eval, and use JSON.parse instead.

[{"id":"604e70a5.6396a8","type":"inject","z":"ac0f61dd.69e26","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":140,"y":960,"wires":[["880f0446.3e78c8"]]},{"id":"880f0446.3e78c8","type":"http request","z":"ac0f61dd.69e26","name":"","method":"GET","ret":"txt","paytoqs":"ignore","url":"https://www.kidswholesaleclothing.co.uk/plain-raglan-pyjamas/1065-light-blue-long-raglan-sleeve-pyjama-set.html#/72-size-3_4yrs/143-quantity-pack_of_6","tls":"","persist":false,"proxy":"","authType":"","x":310,"y":960,"wires":[["4d43bf6b.b2d568"]]},{"id":"4d43bf6b.b2d568","type":"html","z":"ac0f61dd.69e26","name":"","property":"payload","outproperty":"payload","tag":"script","ret":"text","as":"single","x":470,"y":960,"wires":[["98d4bdf4.920678"]]},{"id":"fc1481da.94d38","type":"debug","z":"ac0f61dd.69e26","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":790,"y":920,"wires":[]},{"id":"98d4bdf4.920678","type":"function","z":"ac0f61dd.69e26","name":"","func":"let quantity = msg.payload[0].split(\"var combinations=\")[1].split(\";var combinationsFromController\")[0]\n//.split(\";var availableLaterValue=\")[0]\nmsg.payload = JSON.parse(quantity)\n//node.warn(quantity)\nreturn msg;","outputs":1,"noerr":0,"initialize":"","finalize":"","x":620,"y":960,"wires":[["fc1481da.94d38","fe3813b3.7eea3"]]},{"id":"fe3813b3.7eea3","type":"debug","z":"ac0f61dd.69e26","name":"","active":true,"tosidebar":true,"console":false,"tostatus":true,"complete":"payload[\"39772\"].quantity","targetType":"msg","statusVal":"payload[\"39772\"].quantity","statusType":"auto","x":850,"y":1000,"wires":[]}]
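
The split in the flow above depends on which variable happens to follow combinations in the page. As a variation, here is a rough sketch that finds the end of the value by matching braces instead, so it does not care what comes next. The helper name extractJsonVar and the sample data are invented for illustration, and it assumes the value is strict JSON (double-quoted keys and strings), which JSON.parse needs anyway.

```javascript
// Hypothetical helper: finds "var <name>=" in the page source, walks forward to
// the matching closing brace or bracket, and parses that slice with JSON.parse.
function extractJsonVar(source, name) {
    const marker = "var " + name + "=";
    const markerAt = source.indexOf(marker);
    if (markerAt === -1) {
        return undefined; // variable not found
    }
    const start = markerAt + marker.length;
    if (source[start] !== "{" && source[start] !== "[") {
        return undefined; // value is not an object or array literal
    }
    let depth = 0;
    let inString = false;
    let escaped = false;
    for (let i = start; i < source.length; i++) {
        const ch = source[i];
        if (inString) {
            // Ignore braces inside JSON strings, honouring backslash escapes.
            if (escaped) {
                escaped = false;
            } else if (ch === "\\") {
                escaped = true;
            } else if (ch === '"') {
                inString = false;
            }
            continue;
        }
        if (ch === '"') {
            inString = true;
        } else if (ch === "{" || ch === "[") {
            depth++;
        } else if (ch === "}" || ch === "]") {
            depth--;
            if (depth === 0) {
                return JSON.parse(source.slice(start, i + 1));
            }
        }
    }
    return undefined; // ran off the end without closing the value
}

// Invented sample standing in for the script text from the html node.
const sampleScript = 'var combinations={"39772":{"quantity":522}};var combinationsFromController=[];';
const combinations = extractJsonVar(sampleScript, "combinations");
console.log(combinations["39772"].quantity);
```

In the function node this would replace the two split calls, e.g. msg.payload = extractJsonVar(msg.payload[0], "combinations");. JSON.parse will still throw if the site emits something that is not valid JSON, so a try/catch around the call may be worth adding.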
