How to scrape with Node-RED when the HTTP request fails?

Hi guys, I would like to know if there is any way to get the price of this product.

Link to the product: Product
Here is my code:

[
    {
        "id": "90fe156d.97b8e8",
        "type": "inject",
        "z": "cd02b28.20baf5",
        "name": "",
        "topic": "",
        "payload": "",
        "payloadType": "date",
        "repeat": "",
        "crontab": "",
        "once": false,
        "onceDelay": 0.1,
        "x": 140,
        "y": 740,
        "wires": [
            [
                "ecae983f.47f5a8"
            ]
        ]
    },
    {
        "id": "5c05c01d.dc5cc8",
        "type": "html",
        "z": "cd02b28.20baf5",
        "name": "",
        "property": "",
        "outproperty": "",
        "tag": ".tabs-content",
        "ret": "html",
        "as": "single",
        "x": 460,
        "y": 740,
        "wires": [
            [
                "249aa5e1.5b7aa2"
            ]
        ]
    },
    {
        "id": "249aa5e1.5b7aa2",
        "type": "change",
        "z": "cd02b28.20baf5",
        "name": "",
        "rules": [
            {
                "t": "set",
                "p": "payload",
                "pt": "msg",
                "to": "msg.payload[0]",
                "tot": "msg"
            }
        ],
        "action": "",
        "property": "",
        "from": "",
        "to": "",
        "reg": false,
        "x": 680,
        "y": 740,
        "wires": [
            [
                "bde8ccfb.b9e438"
            ]
        ]
    },
    {
        "id": "bde8ccfb.b9e438",
        "type": "switch",
        "z": "cd02b28.20baf5",
        "name": "",
        "property": "payload",
        "propertyType": "msg",
        "rules": [
            {
                "t": "cont",
                "v": "343",
                "vt": "str"
            },
            {
                "t": "else"
            }
        ],
        "checkall": "true",
        "repair": false,
        "outputs": 2,
        "x": 850,
        "y": 740,
        "wires": [
            [],
            [
                "ff823761.4e5a4"
            ]
        ]
    },
    {
        "id": "ff823761.4e5a4",
        "type": "change",
        "z": "cd02b28.20baf5",
        "name": "",
        "rules": [
            {
                "t": "set",
                "p": "payload",
                "pt": "msg",
                "to": "YESS!!!!",
                "tot": "str"
            }
        ],
        "action": "",
        "property": "",
        "from": "",
        "to": "",
        "reg": false,
        "x": 1020,
        "y": 740,
        "wires": [
            [
                "55c3909.a1bf6f"
            ]
        ]
    },
    {
        "id": "ecae983f.47f5a8",
        "type": "https-node",
        "z": "cd02b28.20baf5",
        "name": "",
        "method": "GET",
        "ret": "txt",
        "url": "https://www.worten.pt/produtos/ar-condicionado-becken-bac2323-24-m2-12000-btu-branco-5723895",
        "authorized": false,
        "agent": true,
        "x": 290,
        "y": 740,
        "wires": [
            [
                "5c05c01d.dc5cc8"
            ]
        ]
    },
    {
        "id": "55c3909.a1bf6f",
        "type": "debug",
        "z": "cd02b28.20baf5",
        "name": "",
        "active": true,
        "tosidebar": true,
        "console": false,
        "tostatus": false,
        "complete": "false",
        "x": 1210,
        "y": 740,
        "wires": []
    }
]

Can anyone show me where I'm going wrong?

Sadly, that isn't valid to import, I'm afraid. You will need to try again.

Please read this: How to share code or flow JSON

Sorry guys :wink: Fixed, @Paul-Reed and @TotallyInformation.


Thanks, but could you simplify it as well? There are 3 nodes that I don't have installed and it is a pain to install/uninstall nodes all the time when helping people.

As you are only asking how to get a specific value, an inject node, an http-request node, an html node and a debug node are all you need.


Here you go, @TotallyInformation.

Thanks. I got the link manually anyway.

That is one horrid web page!

By the way, to do this yourself, use your browser's developer tools. Switch to the Elements tab and use the element selector. You can then inspect the tags and copy the CSS selector for the appropriate one.

Here is the relevant HTML:

[Screenshot of the relevant HTML]

As you can see, if you can tease the content attribute from #panel1c > div > div.w-product__actions.iss-product-stock-button > div.w-product__price.iss-product-price-container > span.w-product__price__current.iss-product-current-price then you are good to go :wink:

So, something like this should work:

[{"id":"8cab92ce.9323d","type":"debug","z":"4e0def05.d4415","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":850,"y":260,"wires":[]},{"id":"5479aade.0fd914","type":"inject","z":"4e0def05.d4415","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":220,"y":260,"wires":[["c2e2b434.9eddf8"]]},{"id":"c2e2b434.9eddf8","type":"http request","z":"4e0def05.d4415","name":"","method":"GET","ret":"txt","paytoqs":false,"url":"https://www.worten.pt/produtos/ar-condicionado-becken-bac2323-24-m2-12000-btu-branco-5723895","tls":"","proxy":"","authType":"","x":370,"y":260,"wires":[["34cfeab1.5effd6"]]},{"id":"34cfeab1.5effd6","type":"html","z":"4e0def05.d4415","name":"","property":"payload","outproperty":"payload","tag":".iss-product-current-price","ret":"html","as":"single","x":590,"y":260,"wires":[["8cab92ce.9323d"]]}]

But it doesn't, because the page includes some nasty JavaScript checks. The http-request node fails them and triggers a Google challenge/response, as you will see if you replace the selector in the html node with "body div".
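You might get the real page HTML some other way, for example from a browser session or from wget. Once you have it, pulling out the content attribute is simple. Here is a rough Python sketch that uses only the standard library's html.parser. It assumes the price element still carries the iss-product-current-price class from the selector quoted above. The class name and the sample markup are assumptions for illustration; the site can change its markup at any time. A None result is a good hint that you received the challenge page instead of the product page.

```python
from html.parser import HTMLParser

# Assumption: class name taken from the CSS selector quoted in this thread.
PRICE_CLASS = "iss-product-current-price"


class PriceParser(HTMLParser):
    """Records the `content` attribute of the first element whose class list contains PRICE_CLASS."""

    def __init__(self):
        super().__init__()
        self.price = None

    def handle_starttag(self, tag, attrs):
        if self.price is not None:
            return  # keep only the first match
        attr_map = dict(attrs)
        # Attributes without a value come through as None, hence the `or ""`.
        classes = (attr_map.get("class") or "").split()
        if PRICE_CLASS in classes:
            self.price = attr_map.get("content")


def extract_price(html):
    """Return the price string from the `content` attribute, or None if no price element exists."""
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return parser.price


if __name__ == "__main__":
    # Made-up sample markup, shaped like the selector above; not real data from the site.
    sample = (
        '<div class="w-product__price iss-product-price-container">'
        '<span class="w-product__price__current iss-product-current-price" '
        'content="299.99">299,99</span></div>'
    )
    print(extract_price(sample))
```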


Weird, but if I use wget from the console it works. Is it possible to run wget from the console and read the result as a file?

I think the problem is the user agent.

You should be able to set the user agent. I think the site may also be looking for a cookie, so you could try capturing that from a browser session and adding it back in; the http-request node lets you do that.
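Here is a minimal Python sketch that sends a browser-like User-Agent plus a cookie copied from a browser session. The header values are placeholders. You would replace them with the ones your own browser actually sends, which you can see in the developer tools Network tab. There is no guarantee the site will accept them if its check depends on running JavaScript. In Node-RED itself, the http request node's built-in help says it sends any headers set in msg.headers. So the equivalent is to set msg.headers with the same two keys before that node.

```python
import urllib.request

# Placeholder: copy the real value from your browser's dev tools (Network tab).
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_USER_AGENT, cookie=None):
    """Build a GET request carrying a User-Agent and, optionally, a Cookie header."""
    headers = {"User-Agent": user_agent}
    if cookie:
        # Assumption: a cookie string captured from a real browser session, e.g. "name=value; other=value".
        headers["Cookie"] = cookie
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_html(url, user_agent=BROWSER_USER_AGENT, cookie=None, timeout=30):
    """Fetch the page and decode it using the charset the server declares (UTF-8 if none)."""
    req = build_request(url, user_agent, cookie)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Note that urllib stores header names via str.capitalize(), so the built request reports the agent under "User-agent".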

Yes, you should also be able to use wget: run it with the exec node and feed its output back into your flow. You should then be able to process it using the html node.
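The exec node route amounts to running a command line and capturing its stdout. Here is a sketch of the same idea in Python with subprocess. wget -q -O - URL prints the page to stdout instead of saving it, and that output is what you would feed onwards for parsing. wget_command and fetch_with_wget are hypothetical helper names. The optional --user-agent flag is there in case wget's default agent also gets blocked.

```python
import subprocess


def wget_command(url, user_agent=None):
    """Build the wget argument list; the same command line could go into an exec node."""
    # -q suppresses progress output; "-O -" writes the downloaded page to stdout.
    cmd = ["wget", "-q", "-O", "-"]
    if user_agent:
        cmd.append("--user-agent=" + user_agent)
    cmd.append(url)
    return cmd


def fetch_with_wget(url, user_agent=None, timeout=60):
    """Run wget and return its stdout as text; raises CalledProcessError if wget exits non-zero."""
    result = subprocess.run(
        wget_command(url, user_agent),
        capture_output=True,
        check=True,
        timeout=timeout,
    )
    return result.stdout.decode("utf-8", errors="replace")
```

In an exec node, the equivalent command would be wget -q -O - followed by the URL. The page then arrives on the node's first (stdout) output as msg.payload. If you would rather read it as a file, wget -O page.html URL saves it to disk, and a file-in node can read it back.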