Website Parsing/Scraping with HTTP request

Hi all, hope everyone is safe.

First post here, so hopefully I can follow the forum procedures. I started using Node-RED during lockdown and absolutely love it.

I am trying to scrape/parse a website, but I am battling to retrieve HTML attributes. From the div with class related-item I need to get the attribute data-adid. I have no problem retrieving other information, just attributes.

data-adid="1007196353500912404170409"

<div class="related-item has-hover has-actions-bar" data-adid="1007196353500912404170409" data-short-id="719635350" data-is-partner="false" data-ad-type="SRPGallery" data-seller-name="Sasha">
[{"id":"8c313e8.b5fcfc","type":"http request","z":"7c14918b.c96f7","name":"","method":"GET","ret":"txt","paytoqs":false,"url":"https://www.gumtree.co.za/s-western-cape/v1l3100001p1?q=engel+fridge","tls":"","persist":false,"proxy":"","authType":"","x":230,"y":260,"wires":[["341b8675.6fe6ea"]]},{"id":"eb50ddda.b8698","type":"inject","z":"7c14918b.c96f7","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":120,"y":180,"wires":[["8c313e8.b5fcfc"]]},{"id":"341b8675.6fe6ea","type":"html","z":"7c14918b.c96f7","name":"","property":"payload","outproperty":"payload","tag":"div.related-item","ret":"html","as":"multi","x":140,"y":360,"wires":[["a784b43c.cf06b8","d367928.3c7cb7","f431e9a8.e68d38","9adc477e.e224f8"]]},{"id":"7e7005af.e534dc","type":"debug","z":"7c14918b.c96f7","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","x":750,"y":420,"wires":[]},{"id":"a784b43c.cf06b8","type":"html","z":"7c14918b.c96f7","name":"title","property":"payload","outproperty":"payload","tag":"a","ret":"text","as":"multi","x":350,"y":400,"wires":[["d6142b69.4f8a28"]]},{"id":"d367928.3c7cb7","type":"html","z":"7c14918b.c96f7","name":"price","property":"payload","outproperty":"payload","tag":"div.price","ret":"text","as":"multi","x":350,"y":360,"wires":[["d6142b69.4f8a28"]]},{"id":"f431e9a8.e68d38","type":"html","z":"7c14918b.c96f7","name":"description","property":"payload","outproperty":"payload","tag":"span.description-text","ret":"text","as":"multi","x":370,"y":440,"wires":[["d6142b69.4f8a28"]]},{"id":"d6142b69.4f8a28","type":"join","z":"7c14918b.c96f7","name":"","mode":"custom","build":"array","property":"payload","propertyType":"msg","key":"payload","joiner":"\\n","joinerType":"str","accumulate":false,"timeout":"","count":"4","reduceRight":false,"reduceExp":"","reduceInit":"","reduceInitType":"num","reduceFixup":"","x":570,"y":420,"wires":[["7e7005af.e534dc"]]},{"id":"9adc477e.e224f8","type":
"html","z":"7c14918b.c96f7","name":"place","property":"payload","outproperty":"payload","tag":"div.location-date","ret":"text","as":"multi","x":350,"y":480,"wires":[["d6142b69.4f8a28"]]}]

I am also using a join node to create an array. I see the array is not always in the same order, because the messages arrive at the join at different times. I tried using a key/object in the join but didn't get the desired output.

Many Thanks
Mark

Hi Mark, it looks like the flow you provided is just a debug node. (You should select the items to export before opening the export dialog.)

As for the order of your array, you could sort it.

Alternatively, doesn't the join node allow you to make a key : value object instead of array? If so, then you could simply access items by name.

Thanks Steve-Mcl, I corrected my mistake.

I need to play with the key:value object a bit more. I did try to use it. I think I need to learn the basics first and then build on them. But the attributes I just cannot get working, no matter what I do.

Maybe this will steer you...

[{"id":"2da7a834.947e38","type":"http request","z":"c70ba4a4.e7fb58","name":"","method":"GET","ret":"txt","paytoqs":false,"url":"https://www.gumtree.co.za/s-western-cape/v1l3100001p1?q=engel+fridge","tls":"","persist":false,"proxy":"","authType":"","x":250,"y":200,"wires":[["957f06be.3ed5e8"]]},{"id":"e632e103.e5f5f","type":"inject","z":"c70ba4a4.e7fb58","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":140,"y":120,"wires":[["2da7a834.947e38"]]},{"id":"957f06be.3ed5e8","type":"html","z":"c70ba4a4.e7fb58","name":"","property":"payload","outproperty":"payload","tag":"div.related-item.has-hover.has-actions-bar","ret":"attr","as":"multi","x":520,"y":200,"wires":[["21532661.65698a"]]},{"id":"21532661.65698a","type":"debug","z":"c70ba4a4.e7fb58","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"true","targetType":"full","x":610,"y":260,"wires":[]}]
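
With the html node's output set to attributes, as in the flow above, each message payload should be a plain object keyed by attribute name. A minimal sketch of picking data-adid out of that object follows. The helper name `extractAdId` and the `msg.adid` property are made up for illustration, and the sample payload simply mirrors the div quoted earlier in the thread.

```javascript
// Hypothetical helper: pull data-adid out of the attribute object that the
// html node emits when it is set to output attributes.
function extractAdId(msg) {
    const attrs = msg.payload || {};
    // Bracket notation is needed because the key contains a hyphen.
    // The value stays a string: it is too long to be an exact JS number.
    msg.adid = attrs["data-adid"];
    return msg;
}

// Sample message shaped like the div quoted earlier in the thread.
const sample = {
    payload: {
        "class": "related-item has-hover has-actions-bar",
        "data-adid": "1007196353500912404170409",
        "data-short-id": "719635350"
    }
};

const result = extractAdId(sample);
console.log(result.adid);
```

In a function node, the body of the helper followed by `return msg;` would probably be enough; the surrounding function and sample data are only there so the snippet runs on its own.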

Oh wow... that's brilliant.

Sometimes I overcomplicate things when the answer is so simple (but only if you know what you're doing).

Thanks Steve, marked your answer as the solution 🙂

I'll go through this and compare it to some of my settings to see where I got lost.

Don't forget, when using the join node in key/value object mode, to make sure each stream has a different topic (or whatever property you specify as the key).
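
A rough way to see why the key matters: when building an object, each payload is stored under its key, so two streams that share a topic end up overwriting each other. The function `joinByTopic` below is a simplified stand-in written for this illustration, not the join node's actual code, and the payload strings are placeholders.

```javascript
// Simplified stand-in for a join node building a key/value object keyed on
// msg.topic. Illustration only, not Node-RED's real implementation.
function joinByTopic(messages) {
    const out = {};
    for (const m of messages) {
        out[m.topic] = m.payload; // a repeated topic replaces the earlier value
    }
    return out;
}

// Distinct topics: arrival order does not matter, each value has its own name.
const distinct = joinByTopic([
    { topic: "price", payload: "example price" },
    { topic: "title", payload: "example title" }
]);

// Same (empty) topic on every stream: only the last payload survives.
const clashing = joinByTopic([
    { topic: "", payload: "example price" },
    { topic: "", payload: "example title" }
]);

console.log(distinct);
console.log(clashing);
```

Because the values are looked up by name rather than by position, this also sidesteps the array-ordering problem mentioned earlier in the thread.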

Many thanks, Colin.

I'm currently busy watching a few tutorials online. I can see that I am missing a lot of the basics of Node-RED. One of the problems was the way I was using the debug node: I should have set it to show the complete msg object to see more of what is inside the message.

I think I need to understand the basics a bit better before trying to make my flows usable.

Thanks, I will check out the stream topics for the node.

There is a recipe on how to join streams in the Node-RED cookbook, with an example flow you can import to test.

Thanks, Colin.

I think that's exactly what I'm looking for now, after Steve helped with the initial part of my problem. I seem to be able to get the required information, but grouping it together has been challenging.

Thanks, guys.

I don't think this is the most efficient or even the correct way, but between both your answers I got what I needed. I would have preferred to get data-adid directly in the main object, but it works 100%.
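
If the id should sit next to the other fields, one option might be a function node after the join that copies data-adid out of the attribs entry. This is only a sketch: it assumes the joined payload has an `attribs` key holding the element's attributes (including data-adid), and the names `flattenAdId` and `adid` are invented for the example.

```javascript
// Hypothetical helper for a function node placed after the join.
// Assumes msg.payload is the joined object and that its "attribs" entry
// holds the element's attributes, including "data-adid".
function flattenAdId(msg) {
    const { attribs = {}, ...rest } = msg.payload;
    // Keep every other field and add the id at the top level.
    msg.payload = { ...rest, adid: attribs["data-adid"] };
    return msg;
}

// Placeholder data shaped like the joined object.
const joined = {
    payload: {
        title: "example title",
        price: "example price",
        attribs: { "data-adid": "1007196353500912404170409" }
    }
};

console.log(flattenAdId(joined).payload);
```

The attribs object is dropped from the result here; keeping it as well would just mean leaving it in the spread.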

Both your answers were part of the solution, but I can't mark both as the solution?

[{"id":"8a669594.4c4448","type":"http request","z":"27c64a0d.0d30e6","name":"","method":"GET","ret":"txt","paytoqs":false,"url":"https://www.gumtree.co.za/s-western-cape/v1l3100001p1?q=engel+fridge","tls":"","persist":false,"proxy":"","authType":"","x":190,"y":260,"wires":[["ce771b7a.b97008"]]},{"id":"9183ee78.b54a3","type":"inject","z":"27c64a0d.0d30e6","name":"","topic":"","payload":"","payloadType":"date","repeat":"","crontab":"","once":false,"onceDelay":0.1,"x":140,"y":140,"wires":[["8a669594.4c4448"]]},{"id":"ac125931.897ca8","type":"html","z":"27c64a0d.0d30e6","name":"title","property":"payload","outproperty":"payload","tag":"a","ret":"text","as":"multi","x":410,"y":360,"wires":[["47a46d8c.365de4"]]},{"id":"2427010b.3792be","type":"html","z":"27c64a0d.0d30e6","name":"description","property":"payload","outproperty":"payload","tag":"span.description-text","ret":"text","as":"multi","x":430,"y":420,"wires":[["761057fd.b117c8"]]},{"id":"22ce2d56.d394c2","type":"html","z":"27c64a0d.0d30e6","name":"place","property":"payload","outproperty":"payload","tag":"div.location-date","ret":"text","as":"multi","x":410,"y":480,"wires":[["5f915d33.865ce4"]]},{"id":"ce771b7a.b97008","type":"html","z":"27c64a0d.0d30e6","name":"","property":"payload","outproperty":"payload","tag":"div.related-item","ret":"html","as":"multi","x":200,"y":400,"wires":[["d3e665f0.d9c9e8","ac125931.897ca8","2427010b.3792be","22ce2d56.d394c2","b488ab8d.d119d8"]]},{"id":"d3e665f0.d9c9e8","type":"html","z":"27c64a0d.0d30e6","name":"price","property":"payload","outproperty":"payload","tag":"div.price","ret":"text","as":"multi","x":410,"y":300,"wires":[["ed64291d.13ac58"]]},{"id":"c3e10475.f07088","type":"debug","z":"27c64a0d.0d30e6","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","x":1090,"y":420,"wires":[]},{"id":"d145ec4.288b81","type":"join","z":"27c64a0d.0d30e6","name":"","mode":"custom","build":"object","property":"payload","propertyType":"msg","key":"topic","joiner":"\\n","joinerType":"str","accumulate":false,"timeout":"","count":"5","reduceRight":false,"reduceExp":"","reduceInit":"","reduceInitType":"","reduceFixup":"","x":930,"y":420,"wires":[["c3e10475.f07088"]]},{"id":"ed64291d.13ac58","type":"change","z":"27c64a0d.0d30e6","name":"Set Topic - Price","rules":[{"t":"set","p":"topic","pt":"msg","to":"price","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":600,"y":300,"wires":[["d145ec4.288b81"]]},{"id":"47a46d8c.365de4","type":"change","z":"27c64a0d.0d30e6","name":"Set Topic - Title","rules":[{"t":"set","p":"topic","pt":"msg","to":"title","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":600,"y":360,"wires":[["d145ec4.288b81"]]},{"id":"761057fd.b117c8","type":"change","z":"27c64a0d.0d30e6","name":"Set Topic -  Desc","rules":[{"t":"set","p":"topic","pt":"msg","to":"description","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":600,"y":420,"wires":[["d145ec4.288b81"]]},{"id":"5f915d33.865ce4","type":"change","z":"27c64a0d.0d30e6","name":"Set Topic - Place","rules":[{"t":"set","p":"topic","pt":"msg","to":"place","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":610,"y":480,"wires":[["d145ec4.288b81"]]},{"id":"b488ab8d.d119d8","type":"html","z":"27c64a0d.0d30e6","name":"attribs","property":"payload","outproperty":"payload","tag":"div.watchListV2","ret":"attr","as":"multi","x":410,"y":540,"wires":[["22ba096b.729f76"]]},{"id":"22ba096b.729f76","type":"change","z":"27c64a0d.0d30e6","name":"Set Topic - Attrib data-adid","rules":[{"t":"set","p":"topic","pt":"msg","to":"attribs","tot":"str"}],"action":"","property":"","from":"","to":"","reg":false,"x":640,"y":540,"wires":[["d145ec4.288b81"]]}]
