Having issues with HTTP-Request

Hi all, quite new to Node-RED - only started using it yesterday, so I may be trying to run before I can walk.

Anyway, I had a notion of scraping my local refuse collection site to keep an up-to-date list of when the next recycling & refuse collection dates are.

I've gotten fairly far with this, but need some extra help.

To start with, here's the relevant site. I've generated a link using a postcode different from my own, for obvious reasons: Property Results | Resident Service Portal (kier.co.uk)

The fields I'm interested in are the last & next service dates for each service.

My selector in the html node is set to table.table>tbody>tr>td

Which does give me some data, but it's messy. I don't seem to be able to drill down any further into the selector.

All I really need to get back are those dates. Is there either: a way to select just those, or, transform the output to show the dates only?

Here's my current code for the flow


[{"id":"83dc1c13f18ec93e","type":"inject","z":"a2eedb73ac9b8093","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":100,"y":220,"wires":[["eafbd032b825d3ab"]]},{"id":"eafbd032b825d3ab","type":"http request","z":"a2eedb73ac9b8093","name":"","method":"GET","ret":"txt","paytoqs":"ignore","url":"https://bridgend.kier.co.uk/property/100100483617","tls":"","persist":false,"proxy":"","authType":"","senderr":false,"headers":[],"x":330,"y":320,"wires":[["a8efb783c1179c42"]]},{"id":"a8efb783c1179c42","type":"html","z":"a2eedb73ac9b8093","name":"","property":"payload","outproperty":"payload","tag":"table.table>tbody>tr>td","ret":"text","as":"single","x":510,"y":260,"wires":[["99d6a7ae09798d1d"]]},{"id":"99d6a7ae09798d1d","type":"debug","z":"a2eedb73ac9b8093","name":"Refuse Output","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","statusVal":"","statusType":"auto","x":940,"y":240,"wires":[]}]

Welcome to the forum @psybernoid

I suspect the HTML does not include the information you want. It is probably filled in by JavaScript based on the postcode.

The data is in the results. Here is what I would do.

So as not to spam the server with multiple requests, send one request and attach a file-out node to the http-request node so you can store the file on your device.

Now you can have a flow that reads that file into msg.payload, and then you can play with the results as much as you want without possibly being blocked by the server for hitting it too often.

And since you will have the actual HTML of the page, you can dig thru it to see what parts you want to extract using the html node.
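For anyone who wants to try the same fetch-once-then-reuse idea outside the editor, here is a minimal sketch in plain Node.js. It is not how the file-out node works internally; `loadPage` and the injected `fetchPage` callback are hypothetical names made up for illustration, and the fetcher is passed in so the sketch can be run without touching the real server.

```javascript
const fs = require("fs");

// Hypothetical helper: return the saved copy of the page if one exists,
// otherwise call fetchPage() once and save what it returns.
// fetchPage is an assumption: any function that returns the page HTML as a string.
function loadPage(cacheFile, fetchPage) {
    if (fs.existsSync(cacheFile)) {
        // Reuse the stored copy, so the server is not contacted again.
        return fs.readFileSync(cacheFile, "utf8");
    }
    const html = fetchPage();
    fs.writeFileSync(cacheFile, html, "utf8");
    return html;
}
```

Calling `loadPage("page.html", myFetcher)` twice should only invoke `myFetcher` the first time; deleting the file forces a fresh fetch.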

Try this

[{"id":"83dc1c13f18ec93e","type":"inject","z":"30af2d3e.d94ea2","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":430,"y":2700,"wires":[["eafbd032b825d3ab"]]},{"id":"eafbd032b825d3ab","type":"http request","z":"30af2d3e.d94ea2","name":"","method":"GET","ret":"txt","paytoqs":"ignore","url":"https://bridgend.kier.co.uk/property/100100483617","tls":"","persist":false,"proxy":"","authType":"","x":400,"y":2860,"wires":[["a8efb783c1179c42"]]},{"id":"a8efb783c1179c42","type":"html","z":"30af2d3e.d94ea2","name":"","property":"payload","outproperty":"payload","tag":" td.last-service, td.next-service","ret":"text","as":"single","x":600,"y":2800,"wires":[["323ff57e.22303a"]]},{"id":"323ff57e.22303a","type":"change","z":"30af2d3e.d94ea2","name":"","rules":[{"t":"set","p":"payload","pt":"msg","to":"$$.payload.$trim($split($,\"\\n\")[-1])","tot":"jsonata"}],"action":"","property":"","from":"","to":"","reg":false,"x":840,"y":2780,"wires":[["99d6a7ae09798d1d"]]},{"id":"99d6a7ae09798d1d","type":"debug","z":"30af2d3e.d94ea2","name":"Refuse Output","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","statusVal":"","statusType":"auto","x":1010,"y":2780,"wires":[]}]

With the selector td.last-service, td.next-service and the change node to clean up the result, you should get:

[
"25/07/2022",
"08/08/2022",
"26/07/2022",
"09/08/2022",
"02/08/2022",
"09/08/2022"
]
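In case the JSONata in the change node looks cryptic: for each string in the payload it splits on newlines, keeps the last line, and trims it. Below is a rough JavaScript equivalent that could go in a function node instead, plus an optional step that groups the flat list into last/next pairs. The names `extractDates` and `pairDates` are made up for this sketch, the sample cell text is invented (the real page's labels and whitespace may differ), and the pairing assumes the cells arrive in last-service, next-service order for each service.

```javascript
// Keep only the last line of each cell's text and trim it.
// Note: JSONata's $trim also collapses runs of internal whitespace;
// String.prototype.trim() only strips the ends, which is enough for a date.
function extractDates(cells) {
    return cells.map((text) => {
        const lines = text.split("\n");
        return lines[lines.length - 1].trim();
    });
}

// Group the flat list into { last, next } objects.
// Assumption: dates alternate last, next, last, next, ...
// A trailing unpaired value is ignored.
function pairDates(dates) {
    const pairs = [];
    for (let i = 0; i + 1 < dates.length; i += 2) {
        pairs.push({ last: dates[i], next: dates[i + 1] });
    }
    return pairs;
}

// Invented sample input, standing in for what the html node might return.
const sample = [
    "Last Service\n  25/07/2022 ",
    "Next Service\n  08/08/2022 "
];
console.log(pairDates(extractDates(sample)));
```

Inside a function node the last two statements would be replaced by something like `msg.payload = pairDates(extractDates(msg.payload)); return msg;`.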

That's a great tip, thanks. I did not know about the file-out. I'll be making use of that.

Excellent, thanks. So I did try the td.last-service approach earlier, but I didn't realise you could combine them into a single node. And the change node you've provided is invaluable. Many thanks.
