Trying to get notifications of websites changes

rasprookie · 21 January 2020 19:55

Hi,

I'm a rookie, so please be patient with me.
What I'm trying to achieve is to have a list of websites watched regularly and to get email notifications of their changes, only when they change.

My questions are:

Am I using the RBE node appropriately to pass on only what differs from the previous query?
Shouldn't there be a database, so that the previous query can be compared with the current one to detect changes?
When a new item is found, the payload is its title. How can I attach the web page link to each individual title?
Finally, what's the best way to check a list of websites, say once a week: subflows?

I'm practicing with one site for now:

[{"id":"4e26b07.2a6cb5","type":"debug","z":"62920a25.045914","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","x":1070,"y":220,"wires":[]},{"id":"e8e25e38.b8785","type":"http request","z":"62920a25.045914","name":"","method":"GET","ret":"txt","paytoqs":false,"url":"https://toronto.craigslist.org/search/ela?","tls":"","persist":false,"proxy":"","authType":"","x":470,"y":220,"wires":[["930f738.a39599"]]},{"id":"7a04832c.8f7bcc","type":"inject","z":"62920a25.045914","name":"","topic":"craigslist test","payload":"","payloadType":"date","repeat":"300","crontab":"","once":false,"onceDelay":0.1,"x":190,"y":220,"wires":[["e8e25e38.b8785"]]},{"id":"930f738.a39599","type":"html","z":"62920a25.045914","name":"Titre","property":"payload","outproperty":"payload","tag":"#sortable-results > ul > li:nth-child(n) > p","ret":"text","as":"multi","x":670,"y":220,"wires":[["b76d0f1.79c40f"]]},{"id":"b76d0f1.79c40f","type":"rbe","z":"62920a25.045914","name":"","func":"rbei","gap":"5","start":"","inout":"out","property":"payload","x":850,"y":220,"wires":[["4e26b07.2a6cb5"]]}]

I installed the Web watch nodes, but I'm not sure of the mechanics under the hood, so I'm sticking to basic nodes for now.

Thanks a lot !

TotallyInformation · 21 January 2020 20:08

Please repost your flow between backticks or triple backticks, otherwise Discourse mangles it beyond use.

A picture of the flow is also helpful as not everyone will want to load up random flows.

Sorry, and welcome to the forum! Please don't mistake my abruptness for rudeness, you will generally find us welcoming here.

rasprookie · 21 January 2020 20:38

Thanks !

Here is the flow, and a screenshot.

[quote="rasprookie, post:1, topic:20700"]
[{"id":"4e26b07.2a6cb5","type":"debug","z":"62920a25.045914","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","x":1070,"y":220,"wires":[]},{"id":"e8e25e38.b8785","type":"http request","z":"62920a25.045914","name":"","method":"GET","ret":"txt","paytoqs":false,"url":"https://toronto.craigslist.org/search/ela?","tls":"","persist":false,"proxy":"","authType":"","x":470,"y":220,"wires":[["930f738.a39599"]]},{"id":"7a04832c.8f7bcc","type":"inject","z":"62920a25.045914","name":"","topic":"craigslist test","payload":"","payloadType":"date","repeat":"300","crontab":"","once":false,"onceDelay":0.1,"x":190,"y":220,"wires":[["e8e25e38.b8785"]]},{"id":"930f738.a39599","type":"html","z":"62920a25.045914","name":"Titre","property":"payload","outproperty":"payload","tag":"#sortable-results > ul > li:nth-child(n) > p","ret":"text","as":"multi","x":670,"y":220,"wires":[["b76d0f1.79c40f"]]},{"id":"b76d0f1.79c40f","type":"rbe","z":"62920a25.045914","name":"","func":"rbei","gap":"5","start":"","inout":"out","property":"payload","x":850,"y":220,"wires":[["4e26b07.2a6cb5"]]}]
[/quote]

rasprookie · 28 January 2020 13:29

Hi !

I asked too many questions at once, so let me start with just one.

After parsing a certain web page, how can I 'store' the data so that the next time I collect data from that page, I can compare the two and output only their differences?

Have a nice day

Colin · 28 January 2020 13:59

There are multiple ways. You could store in one of the node-red context stores, MQTT, a database, files and probably others. First you would need to decide exactly what to store (perhaps you know that already) then determine which one would be best for you. That may depend on which (if any) you already have experience of.

[Edit] The answer will also depend on whether you want to store 10 million pages, 5 or somewhere in between.
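
For a handful of pages, the context-store option could look roughly like the sketch below: keep the items seen on the last run and pass on only those that are new. The names `findNewItems` and `checkPage`, the `{title, link}` item shape, and the in-memory `store` are assumptions made for illustration and are not taken from the flow above. In a function node, the `Map` would be replaced by `context.get(...)` and `context.set(...)`.

```javascript
// Hypothetical helper: return the items in `current` whose link was not
// present in `previous`. Items are assumed to look like {title, link}.
function findNewItems(previous, current) {
    const seenLinks = new Set((previous || []).map(item => item.link));
    return current.filter(item => !seenLinks.has(item.link));
}

// Stand-in for a Node-RED context store so the sketch runs on its own.
const store = new Map();

// Compare the current scrape of a page with the stored one, then store
// the current scrape for the next run. Returns only the new items.
function checkPage(pageUrl, current) {
    const previous = store.get(pageUrl);
    const fresh = findNewItems(previous, current);
    store.set(pageUrl, current);
    return fresh;
}
```

Note that with nothing stored yet, the first run reports every item as new. Keying on the link rather than the title also keeps the link alongside each title in whatever is sent on.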

realjax · 29 January 2020 12:21

Also, you might be prodding a wasps' nest here. A webpage might change in a lot of places for a lot of reasons, not only in the parts you are interested in. You might end up having to write a lot of extra code in order to filter out the false positives...
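
One way to cut down on those false positives, as a sketch only: compare a fingerprint of just the fragments you extract (the text from the html node, say) rather than the whole page, and normalise whitespace first. The helper names `normalise` and `fingerprint` are made up for illustration. The `crypto` module is Node.js's built-in one, and making it available inside a function node depends on how your Node-RED instance is configured.

```javascript
const crypto = require("crypto");

// Hypothetical helper: collapse runs of whitespace so that layout-only
// changes do not count as a difference.
function normalise(text) {
    return text.replace(/\s+/g, " ").trim();
}

// Fingerprint only the extracted fragments (e.g. listing titles),
// not the full HTML of the page.
function fingerprint(fragments) {
    const joined = fragments.map(normalise).join("\n");
    return crypto.createHash("sha256").update(joined).digest("hex");
}
```

Two runs whose fingerprints match can be treated as "no change"; it would still be up to you to decide which fragments matter.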

rasprookie · 29 January 2020 18:42

Thanks for your answers. I'm now exploring the possible ways to do it. It'd be closer to 5 pages than 10 million.
