Trying to get notifications of website changes

Hi,

I'm a rookie, so please be understanding :slight_smile:
What I'm trying to achieve is to have a list of websites watched regularly, and to get an email notification for a site only when it changes.

My questions are:

  • Am I using RBE appropriately to keep only what differs from the previous query?
  • Shouldn't there be a database, so the current query can be compared with the previous one to detect changes?
  • When a new item is found, the payload is its title. How can I attach the webpage link to each individual title?
  • Finally, what's the best way to check a list of websites, let's say once a week: subflows?

I'm practicing with one site for now:

[{"id":"4e26b07.2a6cb5","type":"debug","z":"62920a25.045914","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","x":1070,"y":220,"wires":[]},{"id":"e8e25e38.b8785","type":"http request","z":"62920a25.045914","name":"","method":"GET","ret":"txt","paytoqs":false,"url":"https://toronto.craigslist.org/search/ela?","tls":"","persist":false,"proxy":"","authType":"","x":470,"y":220,"wires":[["930f738.a39599"]]},{"id":"7a04832c.8f7bcc","type":"inject","z":"62920a25.045914","name":"","topic":"craigslist test","payload":"","payloadType":"date","repeat":"300","crontab":"","once":false,"onceDelay":0.1,"x":190,"y":220,"wires":[["e8e25e38.b8785"]]},{"id":"930f738.a39599","type":"html","z":"62920a25.045914","name":"Titre","property":"payload","outproperty":"payload","tag":"#sortable-results > ul > li:nth-child(n) > p","ret":"text","as":"multi","x":670,"y":220,"wires":[["b76d0f1.79c40f"]]},{"id":"b76d0f1.79c40f","type":"rbe","z":"62920a25.045914","name":"","func":"rbei","gap":"5","start":"","inout":"out","property":"payload","x":850,"y":220,"wires":[["4e26b07.2a6cb5"]]}]

I installed the Web watch nodes, but I'm not sure of the mechanics under the hood, so I'm sticking to basic nodes for now.

Thanks a lot !

Please repost your flow between backticks or triple backticks, otherwise Discourse mangles it beyond use.

A picture of the flow is also helpful as not everyone will want to load up random flows.

Sorry, and welcome to the forum! :grinning: Please don't mistake my abruptness for rudeness, you will generally find us welcoming here.

Thanks !

Here is the flow, and a screenshot.

[quote="rasprookie, post:1, topic:20700"]
[{"id":"4e26b07.2a6cb5","type":"debug","z":"62920a25.045914","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"payload","targetType":"msg","x":1070,"y":220,"wires":[]},{"id":"e8e25e38.b8785","type":"http request","z":"62920a25.045914","name":"","method":"GET","ret":"txt","paytoqs":false,"url":"https://toronto.craigslist.org/search/ela?","tls":"","persist":false,"proxy":"","authType":"","x":470,"y":220,"wires":[["930f738.a39599"]]},{"id":"7a04832c.8f7bcc","type":"inject","z":"62920a25.045914","name":"","topic":"craigslist test","payload":"","payloadType":"date","repeat":"300","crontab":"","once":false,"onceDelay":0.1,"x":190,"y":220,"wires":[["e8e25e38.b8785"]]},{"id":"930f738.a39599","type":"html","z":"62920a25.045914","name":"Titre","property":"payload","outproperty":"payload","tag":"#sortable-results > ul > li:nth-child(n) > p","ret":"text","as":"multi","x":670,"y":220,"wires":[["b76d0f1.79c40f"]]},{"id":"b76d0f1.79c40f","type":"rbe","z":"62920a25.045914","name":"","func":"rbei","gap":"5","start":"","inout":"out","property":"payload","x":850,"y":220,"wires":[["4e26b07.2a6cb5"]]}]
[/quote]

Hi !

I asked too many questions at the same time, so let me ask one first :slight_smile:

After parsing a given webpage, how can I 'store' the data so that the next time I collect data from that page, I can compare the two results and output only their differences?

Have a nice day

There are multiple ways. You could store it in one of the Node-RED context stores, MQTT, a database, files, and probably others. First you would need to decide exactly what to store (perhaps you know that already), then determine which option would be best for you. That may depend on which (if any) you already have experience with.

[Edit] The answer will also depend on whether you want to store 10 million pages, 5 or somewhere in between.
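For a handful of pages, the context-store option could look roughly like the sketch below. It is written as plain JavaScript so it can be tried outside Node-RED; inside a function node you would pass `flow` (or `context`) as the store, since both offer `get` and `set`. The key name `seenTitles`, the helper names `findNewItems`, `onPage` and `makeMemoryStore`, and the assumption that `msg.payload` arrives as a single array of titles (the html node in the posted flow is set to `"as":"multi"`, which sends one message per element instead) are all illustrative, not taken from the flow.

```javascript
// Return the entries of `current` that were not present in `previous`.
function findNewItems(previous, current) {
    const seen = new Set(previous);
    return current.filter((item) => !seen.has(item));
}

// Hypothetical handler. `store` is anything with get/set, for example
// the `flow` object available inside a Node-RED function node.
function onPage(msg, store) {
    // Assumption: msg.topic identifies the site, so each site gets its own list.
    const key = "seenTitles:" + (msg.topic || "default");
    const previous = store.get(key) || [];
    const current = Array.isArray(msg.payload) ? msg.payload : [msg.payload];

    const fresh = findNewItems(previous, current);
    store.set(key, current); // remember this run for the next comparison

    // A function node that returns null sends nothing onward,
    // so downstream nodes only fire when something is new.
    return fresh.length > 0 ? { ...msg, payload: fresh } : null;
}

// Minimal in-memory stand-in for the context store, for trying it out.
function makeMemoryStore() {
    const data = new Map();
    return { get: (k) => data.get(k), set: (k, v) => data.set(k, v) };
}
```

In a function node the body would reduce to something like `return onPage(msg, flow);`. Two things to keep in mind: the very first run reports every item as new, and context lives in memory by default, so the stored list is lost on restart unless a persistent context store is configured in `settings.js`.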

Also, you might be prodding a wasps' nest here. A webpage might change in a lot of places for a lot of reasons, not only in the parts you are interested in. You might end up having to write a lot of extra code to filter out the false positives.
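One way to limit false positives, sketched here under assumptions, is to fingerprint only the items already extracted by the html node rather than the whole page, and to normalise whitespace first so layout-only changes are ignored. The function names `normalize`, `fingerprint` and `hasChanged` are made up for illustration. The sketch uses Node.js's built-in `crypto` module; note that `require` is not available inside a function node by default, so the module would have to be exposed through the Node-RED settings.

```javascript
const crypto = require("crypto");

// Collapse runs of whitespace so layout-only changes do not alter the result.
function normalize(text) {
    return text.replace(/\s+/g, " ").trim();
}

// Hash only the extracted items (for example listing titles), not the full HTML.
function fingerprint(items) {
    // After normalize() no item contains a newline, so "\n" is a safe separator.
    const canonical = items.map(normalize).join("\n");
    return crypto.createHash("sha256").update(canonical, "utf8").digest("hex");
}

// Compare against the fingerprint saved from the previous run.
function hasChanged(previousFingerprint, items) {
    const current = fingerprint(items);
    return { changed: current !== previousFingerprint, fingerprint: current };
}
```

Storing a single fingerprint per site is enough to answer "did anything I care about change?". Working out which items changed still needs the stored list itself.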

Thanks for your answers, I'm now exploring the possible ways to do it. It'd be closer to 5 pages than 10 million :wink: