Scraping the web with node-red?

Is it possible to do advanced scraping of web pages?

Requirements:
Click -> follow "pagination" links to further pages.
Find field -> locate an input field and add text to it (Google search).
Enter -> add keywords to the Google search box and press Enter.

I have tried Selenium WebDriver, but on an Ubuntu server it doesn't work because it can't open the Chrome UI.
Are there any other ways to achieve this?

A quick search.

Disclaimer: I know nothing about this module - but it seems to be right up your street

Hi,

I'm not sure what you mean regarding 'cant open chrome ui' on Ubuntu - could you give us a little more information / details please? Errors, your code, tools etc.

This is definitely possible and I think you were almost there.

I use: topological-nodered-wdio (node) - Node-RED

It allows you to execute ALL WDIO / Selenium commands, so clicking on page objects, adding data to fields, and pressing the 'Enter' key are all things I've done before.
Admittedly I'm using Windows, but I'd still like to know a bit more about the specifics of your situation.
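
For readers wondering what those commands amount to underneath, here is a minimal sketch of the three actions from the original question (find a field, type text plus Enter, click a pagination link) expressed as W3C WebDriver requests. It only builds the requests and does not send them. The function names are invented for illustration and are not part of topological-nodered-wdio or any other Node-RED node; the endpoint paths, the Enter key code and the element key come from the W3C WebDriver specification.

```python
import json

# Values defined by the W3C WebDriver specification.
ENTER_KEY = "\uE007"
ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"


def find_element(session_id, css_selector):
    """Build the request that locates the first element matching a CSS selector."""
    return ("POST", f"/session/{session_id}/element",
            {"using": "css selector", "value": css_selector})


def type_text(session_id, element_id, text, press_enter=False):
    """Build the request that types into an element, optionally ending with Enter."""
    if press_enter:
        text += ENTER_KEY
    return ("POST", f"/session/{session_id}/element/{element_id}/value",
            {"text": text})


def click(session_id, element_id):
    """Build the request that clicks an element, e.g. a pagination link."""
    return ("POST", f"/session/{session_id}/element/{element_id}/click", {})


def element_id_from(response_body):
    """Extract the element id from the JSON body of a find-element response."""
    return json.loads(response_body)["value"][ELEMENT_KEY]
```

As a usage example, `type_text("SESSION", "ELEMENT", "node-red scraping", press_enter=True)` yields the request that fills in a search box and submits it. The ids here are placeholders, and a selector such as `[name='q']` for the Google search box is an assumption that should be checked against the live page.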

Cheers,

Paul

Great, thank you! I will keep trying and post the solution here.
I was trying it out and thought it wasn't possible, but it's good to know it can be achieved.
The only error I get is from "node-red-contrib-selenium-wd2": msg : string[32] "Can't open an instance of chrome", but I'll try the tips given here.

Edit: as for how I get that error, I just injected the "Open webpage module" and tried to open google.com.

Hi,

How are you running Selenium?
I'd assume as npm packages in a shell?
Something like: 'selenium-standalone start'

You need one shell running node-red and a separate shell running selenium.
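
One way to confirm that the separate Selenium process is actually up before starting a flow is to query its status endpoint. The sketch below assumes the server listens on Selenium's default port 4444 on the same machine; `parse_status` and `server_is_ready` are illustrative names, not part of any Node-RED node. The `/status` endpoint and its `ready` field are defined by the W3C WebDriver specification.

```python
import json
import urllib.error
import urllib.request


def parse_status(body):
    """Return True if a WebDriver /status response body reports ready."""
    try:
        value = json.loads(body).get("value", {})
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False
    return isinstance(value, dict) and value.get("ready") is True


def server_is_ready(base_url="http://localhost:4444", timeout=3.0):
    """Ask the server for its status; any connection problem counts as not ready."""
    try:
        with urllib.request.urlopen(f"{base_url}/status", timeout=timeout) as resp:
            return parse_status(resp.read().decode("utf-8", errors="replace"))
    except (urllib.error.URLError, OSError):
        return False
```

If `server_is_ready()` returns False, the Selenium shell has probably not started or is listening on a different address, and a "can't open Chrome" style of error from the Node-RED side would be expected.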

Share your code...

Cheers,

Paul

Thank you very much!
You put me on the right track. Following the "selenium-standalone start" tip, I found a Docker image for selenium/standalone-chrome.

I launched the Docker image and WebDriver started working at localhost:4444.
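
For anyone landing here with the same problem, the following is a rough sketch of what a client does against such a server: open a Chrome session, load a page, and close the session again. It uses only the Python standard library and the W3C WebDriver endpoints. The helper names are hypothetical, and the `--headless` argument is an assumption for display-less servers; it may be unnecessary with the Docker image, so it is optional here.

```python
import json
import urllib.request

# Assumption: the server from this thread, on the default port.
BASE_URL = "http://localhost:4444"


def new_session_request(headless=True):
    """Build the W3C 'New Session' request for Chrome."""
    args = ["--headless"] if headless else []
    chrome = {"browserName": "chrome", "goog:chromeOptions": {"args": args}}
    return ("POST", "/session", {"capabilities": {"alwaysMatch": chrome}})


def navigate_request(session_id, url):
    """Build the request that loads a URL in the session."""
    return ("POST", f"/session/{session_id}/url", {"url": url})


def delete_session_request(session_id):
    """Build the request that closes the browser session."""
    return ("DELETE", f"/session/{session_id}", None)


def session_id_from(response_body):
    """Extract the session id from a New Session response body."""
    return json.loads(response_body)["value"]["sessionId"]


def send(request, base_url=BASE_URL, timeout=30.0):
    """Send one (method, path, body) request and return the raw response text."""
    method, path, body = request
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        base_url + path, data=data, method=method,
        headers={"Content-Type": "application/json; charset=utf-8"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def open_page(url, base_url=BASE_URL):
    """Open a session, load the page, and always clean the session up."""
    session_id = session_id_from(send(new_session_request(), base_url))
    try:
        send(navigate_request(session_id, url), base_url)
    finally:
        send(delete_session_request(session_id), base_url)
```

Calling `open_page("https://www.google.com")` needs the Selenium server to be running. Nothing is sent when the functions are merely defined.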
