Remove "maximum number of catches"

Good evening everyone.

I'm working on a flow to create a bot that retrieves a lot of data from a website: it loops an HTTP request via the nbrowser node, and each time it retrieves the data by doing a few steps. For reasons I don't understand, the call sometimes freezes at some point: sometimes the click doesn't bring me to the right page but just refreshes the current one, and sometimes the bot is unable to find the right selector even when it is on the correct page. I have no idea why.
That wouldn't be a big deal, because I built the flow to work around this problem by inserting a catch: whenever an error occurs, whatever error it might be, the node simply restarts.

And that's good.
The problem is that I have to do this operation thousands of times (about 3000/4000 addresses), so after too few calls, the flow stops working, giving me this error:

"Message exceeded maximum number of catches"

So we get to the question: can I remove the maximum number of catches allowed by the catch node?
Thanks for any help or suggestion!

This may be a bit off topic, but to me it is begging to be asked:

Why that many and why so quickly?
3,000 to 4,000 is a lot of calls.
I only put the comma in there to help quantify the size.

I thought (at first glance) it was 30,000 which is a whole other ball game.

I have a list of VAT numbers (I think that's the correct translation? English is not my first language), and I need to get the related addresses of the corresponding companies.
What do you mean by "why so quickly?"

Well, if you were doing the 3000 lookups over a longer time it may not be a problem.

(I am not really understanding the entire problem. I'm only asking because of what I see)

My thoughts are like this:
You have to do whatever it is you do to all these addresses.
It makes sense to automate it.

But if you get the computer to do them ALL at once - with that many to do - it could run out of memory.

Can you break it down into two smaller lists and do it in two passes?
(or maybe 3)
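
To make the "smaller lists" idea concrete, here is a minimal JavaScript sketch of splitting one big array into batches. The names `chunk` and `vatNumbers` and the batch size of 1000 are illustrative assumptions, not part of the original flow.

```javascript
// Split a list into batches of at most `size` items.
// `chunk` and `vatNumbers` are hypothetical names used only for illustration.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Example: 3000 placeholder VAT numbers split into passes of 1000.
const vatNumbers = Array.from({ length: 3000 }, (_, i) => `VAT${i}`);
const batches = chunk(vatNumbers, 1000);
console.log(batches.length, batches[0].length);
```

Each batch could then be injected as a separate pass, so a failure in one pass does not require starting the whole list again.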

That way you may not run out of space and blow the catch number.

It does not make all the calls at once: I put all the VAT numbers in an array and then cycle through it, making one call at a time, 3000 times. I thought about simply re-injecting the flow when it crashes, but it crashes after 20-30 calls, so that's not good enough: I need the bot to be usable once a month or so, with similar numbers. This is why I need a way to get around the maximum number of catches.

No - you cannot alter this. It is hard coded at 10, and is there to stop Node RED from igniting and causing further instability. You have an error loop that needs to be fixed.

Personally, and I mean this in the nicest way, you should really fix the fault at source, and not try to avoid Node RED error handling.

Oh, ok.

I see there is another reply but just quickly....

So with what you say, there may be a problem with how the program releases memory after each call.

I know you are using Node-red, but.......

Each time you call the function to process the data from each customer, memory is used.
Then when complete the memory should be released.

If it isn't being released then eventually you will run out of memory.

That's the best way I can explain what I suspect COULD be the problem.

You may want to use a queue that only fires off the next request once the previous one has completed successfully, to prevent a DDoS on the serving end... maybe something like this: "Node red flow queuing - #3 by Colin". That flow is self-throttling and goes as fast as it can, but one at a time.
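
To show the one-at-a-time idea in plain JavaScript, here is a minimal sketch that processes items sequentially and gives up on an item after a fixed number of attempts, instead of retrying forever. It is not the linked flow's implementation: `processSequentially`, `lookup`, `maxAttempts` and `delayMs` are hypothetical names, and `lookup` stands in for whatever scraping step the flow performs.

```javascript
// Process items strictly one at a time. Each item is retried at most
// `maxAttempts` times; after that the failure is recorded and the loop moves
// on to the next item, so a single bad item cannot cause an endless error loop.
// `lookup` is a hypothetical async function standing in for the scraping step.
async function processSequentially(items, lookup, maxAttempts = 3, delayMs = 0) {
  const results = [];
  const failures = [];
  for (const item of items) {
    let lastError = null;
    let done = false;
    for (let attempt = 1; attempt <= maxAttempts && !done; attempt++) {
      try {
        const value = await lookup(item); // the next call starts only after this settles
        results.push({ item, value });
        done = true;
      } catch (err) {
        lastError = err;
        // Optional pause before the next attempt, to avoid hammering the site.
        if (delayMs > 0) {
          await new Promise((resolve) => setTimeout(resolve, delayMs));
        }
      }
    }
    if (!done) {
      failures.push({ item, error: String(lastError) });
    }
  }
  return { results, failures };
}
```

With this shape, an error on one VAT number ends up in `failures` while the rest of the list still gets processed, and the failed items can be re-run later as a much smaller list.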

And just to plug one of my nodes to facilitate this type of control. :angel:

It provides the same thing that @Colin wrote, just in a node collection with extras, such as a built-in fail-safe.

The flow has a fail safe also. But yes.

I am trying to use this solution, but I don't really get how it works (I'm kinda new to node-red, so sorry for my ignorance ^^).

Could you give me a quick tutorial? I don't get how this could fix my problem: if the flow gets an error, wouldn't it stop anyway?

I tried fixing the error of course :sob:
I really don't understand why those errors occur. It looks like they depend on the website, because they seem completely random.

What have you found out about the source of the error? As suggested, it could be the website blocking you in some way due to unusual traffic.

What service are you using to obtain the address?
You realise there is a Web Service for this type of resource retrieval?

I'm bringing it up because you seem to be using web scraping.

https://developer.service.hmrc.gov.uk/api-documentation/docs/api/service/vat-registered-companies-api/1.0

The return provides the required info.

{
  "target": {
    "name": "Credite Sberger Donal Inc.",
    "vatNumber": "553557881",
    "address": {
      "line1": "131B Barton Hamlet",
      "postcode": "SW97 5CK",
      "countryCode": "GB"
    }
  },
  "processingDate": "2019-01-31T12:43:17+00:00"
}

I realise this is a UK web service, but if you don't reside in the UK it's worth checking whether your government offers something similar.

Furthermore, I would look at building a database of VAT number -> address mappings, so that over time you can cut down on the number of requests as your DB gets populated.
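
As a rough illustration of the database idea, here is a minimal JavaScript sketch of a lookup that remembers addresses it has already fetched. It uses an in-memory `Map` purely for brevity; a real setup would persist the data (a file, a database, or a persistent context store). `makeCachedLookup` and `fetchAddress` are hypothetical names, and `fetchAddress` stands in for whichever lookup method is actually used.

```javascript
// Wrap a lookup function so each VAT number is fetched at most once.
// `fetchAddress` is a hypothetical async function standing in for the real
// lookup; the Map is in-memory only and is lost when the process restarts.
function makeCachedLookup(fetchAddress) {
  const cache = new Map();
  return async function cachedLookup(vatNumber) {
    if (cache.has(vatNumber)) {
      return cache.get(vatNumber); // served locally, no request made
    }
    const address = await fetchAddress(vatNumber);
    cache.set(vatNumber, address); // only successful lookups are stored
    return address;
  };
}
```

Because failed lookups are not stored, a VAT number that errors out is simply tried again on the next run, while everything already retrieved is skipped.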

I found lots of API sources that do what you describe, but they are not free. I was looking for a way to do it with Node-RED, but it seems like it is not the best tool for such a large web-scraping job...

Which node is generating the errors you are catching?
