Frozen HTTP In nodes and how to debug them?

Hello, I'm running into issues where the Node-RED HTTP In node freezes.

I have variables saved in Node-RED, which are then requested through an HTTP In node set to GET.
This then returns the requested data.
Easy, smooth, clean, simple.

Until it just stops.
There are no errors, the Portainer log only shows me redeploying in an attempt to fix the issue, and nothing comes out of the node when attaching a debug node to it.
The script that performed the request shows a 404 in the console.

Sometimes you can get away with moving the node and redeploying, and sometimes you have to move it and change its name.
But more recently, you have to recreate it:
create a new HTTP In node, set it to GET, copy-paste the web address, and move the wire from the old node to the new one.
And then it just works, even though it's exactly the same thing.

So I'm asking if anyone has a clue what this could be, and any guidance on how to debug it.
Seeing how the issue is with the first node in the chain, I'm having trouble even knowing where to start debugging.
Any help is appreciated.

Can you share a minimal flow that demonstrates the issue?

[image]

This is all you need.
And they will both work initially.
Then after a while /data2 will stop working: debug 2 will stop seeing anything come out of /data2, and it will just freeze up.

Then I can delete the old /data2, put in a new HTTP In node, set it to GET and /data2, and it will instantly work again.

But I don't know how to properly diagnose this; either my Google-fu is failing me, or the debug information is lacking.

A few more questions to help me get to the bottom of this:

  1. How many calls (typically) before this stops working?

  2. How frequently (at what rate) are they being called?

  3. Is it only ever /data2 that stops working?

  4. What are you using to call the endpoints?

    • also, are you calling by IP or by hostname?
    • is the Node-RED instance on a local network or a remote/cloud device?
  5. What is your environment?

    • OS
    • Node.js version
    • Node-RED version
    • Running in Docker?

Are the debugs producing hundreds of messages? I think the debug sidebar will only show a maximum number of messages before it stops.

I currently have no idea how many calls; it's seemingly random.
The script that last failed the GET request runs once a minute, which is what they're usually set to.
It's any of the HTTP In nodes.

Many different ways, but a Lua script and a browser were what was specifically used last.
The Lua script would return success, but with nil data.
The browser would put a 404 in the logs.

Web addresses. Some nodes/addresses return a web page where you can input data or settings, while other nodes/addresses are used by scripts doing a GET to fetch that data.

An AWS EC2 server, running Ubuntu 20.04.4 LTS.
We are also using the AWS load balancer.
Node.js 16.16.0
Node-RED 3.0.2
Yes, in Portainer 2.9.2

No, there is a "normal" amount of debug messages, but once the error happens, nothing comes out of the HTTP node.
No message to debug, no message to return to the sender, nothing.
And then you do the delete-and-recreate dance and it works again.

This may be key. Are you also running multiple instances of the same flows?

No, all the flows are different.
All the flows have around 5 HTTP In nodes, but some have many more.
The last node to fail was in a flow with about 100 HTTP In nodes.

If the load balancer is key, then I don't understand why one specific node stops working, and why replacing it with a new one set to exactly the same address makes it work again.

Perhaps add a Catch node with a Debug node connected to it?

Curious. I wonder if you are hitting resource issues? I would personally not have more than a few endpoints, and would use path and query params instead.

For example, instead of having...

GET books/science
GET books/fiction
GET books/history
GET books/etc

I would have
GET books/:genre

Then use the :genre param to route the messages internally.
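As a rough sketch of what that internal routing could look like (the genre list and payloads are invented; in Node-RED the body of routeByGenre would live in a Function node between the HTTP In node configured as "books/:genre" and an HTTP Response node):

```javascript
// Hypothetical Function-node logic for a single parameterised endpoint.
// The HTTP In node fills msg.req.params from the matched URL path.
function routeByGenre(msg) {
    const genre = msg.req.params.genre;

    // Made-up lookup table; in a real flow this would come from context
    // or whatever store the variables are saved in.
    const shelves = {
        science: ["Cosmos"],
        fiction: ["Dune"],
        history: ["SPQR"],
    };

    if (!(genre in shelves)) {
        msg.statusCode = 404; // picked up by the HTTP Response node
        msg.payload = { error: "unknown genre: " + genre };
        return msg;
    }

    msg.payload = { genre, books: shelves[genre] };
    return msg;
}

// Simulating GET /books/science and GET /books/poetry:
console.log(routeByGenre({ req: { params: { genre: "science" } } }).payload);
console.log(routeByGenre({ req: { params: { genre: "poetry" } } }).statusCode); // 404
```

One endpoint instead of dozens also means there is only one HTTP In node that can freeze, which would at least narrow down where to look.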

Anyhow, I don't typically use Docker, but if I get a chance to set up a test environment I will let you know what I find.

I'll try it, but will the Catch node catch the error if the node is not outputting anything and is seemingly unresponsive?
Unfortunately, in the process of debugging, the node is now responding again, so I'll have to wait for another one to freeze.

Is there a limit where you'd say "that's too many"?

I have now confirmed the same issue on a different server with about 10 nodes.
Node count does not seem to be the issue.