Hi
We have a busy Node-RED instance running frequent updates to Postgres.
We are on Node-RED version 3.02 with node-red-contrib-postgresql 14.2 on Red Hat.
We are on Amazon RDS with Postgres 14.12.
Hardware-wise, we have recently upgraded everything in size. Our Postgres instance now has tonnes of free memory and CPU usage of about 15%. Our Node-RED host runs RHEL, and we are using only 20% of available memory and maybe 30% of CPU. Nothing should be overtaxed.
We have several processes running updates to the database, and we believe they are efficient. According to the Postgres logs, checkpoints are not a big bottleneck (we think they were before, but we upgraded the hardware), and we are on high-efficiency SSD storage.
No matter what we do, we get good throughput and good performance most of the time, and then without warning, for maybe half a minute or sometimes a little more, Node-RED and Postgres both begin to throw errors. On the Node-RED side, it looks like connection requests to Postgres are taking longer than the timeout setting; on the Postgres side, it looks like the connection is being closed by the client.
We have set our idle and connection timeouts VERY long: 5000 ms (5 s) for idle and 10000 ms (10 s) for connection timeout. We had shorter connection timeouts before, but we raised them to see whether we were simply being impatient in trying to get a connection.
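To make the units explicit, here is roughly how those two settings would look as pg-pool options. This is only a sketch: I am assuming that node-red-contrib-postgresql passes the config node's values to pg-pool under the option names `idleTimeoutMillis` and `connectionTimeoutMillis`, and the variable name `poolOptions` is just for illustration.

```javascript
// Values from our Node-RED config node; both are in milliseconds.
// Assumption: the contrib node hands these to pg-pool under the
// option names used below.
const poolOptions = {
  // Disconnect a pooled client after it has sat idle for 5 s.
  idleTimeoutMillis: 5000,
  // Give up on establishing a new connection after 10 s.
  connectionTimeoutMillis: 10000,
};

// With the `pg` package installed, this would be used as:
//   const { Pool } = require("pg");
//   const pool = new Pool(poolOptions);
console.log(poolOptions);
```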
We have an out-of-the-box configuration for the network connection between Amazon RDS and our EC2 instances, and most of the time our throughput is great. I cannot imagine this is a networking issue, as it is intermittent.
Node-RED error:
Error: Connection terminated due to connection timeout
    at /home/ec2-user/.node-red/node_modules/pg-pool/index.js:45:11
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async PostgreSQLNode._inputCallback (/home/ec2-user/.node-red/node_modules/node-red-contrib-postgresql/postgresql.js:226:16)
Postgres error:
2024-11-25 12:00:39 UTC:172.31.95.156(51918):mypincentral@postgres:[28375]:LOG: could not receive data from client: Connection reset by peer
All the processes running during the period of unresponsiveness throw connection reset errors on Postgres and timeout errors on Node-RED.
We have configured error catch-recycle pathways in our flow: each one catches a timeout error, puts the message into a rate-limit node to wait, and then feeds it back into the database. After the half minute or so of unresponsiveness, the issue mostly resolves itself, the items slowly drain out of the rate-limit nodes, and everything returns to more or less normal.
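For anyone who prefers to see the recycle idea as code rather than as a flow, it amounts to something like the sketch below. This is not our actual flow: the helper names `isTimeoutError` and `retryOnTimeout` are made up, the fixed delay stands in for the rate-limit node, and the default delay and attempt count are assumed values.

```javascript
// Hypothetical helpers; not part of Node-RED or node-red-contrib-postgresql.

// True when the error looks like the pg-pool timeout we see in Node-RED.
function isTimeoutError(err) {
  return err instanceof Error && /connection timeout/i.test(err.message);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `task` (any async function, e.g. one that issues a query). On a
// timeout error it waits `delayMs` and tries again, up to `maxAttempts`
// times in total. Any other error is rethrown immediately.
async function retryOnTimeout(task, { delayMs = 1000, maxAttempts = 5 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      if (!isTimeoutError(err)) throw err;
      lastError = err;
      // Wait before the next try, but not after the final failure.
      if (attempt < maxAttempts) await sleep(delayMs);
    }
  }
  throw lastError;
}
```

A caller would wrap the database write, for example `retryOnTimeout(() => runUpdate(msg))`, where `runUpdate` is a placeholder for whatever issues the query.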
We do not know why this happens. We have put in a lot of error handling for it, but we would much rather prevent the issue from happening in the first place than rely on the error handling.
Has anyone experienced and resolved the same issue, and if so, how did you resolve it?
Thanks
Richard