Connection lost - dashboard (and editor)

I had an issue earlier this week that you all helped solve (thank you!). I have another, and I'm not sure if anyone has advice on how to solve it.
Seemingly at random, a "connection lost" error shows up on the dashboard and it freezes. If I refresh, it is unable to reconnect and the browser just spins indefinitely. This occurs on all browsers. The Node-RED editor locks up too.
Interestingly, node red still appears to be running in the background, and nothing shows up in the log. In addition, the only way to solve the problem is to restart the device. Stopping and starting node-red does not solve the problem. I am still able to interact with the device over ssh, so it seems it must be specific to port 1880? I do not know enough about networking to troubleshoot this.

The device running the node red instance is a raspberry pi 4 with raspbian. I am connecting remotely using zerotier, and my colleague who is on the same network is also having the same issue. We are using the dashboard as a gui to run an executable and to display data on the dashboard. New data gets sent every 10s, and some of the data is loaded in from a file, while the rest is sent using mqtt. The gui also sends messages to the program using mqtt as well.

Does anyone have suggestions on how to troubleshoot this issue?

Sounds like you may have a loop, possibly in your node-red flow but more likely in MQTT.

You may need to start node red in safe mode to find and fix the problem.

Can you elaborate? It starts up fine and randomly loses the connection at some point. When I lose connection, node red still shows it is running in the background, and nothing shows in the log.

What can I do when I start it in safe mode that will help to solve this problem?

First, it might be worth running htop while NR is running; there may be something obvious using a lot of CPU or memory.

Then try disabling a few tabs at a time to narrow it down. (If you can't access the editor at all, start NR in safe mode. I think deploy will start the flows running.)

When it next locks up, stop your mqtt broker (mosquitto?). If node red recovers after a while, then you have a loop involving mqtt.

Are you sure it stops? If it is stuck in a loop it can take minutes to stop.

ok, good to note, I will give that a try, as well as stopping the mqtt broker.

OK, it lost connection again, and I stopped mosquitto. I waited 10 min, but nothing changed. After running node-red-stop, if I wait several minutes as you suggested, and then start it back up, I am able to connect again. Also, this is a screenshot from htop, showing node-red taking up 100% of the cpu (after I had stopped it).

This is node red after starting back up again, shows it using 105% cpu. Is that normal? Also, this is just before it lost connection again.

No it's not exactly normal.
I created a loop with mqtt - mqtt-in sending the same data into mqtt-out on a Pi 4, this is htop:

So Node-red jumps to 100% (of a single CPU core) and mosquitto takes 20% of another.
With that though, sudo systemctl stop mosquitto immediately restores calm.

So I presume that whatever is causing your problem it is entirely within Node-red.
All I can think of is disabling flows one by one to narrow down where the issue is.

I have a scenario where node-red is subscribed to a topic that another flow is publishing to. Would this be a potential issue?

No, that should be perfectly fine. A problem can arise where you subscribe to a topic using a wildcard, do something with it, and then publish to a topic which the wildcard catches.

E.g. subscribe to reactor/# (wildcard), use the value, publish to reactor/control/temperature.

But since it doesn't go away when you disable Mosquitto, I don't think your problem is a loop in MQTT.
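For anyone wanting to check their own topics for the wildcard case described above, here is a minimal sketch of MQTT topic-filter matching. `topicMatches` is a hypothetical helper written for illustration, not part of Node-RED or any MQTT library, and it ignores the special handling of topics beginning with `$`.

```javascript
// Hypothetical helper (not a Node-RED or MQTT library function):
// returns true when `topic` is matched by the subscription `filter`.
// Assumes the filter is valid, i.e. "#" only appears as the last level.
function topicMatches(filter, topic) {
    const f = filter.split("/");
    const t = topic.split("/");
    for (let i = 0; i < f.length; i++) {
        if (f[i] === "#") {
            // Multi-level wildcard: matches the parent level and everything below it.
            return true;
        }
        if (i >= t.length) {
            return false; // filter has more levels than the topic
        }
        if (f[i] !== "+" && f[i] !== t[i]) {
            return false; // a literal level differs ("+" matches any single level)
        }
    }
    // No "#" seen, so both must have the same number of levels.
    return f.length === t.length;
}

// The example from the post: the publish topic is caught by the subscription.
console.log(topicMatches("reactor/#", "reactor/control/temperature")); // true
// A plain topic with no wildcard does not catch it.
console.log(topicMatches("reactor/status", "reactor/control/temperature")); // false
```

If the first call returns true for a subscribe/publish pair in the same flow, every published message comes straight back in, which is the loop being described.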

The author said "With that though, sudo systemctl stop mosquitto immediately restores calm.", so I read that as stopping mosquitto DOES stop the flood. So yes, I still suspect an MQTT loop.

I think you misread, @dceejay.

I said that's what happens when I deliberately create an mqtt loop.

@aams said


Thank you all for your help. I had a very subtle bug in a function I was using to set the range on a plot based on incoming values. Under certain conditions, it was causing an infinite loop inside the function and locking up the browser. I was unaware that node-red wasn't stopping right away, so I didn't think to troubleshoot my flows. A note for anyone in a similar situation: if you select a group of nodes and press ctrl-shift-p, you can mass disable nodes, which makes for a fairly quick method to isolate the problem.

That is a good typo, it takes a bit of looking to see the problem.
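For anyone who lands here with the same symptoms: Node.js runs JavaScript on a single thread, so a loop that never exits inside a function node keeps the whole runtime busy, which would fit the editor and dashboard both hanging while htop shows about 100% of one core. Below is a rough sketch of one way to guard a range-finding loop. It is a made-up example, not the code from this thread; `upperLimitFor`, `MAX_STEPS` and the step-widening approach are all assumptions for illustration.

```javascript
// Hypothetical example, NOT the original poster's function: widen a plot's
// upper limit in fixed steps until it covers the incoming value.
const MAX_STEPS = 1000; // assumed safety cap; pick whatever suits the data

function upperLimitFor(value, step) {
    if (!Number.isFinite(value) || !Number.isFinite(step) || step <= 0) {
        // A zero or negative step, or an infinite value, would make an
        // unguarded version of the loop below run forever.
        throw new RangeError("value and step must be finite, and step must be > 0");
    }
    let limit = 0;
    let steps = 0;
    while (limit < value) {
        limit += step;
        if (++steps > MAX_STEPS) {
            // Bail out instead of blocking the runtime indefinitely.
            throw new RangeError("range did not converge; check the inputs");
        }
    }
    return limit;
}

console.log(upperLimitFor(42, 10)); // 50
```

Inside a function node, the call could be wrapped in try/catch and the error passed to node.error(err, msg), so a bad input shows up in the debug sidebar rather than freezing everything.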
