Oh great! Now my MQTT is not working!

If you delete the unwanted one then it is easier to see the ones that need fixing as they will have the red triangle against them.

Double click on each one in turn to open its edit dialog and click the delete button.

The story continues.

Got home, turned on (this) machine.

Logged in.

Down to 3 using the "wrong" node. Shall have to find them.

Deleted the rest.

But, in the meantime, the MQTT is working nicely.

Me and my big mouth.....
All done. Deleted (from the list) the unwanted MQTT node and went around editing the ones with errors.

Set them to the right one.

DEPLOY.

MQTT down.

Well, it is disconnected and blinks connected every few seconds for a short while then reverts to disconnected.

Looking at the MQTT log, this is at the bottom:

1549508892: New client connected from 192.168.0.99 as mqtt_a36ec996.601568 (c1, k60).
1549508892: Socket error on client mqtt_a36ec996.601568, disconnecting.
1549508907: New connection from 192.168.0.99 on port 1883.
1549508907: New client connected from 192.168.0.99 as mqtt_a36ec996.601568 (c1, k60).
1549508907: Socket error on client mqtt_a36ec996.601568, disconnecting.
1549508922: New connection from 192.168.0.99 on port 1883.
1549508922: New client connected from 192.168.0.99 as mqtt_a36ec996.601568 (c1, k60).
1549508922: Socket error on client mqtt_a36ec996.601568, disconnecting.
1549508937: New connection from 192.168.0.99 on port 1883.

So "who" is this mqtt_a36ec996.601568?
A tab? A flow? A node?

I know it keeps changing every time I go through the process, but it must mean something.

I'll keep digging.

(And something else.....)
This machine's NR.
All MQTT nodes connected.

That is the mqtt config node in node-red.
It seems possible you have a network issue of some sort. On the failing machine do what I suggested before, in a terminal run
ping <ip address of host machine>
and leave it running until the connections start failing. Do you see anything interesting? Does it hang? Do the times get much longer?
<ip address of host machine> is the ip address of the machine running mosquitto, obviously.
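
If watching the ping output by eye is tedious, something like the following could scan it for you. This is only a rough sketch: it assumes the usual Linux ping reply format (`icmp_seq=... time=... ms`), and the function name `check_ping_output`, the 100 ms threshold and the sample lines are all made up for illustration.

```python
import re

# Matches the reply lines printed by the common Linux ping, for example:
# 64 bytes from 192.168.0.99: icmp_seq=1 ttl=64 time=0.412 ms
REPLY_RE = re.compile(r"icmp_seq=(\d+) .*time[=<]([\d.]+) ?ms")


def check_ping_output(lines, slow_ms=100.0):
    """Report slow replies and gaps in the sequence numbers.

    slow_ms is an arbitrary threshold chosen for this sketch.
    """
    replies = []
    for line in lines:
        match = REPLY_RE.search(line)
        if match:
            replies.append((int(match.group(1)), float(match.group(2))))

    slow = [(seq, ms) for seq, ms in replies if ms > slow_ms]
    seen = {seq for seq, _ in replies}
    # A missing sequence number means a reply never arrived (or arrived unparsed).
    missing = (
        [seq for seq in range(min(seen), max(seen) + 1) if seq not in seen]
        if seen
        else []
    )
    return {"replies": len(replies), "slow": slow, "missing": missing}


if __name__ == "__main__":
    # Invented sample output, not taken from a real machine.
    sample = [
        "64 bytes from 192.168.0.99: icmp_seq=1 ttl=64 time=0.412 ms",
        "64 bytes from 192.168.0.99: icmp_seq=2 ttl=64 time=0.398 ms",
        "64 bytes from 192.168.0.99: icmp_seq=4 ttl=64 time=250 ms",
    ]
    print(check_ping_output(sample))
```

Save the ping output to a file, feed its lines to the function, and look at whether `slow` or `missing` fill up around the time the MQTT nodes drop.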

I think there is some misunderstanding about the machines.

The mqtt config node. Ok, that is on timepi.
The failing machine is the one which is running the mqtt server/host.

Weirdly enough the machine (timepi) pings all the machines at regular intervals - like every 6 or 10 seconds.

The load on the machine is running at about 50-80% with peaks to 100% now and then when it is pinging.

What do you mean by the last line:

I've (in the meantime) gone through and edited all the MQTT nodes again and got rid of all the unused nodes and found a few still using the wrong name.

So, other than the time for the pings....... But they will/may vary depending on the machine's load (timepi) sending the pings.
And yes, it pings itself.

It is to determine if the machine is online or not.

I think we are all confused now.

  1. What is the ip address of the mqtt server machine
  2. What is the ip address of the other machine?
  3. Are you running node-red on both machines?

In answer to your questions:
1 - 192.168.0.99
2 - This machine (I'm confused with "other") is .146
3 - Yes.

And as anticipation:
This machine's MQTT nodes are ok.
Some of the nodes on .99 are ok. Some aren't.
Nodes on another RPI running NR are also connected. That IP address is .93

Can you clarify that part about:
<ip address of host machine> is the ip address of the machine running mosquitto, obviously.

I assumed that you were using mosquitto as your mqtt host. Is that not the case? But all along I have been confused about which machine is which...
I understand now some more of what you have been saying.
The errors you have been getting in the log are things like

1549508922: New client connected from 192.168.0.99 as mqtt_a36ec996.601568 (c1, k60).
1549508922: Socket error on client mqtt_a36ec996.601568, disconnecting.

That is saying that the mqtt config node a36ec996.601568 in 192.168.0.99 is connecting and then failing.
So that means you have to look at the mqtt config node(s) on the mqtt server machine.
So see if you have a similar situation there with repeated config nodes. I would expect the config node on the mqtt server machine to be configured to use localhost rather than a specific ip address. If you only have one then show us how it is connected. If you have more than one then delete the extra ones as you have on the other machine.
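
To make that reading of the log a bit more mechanical, here is a minimal sketch that groups the log lines by client ID and strips the `mqtt_` prefix to get the config node id. It assumes the log lines look exactly like the ones posted above; the function names are made up for illustration and are not part of mosquitto or node-red.

```python
import re
from collections import defaultdict
from datetime import datetime, timezone

# Patterns written against the log lines quoted in this thread.
CONNECT_RE = re.compile(r"^(\d+): New client connected from (\S+) as (\S+) \(")
ERROR_RE = re.compile(r"^(\d+): Socket error on client (\S+), disconnecting\.")


def summarise_log(lines):
    """Group connects and socket errors by MQTT client ID."""
    clients = defaultdict(lambda: {"ip": None, "connects": [], "socket_errors": 0})
    for line in lines:
        line = line.strip()
        match = CONNECT_RE.match(line)
        if match:
            timestamp, ip, client_id = match.groups()
            clients[client_id]["ip"] = ip
            clients[client_id]["connects"].append(int(timestamp))
            continue
        match = ERROR_RE.match(line)
        if match:
            clients[match.group(2)]["socket_errors"] += 1
    return dict(clients)


def config_node_id(client_id):
    """Return the part after 'mqtt_', or None if the prefix is absent."""
    prefix = "mqtt_"
    return client_id[len(prefix):] if client_id.startswith(prefix) else None


def to_utc(timestamp):
    """Convert the log's epoch seconds to a readable UTC time."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    sample = [
        "1549508892: New client connected from 192.168.0.99 as mqtt_a36ec996.601568 (c1, k60).",
        "1549508892: Socket error on client mqtt_a36ec996.601568, disconnecting.",
        "1549508907: New connection from 192.168.0.99 on port 1883.",
        "1549508907: New client connected from 192.168.0.99 as mqtt_a36ec996.601568 (c1, k60).",
    ]
    for client_id, info in summarise_log(sample).items():
        print(config_node_id(client_id), info["ip"], info["socket_errors"],
              [to_utc(t) for t in info["connects"]])
```

A client that shows as many socket errors as connects, from the broker's own ip address, points at a config node on the broker machine itself.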

Ok, in reply.

I have only one MQTT broker/host/machine.

The other day all/most of the MQTT nodes went disconnected and flashed connecting, then briefly connected before returning to disconnected.

That was on the same machine as the broker/host

I was looking to resolve this a36ec996.601568 on that machine but couldn't.

Meanwhile......

I sat down and tidied up all the unused nodes and rationalised all the MQTT nodes which I had mistakenly stuffed up earlier on.

Nick had told me to give it a name rather than the IP address.

So, after editing about 90 nodes, they are all now happier than before.

Alas there were still problems.

Some of the nodes were disconnected. Some connected. (On THAT machine.)
I looked at the MQTT nodes on THIS machine and they were all connected.
I also powered up another RPI. All its MQTT nodes were happy.

I sat down and disabled tabs trying to work out where the problem is.

This took hours.

I didn't really get anything helpful.

So that is where I am at.

Just now everyone is happy.

That isn't to say in 3 minutes they will not spit the dummy again and stop working.

Well, all the mqtt log errors that you have posted have been connection problems on 0.99.

Perhaps in future when you have problems like this it would be best only to use the ip addresses when posting here, so we will not be confused about the multiple machines.

Thinking about the mqtt errors in the log, you will get reconnection each time you deploy the config nodes of course. Perhaps that is what you have been seeing there.

However, you say everything is ok now, so let's hope it stays that way.

Me too.

I was only mentioning other machines because to me it is strange that:
On 0.99 (the broker/host) some/all the MQTT nodes are not working.

Yet, on another machine (say 0.146) the MQTT nodes are working, when it is using the same broker/host as 0.99.

Then!
When some of 0.99 nodes are connected and some aren't..... That really throws a spanner in the works.

The last time you reported you had some nodes connected and some not (a few weeks ago), our repeated advice was to get all of your mqtt nodes using the same mqtt-broker config node - that way they share the same connection. The fewer connections you have, the fewer things to go wrong.
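
With 90-odd nodes, checking this by hand is slow. As a rough sketch, you could export the flows as JSON and count how many `mqtt in` / `mqtt out` nodes point at each `mqtt-broker` config node. The function name `broker_usage` and the sample ids below are invented for illustration; the sketch relies only on the `type`, `id` and `broker` fields of the exported nodes.

```python
import json
from collections import Counter


def load_flows(path):
    """Load a node-red flow export (a JSON array of node objects)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def broker_usage(flows):
    """Count how many mqtt in/out nodes reference each mqtt-broker config node."""
    brokers = {node["id"]: node for node in flows if node.get("type") == "mqtt-broker"}
    usage = Counter(
        node["broker"]
        for node in flows
        if node.get("type") in ("mqtt in", "mqtt out") and node.get("broker")
    )
    return {
        "usage": dict(usage),
        # Config nodes that nothing uses: candidates for deletion.
        "unused": sorted(set(brokers) - set(usage)),
        # Nodes pointing at a config node that no longer exists.
        "missing": sorted(set(usage) - set(brokers)),
    }


if __name__ == "__main__":
    # Invented sample data, not a real export.
    sample = [
        {"id": "cfg1", "type": "mqtt-broker", "broker": "localhost", "port": "1883"},
        {"id": "cfg2", "type": "mqtt-broker", "broker": "192.168.0.99", "port": "1883"},
        {"id": "n1", "type": "mqtt in", "broker": "cfg1", "topic": "a"},
        {"id": "n2", "type": "mqtt out", "broker": "cfg1", "topic": "b"},
        {"id": "n3", "type": "mqtt in", "broker": "gone", "topic": "c"},
    ]
    print(broker_usage(sample))
```

Ideally `usage` ends up with a single entry and both `unused` and `missing` are empty, which means every node shares the one connection.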

Well, as of not long ago, they all are.

I have about 90+ MQTT nodes on that machine.

So, all good for now.