A new chapter of weird MQTT messages

A long time back I was having all sorts of problems with MQTT.

Wrong machine messages being sent when shutting down, rogue messages turning devices on/off at weird times, and probably others.

A lot of them went away when I replaced the power supplies with bigger ones.
Others went away when I updated two machines from very small ones to RasPi 3B+.

Things were looking good.

But a gremlin has come back.

I have an arduino connected to my WiFi and it sends temperature readings back to the network and also displays the time now and then.

It has a Birth Certificate, and a Death Certificate... aka LWT.

Every now and then I receive a LWT from the board. Fair enough I guess.
But what is weird is that I never get a Birth Certificate shortly after.
Yet: The device is connected to the network.

It is there sending temperature readings and displaying the time every now and then - as it should.

So I am curious to any thoughts to what is causing these messages to be sent.

I'm not seeing a WAP failure, which I guess could/would cause the LWT to be sent.
So I'm guessing it isn't that.
(Also other WiFi devices aren't being dropped.)

Thanks in advance.

Andrew,

Can you access the raw MQTT log files so you can see the stream of messages rather than the NR interpretation.

Presumably you are running one of the packaged distros on this unit (what version ?)

The log will live in /var/log/mosquitto

I usually log into the System as user PI (no elevated privilieges) then perform a
sudo su
to have full access to all the log files

You should then be able to cat the file out and page through it to see if there is any more detail before and after the disconnect

cat mosquitto.log | more

Hi Craig.

Um, ok, to answer your question/s:
RasPi Buster.
NR 2.2.2
Mosquito... Stuck.
ver? version? Neither seem to work.

But skipping that for a second:
The broker is run on the RasPi (mentioned above)
but looking where you said, I see nothing.

Or at least: on mosquitto log.
Ok, sorry... in the mosquitto directory. Look below.

The version:

pi@TimePi:/var/log $ mosquitto -h
mosquitto version 1.5.7

mosquitto is an MQTT v3.1.1 broker.

Usage: mosquitto [-c config_file] [-d] [-h] [-p port]

 -c : specify the broker config file.
 -d : put the broker into the background after starting.
 -h : display this help.
 -p : start the broker listening on the specified port.
      Not recommended in conjunction with the -c option.
 -v : verbose mode - enable all logging types. This overrides
      any logging options given in the config file.

See http://mosquitto.org/ for more information.

pi@TimePi:/var/log $ 

I don't know if that is an old or new version, but it was all built about a month or two ago from a new build/flash/machine (everything)

Looking at the log:
(Great I can't parse the timestamps.)

Alas the log is now in the .gz format. I don't know how to get into it at this point in time.
I'll do some digging.

Back soon.

Ok, after digging I found this:

0.155 is the tail of the IP address.

cat mosquitto.log.2 | grep 0.155
1649615710: New connection from 192.168.0.155 on port 1883.
1649615710: New client connected from 192.168.0.155 as 192.168.0.99 (c1, k60).
1649621392: New connection from 192.168.0.155 on port 1883.
1649621392: New client connected from 192.168.0.155 as 192.168.0.99 (c1, k60).
1649634374: New connection from 192.168.0.155 on port 1883.
1649634374: New client connected from 192.168.0.155 as 192.168.0.99 (c1, k60).
grep: (standard input): binary file matches

But I don't know how to convert the timestamps to English.

Ok, progress.
Those are not in the correct time frame.

These may be.

cat mosquitto.log.1 | grep 0.155
1649697835: New connection from 192.168.0.155 on port 1883.
1649697835: New client connected from 192.168.0.155 as 192.168.0.99 (c1, k60).
1649729172: New connection from 192.168.0.155 on port 1883.
1649729172: New client connected from 192.168.0.155 as 192.168.0.99 (c1, k60).
1649738797: New connection from 192.168.0.155 on port 1883.
1649738797: New client connected from 192.168.0.155 as 192.168.0.99 (c1, k60).
1649766881: New connection from 192.168.0.155 on port 1883.
1649766881: New client connected from 192.168.0.155 as 192.168.0.99 (c1, k60).

Oh, before we go further, here is what I am seeing causing the concern:

Screenshot from 2022-04-13 19-21-40

I am not finding these messages in the MQTT log files.

These are the log files I have/see:

ls -l
total 1228
-rw------- 1 mosquitto mosquitto 680235 Apr 13 19:23 mosquitto.log
-rw------- 1 mosquitto mosquitto  79534 Apr 13 00:00 mosquitto.log.1.gz
-rw------- 1 mosquitto mosquitto  75295 Apr 12 00:00 mosquitto.log.2.gz
-rw------- 1 mosquitto mosquitto  78934 Apr 11 00:00 mosquitto.log.3.gz
-rw------- 1 mosquitto mosquitto  79332 Apr 10 00:00 mosquitto.log.4.gz
-rw------- 1 mosquitto mosquitto  79301 Apr  9 00:00 mosquitto.log.5.gz
-rw------- 1 mosquitto mosquitto  79498 Apr  8 00:00 mosquitto.log.6.gz
-rw------- 1 mosquitto mosquitto  79758 Apr  7 00:00 mosquitto.log.7.gz

I've looking in 1, 2 and 3.

(Back on the NUC - big machine)
This is what I am seeing:

me@me-desktop:~/temp$ ls -l mosquitto.log.?
-rw------- 1 me me 842953 Apr 13 19:18 mosquitto.log.1
-rw------- 1 me me 792957 Apr 13 19:13 mosquitto.log.2
-rw------- 1 me me 841932 Apr 13 19:13 mosquitto.log.3
me@me-desktop:~/temp$ cat mosquitto.log.? | grep 0.155
1649697835: New connection from 192.168.0.155 on port 1883.
1649697835: New client connected from 192.168.0.155 as 192.168.0.99 (c1, k60).
1649729172: New connection from 192.168.0.155 on port 1883.
1649729172: New client connected from 192.168.0.155 as 192.168.0.99 (c1, k60).
1649738797: New connection from 192.168.0.155 on port 1883.
1649738797: New client connected from 192.168.0.155 as 192.168.0.99 (c1, k60).
1649766881: New connection from 192.168.0.155 on port 1883.
1649766881: New client connected from 192.168.0.155 as 192.168.0.99 (c1, k60).
1649615710: New connection from 192.168.0.155 on port 1883.
1649615710: New client connected from 192.168.0.155 as 192.168.0.99 (c1, k60).
1649621392: New connection from 192.168.0.155 on port 1883.
1649621392: New client connected from 192.168.0.155 as 192.168.0.99 (c1, k60).
1649634374: New connection from 192.168.0.155 on port 1883.
1649634374: New client connected from 192.168.0.155 as 192.168.0.99 (c1, k60).
grep: (standard input): binary file matches
me@me-desktop:~/temp$ 

The dates of the files are today's as they were copied here today/now.
But I search for the IP address and get the lower part.

But those numbers are not of the date in question.
They are April 13.

From the RasPi directory these are the dates shown.

pi@TimePi:/var/log/mosquitto $ ls -l
total 1228
-rw------- 1 mosquitto mosquitto 680235 Apr 13 19:23 mosquitto.log
-rw------- 1 mosquitto mosquitto  79534 Apr 13 00:00 mosquitto.log.1.gz
-rw------- 1 mosquitto mosquitto  75295 Apr 12 00:00 mosquitto.log.2.gz
-rw------- 1 mosquitto mosquitto  78934 Apr 11 00:00 mosquitto.log.3.gz
-rw------- 1 mosquitto mosquitto  79332 Apr 10 00:00 mosquitto.log.4.gz
-rw------- 1 mosquitto mosquitto  79301 Apr  9 00:00 mosquitto.log.5.gz
-rw------- 1 mosquitto mosquitto  79498 Apr  8 00:00 mosquitto.log.6.gz
-rw------- 1 mosquitto mosquitto  79758 Apr  7 00:00 mosquitto.log.7.gz
pi@TimePi:/var/log/mosquitto $ 

So if I am looking for April 12 22:xx:xx time.... Which would you pick?

Yep the log files (usually) rollover at Midnight each night and are then zipped up to save space - it appears you are using the standard anacron stuff which does a weekly cycle.

So if you want to see the logs for a specific day (not the current day) then you look at the file whose date is +1 - so in your case mosquitto.log.1.gz

the Gz extension tells you this is a zip file to save space

so you could use

cat mosquitto.log.1.gz | gunzip | grep 155 (as an example)

Craig

Yeah, thanks.

I did try that - well nearly.
But I put a wild card in and so it looked at all the unzipped files.

Nothing came up, which is weird.

Small update:

I added a bit more stuff to the flow to capture those specific messages.

Got these 3 messages captured:

{"topic":"EOM","qos":0,"retain":false,"_msgid":"197602a9879c5432","type":"EOM","_event":"node:6cb87e96588abaed","origPayload":"GPS comms FAILURE","settings":{"input":"2022-04-12T02:06:12.585Z","input_format":"","input_tz":"Australia/Sydney","output_format":"YYYY-MM-DD HH:mm:ss","output_locale":"en_AU","output_tz":"Australia/Sydney"},"time":"2022-04-12 12:06:12","payload":"GPS comms FAILURE"}
{"topic":"EOM","qos":0,"retain":false,"_msgid":"b3f05b922a4ab30e","type":"EOM","_event":"node:6cb87e96588abaed","origPayload":"GPS comms FAILURE","settings":{"input":"2022-04-12T04:46:37.164Z","input_format":"","input_tz":"Australia/Sydney","output_format":"YYYY-MM-DD HH:mm:ss","output_locale":"en_AU","output_tz":"Australia/Sydney"},"time":"2022-04-12 14:46:37","payload":"GPS comms FAILURE"}
{"topic":"EOM","qos":0,"retain":false,"_msgid":"f9ca9ba43f8888a7","type":"EOM","_event":"node:6cb87e96588abaed","origPayload":"GPS comms FAILURE","settings":{"input":"2022-04-12T12:34:30.923Z","input_format":"","input_tz":"Australia/Sydney","output_format":"YYYY-MM-DD HH:mm:ss","output_locale":"en_AU","output_tz":"Australia/Sydney"},"time":"2022-04-12 22:34:30","payload":"GPS comms FAILURE"}

So the EOM message are being detected. (Comms failure) but I am not seeing when it is re-established.

:confused:

I'll power cycle the unit and see what is shown.

Ok, curiouser and curiouser. Unplugged power, waited a while, plugged back in.

ONLY got the comms failure message. No Comms up message.

Time to check the Arduino code.

Can you check the Broker with MQTT explorer to ensure that the LWT messages are set correctly ? (or what they are at the moment

Craig

Seems I goofed.

I didn't set a Birth Certificate in the arduino's code.
So it was sending a LWT (of reasons still unknown) but when it reconnected it wasn't sending a Birth certificate because I hadn't written that part of the code.

Now that part is working, I need to see if/when it drops out/off again and if I get the correct signal that it has come back to life.

(Sorry)

For the sake of not leaving this as open and I admit I didn't add the birth certificate in the arduino code, it is probably a mistake from something I did/not do.

Until I have a better understanding of what it is, I'd better close this and ask again when I know more about the problem.

Always remember that the device doesn't send a LWT when a disconnection occurs.

The broker notices that it hasn't received any messages (or an MQTT ping request) from the client within the keepalive time (usually 60 secs unless specified) and then it publishes the LWT on the topic specified.

I think you have not quite understood the messages.

I was seeing the LWT and was confused why I wasn't seeing the BS.

I forgot that I hadn't written the code in the Arduino to send it. So it couldn't be sent.

Now I have done that, I am seeing the BS.

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.