MQTT out node buffering

How many one-wire messages can an MQTT out node buffer if the network connection is down?
What happens if the buffer is full? It seems that not only Node-RED is affected but also the operating system of my Raspi. Only after a power-off and reboot does the mqtt node start working again once online.

Could you explain this a little more?

  • Are the one-wire devices directly attached to the Pi?
  • Where is the MQTT broker? On the same Pi? Another? On the internet?
  • How often are you taking a reading?
  • How many devices are on the one-wire bus?
  • What model Pi?
  • What version of NR and node.js? (You can find these in the NR startup log.)

I use a unipi 1.1 board connected directly to a Raspi 4 4GB. This Raspi sends data from 6 one-wire temp sensors via wifi to a dedicated MQTT broker (Raspi 4 2GB), which sends its data to a subscribed monitoring and management system. Everything runs within a local network. Messages per minute are between 10 and 20.
All run on the latest versions of Node-RED and Raspbian.
On the unipi 1.1 there are also 8 relays, which I use to switch floor heating valves via HTTP requests from the heating management system (dedicated Raspi 4 4GB). This works error-free.

The problem with my wifi connection is that from time to time the connection is lost for 3 to 8 min, but the one-wire messages are still sent to the mqtt out node.

So the 1-wire devices are connected to the unipi 1.1, which is connected to pi 'A' (rpi 4 4GB).
Pi 'B' (rpi 4 2GB) is a dedicated MQTT broker.

  • a 'monitoring and management system' subscribes to the topics being published by pi 'A'
  • a 'heating management system', pi 'C' (rpi 4 4GB), uses HTTP to talk to the unipi 1.1 (??)
    • is this done by talking to pi 'A' and having it talk to the unipi?

What QoS are you using with MQTT?
Why not put the MQTT broker on pi 'A'? It certainly has the horsepower to handle NR and MQTT at the same time.

Yes, you are right. This is my system.
pi 'C' also receives temperature data from pi 'A'.
Yes, I know that pi 'A' could handle the mqtt broker as well, but I want to have clearly dedicated systems.
I have another pi that handles only Z-Wave (e.g. thermostats) and another pi that handles only data from my remote solar panels (100 km away), also sending data to the MQTT broker.

This all works perfectly. Only with pi 'A' do I have problems, about every 2 weeks. I poll my router every minute and therefore know that there are sometimes connection problems. Short disconnections (3 min) are no problem. With disconnections longer than 6 min I get the MQTT problems.
I am now stopping publishing to MQTT when offline. Will see if this helps.

When you say the network goes down, do you mean your router goes off, or are you saying your wifi has problems?

If it is the WiFi, are there motors or microwaves that might go on when you have the problem?

I know in our house, when the furnace kicks on, or someone uses the microwave, our WiFi slows down dramatically from the RF noise.

I don't know the answer to this, as it depends on how much storage the mqtt out node allocates to QoS 1 or 2 messages and on the length of those messages.

I'd imagine that it would try to store as many messages as possible and eventually probably crash the system when it runs out, unless it's using some sort of database.
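
For a sense of scale, here is a rough estimate of the backlog a short outage would produce, using the figures given earlier in the thread (10 to 20 messages per minute, outages of 3 to 8 minutes). The function name `estimateBacklog` and the 200 bytes per message are assumptions for illustration, not measured values, and this says nothing about how the mqtt out node actually stores messages.

```javascript
// Rough backlog estimate for an outage. All sizes are assumptions.
function estimateBacklog(messagesPerMinute, outageMinutes, bytesPerMessage) {
    const messages = messagesPerMinute * outageMinutes;
    return { messages: messages, bytes: messages * bytesPerMessage };
}

// Worst case from this thread: 20 messages/min during an 8 minute outage.
// 200 bytes per message (topic + payload + overhead) is a guess.
const worstCase = estimateBacklog(20, 8, 200);
console.log(worstCase.messages); // 160
console.log(worstCase.bytes);    // 32000
```

Under these assumptions the backlog is a few tens of kilobytes, which is small compared with the memory of a Pi 4.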

Tried following the rabbit hole of code but eventually gave up.

I'll see if I can do a pragmatic test

That's exactly what I think. It stores messages until it crashes. Therefore I am now stopping publishing when there is no wifi connection to the router, and therefore none to the MQTT broker. Will see if this helps to avoid crashing.

If you are just sending temperature data, I can't imagine the MQTT messages are large. Put a debug node on the output of whatever feeds the mqtt out node and you can work out the size of each message.
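
A debug node shows the payload; if a byte count is wanted, something like the following could be used. It is only a sketch: `messageSize` is a made-up helper, the sample topic and value are invented, and it counts topic plus payload bytes only, ignoring MQTT protocol overhead.

```javascript
// Approximate size in bytes of what would be published for a Node-RED msg.
// Counts topic and payload only; MQTT header overhead is not included.
function messageSize(msg) {
    let payload;
    if (Buffer.isBuffer(msg.payload)) {
        payload = msg.payload;
    } else if (typeof msg.payload === 'string') {
        payload = Buffer.from(msg.payload);
    } else {
        // Numbers and objects are serialised as JSON text for this estimate.
        payload = Buffer.from(JSON.stringify(msg.payload));
    }
    return Buffer.byteLength(msg.topic || '') + payload.length;
}

// Hypothetical temperature reading; topic and value are made up.
const sample = { topic: 'home/temp/sensor1', payload: 21.5 };
console.log(messageSize(sample)); // 21 (17 bytes of topic + 4 bytes of payload)
```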

@juntiedt Johannes you also might want to check your mosquitto configuration for things like

max_queued_messages count

The maximum number of QoS 1 or 2 messages to hold in the queue (per client) above those messages that are currently in flight. Defaults to 100. Set to 0 for no maximum (not recommended). See also the queue_qos0_messages and max_queued_bytes options.

This option applies globally.

Reloaded on reload signal.

memory_limit limit

This option sets the maximum number of heap memory bytes that the broker will allocate, and hence sets a hard limit on memory use by the broker. Memory requests that exceed this value will be denied. The effect will vary depending on what has been denied. If an incoming message is being processed, then the message will be dropped and the publishing client will be disconnected. If an outgoing message is being sent, then the individual message will be dropped and the receiving client will be disconnected. Defaults to no limit.

This option is only available if memory tracking support is compiled in.

Reloaded on reload signal. Setting to a lower value and reloading will not result in memory being freed.


Many thanks for your input.
It's actually not a problem with the MQTT broker. It is a problem with the publisher, the mqtt out node on pi 'A'. This node seems to buffer messages when wifi is down until it crashes. Therefore I now stop publishing when the router cannot be reached. Will see what happens.

This is not a common problem. I have had the broker unavailable, sometimes for weeks, and have never seen it crash Node-RED.

Do you see anything unexpected in the node red log when you disconnect the broker? In fact a startup log might be interesting to see, from startup to disconnecting the broker.

Actually, I think Node-RED has not crashed. Just the communication with the MQTT broker is broken. A sudo reboot does not solve the problem; only a power-off and restart fixes it.
What is the underlying communication method of the mqtt node? WebSocket? Sending many messages to this interface without a connection to the receiver seems to cause the problem.
As I said, I have changed my flow. I stop publishing when the router cannot be reached. I will let it run and observe the system. As soon as I have new results I will let you know.

It sounds like the real problem is with your wifi network. In a normal situation, you should not lose the connection like that. I would look in that direction first

I did a little test

Set up a Pi to send a message every 10 secs and watched them arrive at the broker using MQTT Explorer

Turned off WiFi access for it (I've got a managed WiFi network)

Came back now and re-enabled WiFi access to my LAN. I picked up quite a few messages from earlier, then a big gap, then current ones being sent

So my little test showed that just being disconnected from WiFi for a few hours didn't cause a terminal crash

Also, it seemed that Node-RED possibly stopped trying to send messages (they were set to QoS 2) eventually, but restarted when the connection came back

Now, I wasn't trying to overload any buffers, so I'll change the routine to 10/sec overnight, disconnect, and then reconnect tomorrow morning

Bye for now :slight_smile:

If you are sure that is correct then there is a hardware problem. There must be something that is locking up that requires a power cycle to recover.

What is the symptom if you just reboot without power down?

If it happens again then look in /var/log/syslog and see what that says at the point it locked up. Post a chunk of the file here if necessary.

I did the same test and got the same result as you.
Now I am concentrating on the middleware of the unipi 1.1 (evok), which is connected via websocket to Node-RED, and on the one-wire sensors, or rather on the cabling. This can sometimes be tricky with one-wire.

Hi Colin,
I agree with you. See my last post.
I will keep you updated.

For now, to all of you a big thank you for your support

Update:
I have updated to the newest Pi OS and evok middleware as well as Node-RED 1.1. Unfortunately no change for the better. Then I removed 4 of my 6 1-Wire DS18B20 temp sensors and the problem disappeared. No wifi problems, no 1-Wire bus stops. Now I am adding the 4 sensors back one by one in order to find the broken sensor or cable problem.
Colin was right!

Update:
I found one one-wire sensor which was broken. After replacing the sensor I started again and still had the one-wire bus problem.
The evok middleware delivers sensor data event-driven via websocket. It was possible to exclude the one-wire sensors from the websocket. Now I am polling the one-wire sensors every minute. The system ran stably until the wifi connection dropped for a few minutes. Again the one-wire sensors had no connection, and I got the same data over and over because the values were read from the 1-Wire file system.
The next step was to stop publishing to MQTT when polling of the router failed. As soon as the router was back online I started publishing to MQTT again.
Since then, the system has run without a problem!
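
The gating described above could look roughly like this. It is a sketch, not the actual flow: the context key `router_online` and the helper names are hypothetical, and the small `flow` object only stands in for Node-RED's flow context so the snippet runs on its own.

```javascript
// Stand-in for Node-RED's flow context so this runs outside Node-RED.
// Inside a Function node, `flow` is provided by the runtime instead.
const store = new Map();
const flow = {
    get: (key) => store.get(key),
    set: (key, value) => store.set(key, value)
};

// Called with the result of each router poll (true = router answered).
function recordRouterPoll(reachable) {
    flow.set('router_online', reachable);
}

// Gate placed in front of the mqtt out node: pass the message on while
// the router is reachable, otherwise return null to drop it.
function gate(msg) {
    return flow.get('router_online') === true ? msg : null;
}

const reading = { topic: 'home/temp/sensor1', payload: 21.5 }; // made-up sample
recordRouterPoll(false);
console.log(gate(reading)); // null
recordRouterPoll(true);
console.log(gate(reading) === reading); // true
```

In a Function node only the body of `gate` would be needed, since returning `null` from a Function node stops the message there.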

As a side project I made a simple mqtt monitor:

[{"id":"8229b935.7206f8","type":"tab","label":"MQTT Monitor","disabled":false,"info":""},{"id":"7beb40c6.3de98","type":"mqtt in","z":"8229b935.7206f8","name":"","topic":"#","qos":"0","datatype":"auto","broker":"f5f6599b.7711a8","x":110,"y":260,"wires":[["444fa39c.90e15c","6f87fa1c.25df54","d17cc7f5.979a48","4cc7da4d.753b14","2602db1f.47e4d4"]]},{"id":"ff1d0abe.727f98","type":"ui_template","z":"8229b935.7206f8","group":"b3c4dc44.72872","name":"","order":0,"width":"20","height":"15","format":"<div style=\"margin: 0 0 0 -40px;\">\n  <ul>\n    <li ng-repeat=\"name in msg.payload track by $index\"\n        style=\"list-style-type: none;\"\n        >\n        {{name}}\n    </li>\n  </ul>\n</div>\n","storeOutMessages":true,"fwdInMessages":false,"templateScope":"local","x":1160,"y":260,"wires":[[]]},{"id":"c9b7429a.b8de","type":"function","z":"8229b935.7206f8","name":"","func":"msg.topic = flow.get('time') + \" --> \" + msg.topic;\nvar msg_arr = flow.get('msgarr');\nvar new_msg = \"\";\nif (msg_arr.length < 25){\n    msg.payload = msg.topic + '\\xa0\\xa0\\xa0' + \":\" + '\\xa0\\xa0\\xa0' + msg.payload;\n    msg_arr.push(msg.payload);\n    msg.payload = msg_arr;\n    flow.set('msgarr', msg_arr);\n}\nelse{\n    msg_arr.shift();\n    msg.payload = msg.topic + '\\xa0\\xa0\\xa0' + \":\" + '\\xa0\\xa0\\xa0' + msg.payload;    \n    msg_arr.push(msg.payload);\n    msg.payload = msg_arr;\n    flow.set('msgarr', msg_arr);\n}\nreturn msg;","outputs":1,"noerr":0,"initialize":"// Code added here will be run once\n// whenever the node is deployed.\nvar a = [];\nflow.set('msgarr',a);","finalize":"","x":1020,"y":260,"wires":[["ff1d0abe.727f98"]]},{"id":"c160ee74.cefc4","type":"moment","z":"8229b935.7206f8","name":"MM:DD:YYYY HH:mm:ss:SSS","topic":"","input":"xyz","inputType":"flow","inTz":"Europe/Berlin","adjAmount":0,"adjType":"days","adjDir":"add","format":"DD:MM:YYYY HH:mm:ss:SSS","locale":"de_DE","output":"time","outputType":"flow","outTz":"Europe/Berlin","x":810,"y":260,"wires":[["c9b7429a.b8de"]]},{"id":"94129b49.74f688","type":"switch","z":"8229b935.7206f8","name":"home","property":"topic","propertyType":"msg","rules":[{"t":"cont","v":"home","vt":"str"}],"checkall":"true","repair":false,"outputs":1,"x":510,"y":220,"wires":[["c160ee74.cefc4"]]},{"id":"c6976ac4.389878","type":"switch","z":"8229b935.7206f8","name":"shellies","property":"topic","propertyType":"msg","rules":[{"t":"cont","v":"shellies","vt":"str"}],"checkall":"true","repair":false,"outputs":1,"x":500,"y":260,"wires":[["c160ee74.cefc4"]]},{"id":"693d8a9.256cc74","type":"switch","z":"8229b935.7206f8","name":"Raspi","property":"topic","propertyType":"msg","rules":[{"t":"cont","v":"Raspi","vt":"str"}],"checkall":"true","repair":false,"outputs":1,"x":510,"y":340,"wires":[["c160ee74.cefc4"]]},{"id":"afa7a4df.9745b8","type":"switch","z":"8229b935.7206f8","name":"shellies/announce","property":"topic","propertyType":"msg","rules":[{"t":"cont","v":"shellies/announce","vt":"str"}],"checkall":"true","repair":false,"outputs":1,"x":470,"y":300,"wires":[["c160ee74.cefc4"]]},{"id":"2602db1f.47e4d4","type":"function","z":"8229b935.7206f8","name":"","func":"if (flow.get('mqtt_type') === 0) {\n return msg;\n}","outputs":1,"noerr":0,"x":290,"y":180,"wires":[["b3136bc4.dd7a48"]]},{"id":"444fa39c.90e15c","type":"function","z":"8229b935.7206f8","name":"","func":"if (flow.get('mqtt_type') === 1) {\n return msg;\n}","outputs":1,"noerr":0,"x":290,"y":220,"wires":[["94129b49.74f688"]]},{"id":"6f87fa1c.25df54","type":"function","z":"8229b935.7206f8","name":"","func":"if (flow.get('mqtt_type') === 2) {\n return msg;\n}","outputs":1,"noerr":0,"x":290,"y":260,"wires":[["c6976ac4.389878"]]},{"id":"d17cc7f5.979a48","type":"function","z":"8229b935.7206f8","name":"","func":"if (flow.get('mqtt_type') === 3) {\n return msg;\n}","outputs":1,"noerr":0,"x":290,"y":300,"wires":[["afa7a4df.9745b8"]]},{"id":"4cc7da4d.753b14","type":"function","z":"8229b935.7206f8","name":"","func":"if (flow.get('mqtt_type') === 4) {\n return msg;\n}","outputs":1,"noerr":0,"x":290,"y":340,"wires":[["693d8a9.256cc74"]]},{"id":"b3136bc4.dd7a48","type":"function","z":"8229b935.7206f8","name":"NoOp","func":"return msg;","outputs":1,"noerr":0,"x":510,"y":180,"wires":[["c160ee74.cefc4"]]},{"id":"69f1b72c.e10108","type":"inject","z":"8229b935.7206f8","name":"alle","repeat":"","crontab":"","once":true,"onceDelay":"1","topic":"","payload":"0","payloadType":"num","x":190,"y":100,"wires":[["ca395821.ed99e8"]]},{"id":"ca395821.ed99e8","type":"change","z":"8229b935.7206f8","name":"","rules":[{"t":"set","p":"mqtt_type","pt":"flow","to":"payload","tot":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":400,"y":100,"wires":[[]]},{"id":"49bff3d5.a4da1c","type":"ui_button","z":"8229b935.7206f8","name":"","group":"3d5f625.a1c4b9e","order":19,"width":"4","height":"1","passthru":false,"label":"Alle","tooltip":"","color":"","bgcolor":"","icon":"","payload":"0","payloadType":"num","topic":"","x":110,"y":440,"wires":[["65ccf042.91427"]]},{"id":"7258f9e4.49ce88","type":"ui_button","z":"8229b935.7206f8","name":"","group":"3d5f625.a1c4b9e","order":20,"width":"4","height":"1","passthru":false,"label":"home","tooltip":"","color":"","bgcolor":"","icon":"","payload":"1","payloadType":"num","topic":"","x":110,"y":480,"wires":[["65ccf042.91427"]]},{"id":"24082040.ba67a","type":"ui_button","z":"8229b935.7206f8","name":"","group":"3d5f625.a1c4b9e","order":21,"width":"4","height":"1","passthru":false,"label":"shellies","tooltip":"","color":"","bgcolor":"","icon":"","payload":"2","payloadType":"num","topic":"","x":120,"y":520,"wires":[["65ccf042.91427"]]},{"id":"98312d9d.2b8d9","type":"ui_button","z":"8229b935.7206f8","name":"","group":"3d5f625.a1c4b9e","order":22,"width":"4","height":"1","passthru":false,"label":"announce","tooltip":"","color":"","bgcolor":"","icon":"","payload":"3","payloadType":"num","topic":"","x":120,"y":560,"wires":[["65ccf042.91427"]]},{"id":"dd0678c2.a15038","type":"ui_button","z":"8229b935.7206f8","name":"","group":"3d5f625.a1c4b9e","order":23,"width":"4","height":"1","passthru":false,"label":"Raspi","tooltip":"","color":"","bgcolor":"","icon":"","payload":"4","payloadType":"num","topic":"","x":110,"y":600,"wires":[["65ccf042.91427"]]},{"id":"65ccf042.91427","type":"change","z":"8229b935.7206f8","name":"","rules":[{"t":"set","p":"mqtt_type","pt":"flow","to":"payload","tot":"msg"}],"action":"","property":"","from":"","to":"","reg":false,"x":380,"y":520,"wires":[[]]},{"id":"f5f6599b.7711a8","type":"mqtt-broker","z":"","broker":"10.0.0.43","port":"1883","clientid":"","usetls":false,"compatmode":true,"keepalive":"60","cleansession":true,"birthTopic":"","birthQos":"0","birthPayload":"","willTopic":"","willQos":"0","willPayload":""},{"id":"b3c4dc44.72872","type":"ui_group","z":"","name":"Group 1","tab":"800f5253.4ba94","order":2,"disp":false,"width":"20","collapse":false},{"id":"3d5f625.a1c4b9e","type":"ui_group","z":"","name":"Group 2","tab":"800f5253.4ba94","order":1,"disp":false,"width":"20","collapse":false},{"id":"800f5253.4ba94","type":"ui_tab","z":"","name":"MQTT","icon":"dashboard","order":12,"disabled":false,"hidden":false}]
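
The function node in this flow repeats the same formatting in both branches of the `if`. A more compact variant, offered only as a sketch with the same behaviour, pushes first and then trims. `appendLine` and its parameters are hypothetical names; in the flow the array would still be kept in flow context under `msgarr`.

```javascript
// Append one formatted line and keep only the most recent `limit` lines.
function appendLine(history, time, topic, payload, limit = 25) {
    const gap = '\xa0\xa0\xa0'; // non-breaking spaces, as in the flow above
    history.push(time + ' --> ' + topic + gap + ':' + gap + payload);
    while (history.length > limit) {
        history.shift(); // discard the oldest entry
    }
    return history;
}

// Feed in 30 made-up messages; only the last 25 should remain.
const lines = [];
for (let i = 0; i < 30; i++) {
    appendLine(lines, '12:00:00', 'home/test', i);
}
console.log(lines.length);           // 25
console.log(lines[0].endsWith('5')); // true, messages 0 to 4 were dropped
```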