Modbus Serial - Whole bus disconnects when one device drops

I currently have 7 Modbus room thermostats, 1 relay pack and 1 electricity meter on the bus. Everything works well until one device on the bus loses power. Then each of the other devices goes into the "reconnect" state, and they cannot recover until that one device is powered on again.

I don't have a deep understanding of the bus, but I assume this is not how it should work. My intuition is that only the one node would go into the "reconnect" state and the others would stay functional.

Any ideas what could be wrong here?

Hello and welcome to the forum.

I don't see what your issue has to do with Node-Red per se, probably more something in the wiring or power distribution.

What exactly are you using to connect everything? What is the master (and is it the one losing power)? Perhaps you could supply more details and show your Node-Red specifics, like the flow etc.

However, and this is just an intuitive guess as I am still learning about all these protocols, based on the little clue in your title... serial... I suspect that you might be using RS-485?? and the communication passthrough may require each device to be functional... think old school Christmas lights wired in series.

Alternatively it may be that the non-powered device is loading down the serial bus and preventing the data transfer. If you disconnect that device do the rest recover?

Thank you for the answer!

I don't see what your issue has to do with Node-Red per se, probably more something in the wiring or power distribution.
I have double-checked the wiring and tested each device on the bus one by one; each works on its own.
What I meant by "until one device in the bus loses power" is that I am trying to build a system that stays functional even if one device in it breaks down or loses power. To simulate this, I cut the power to one device on purpose to see what happens.

What exactly are you using to connect everything... what is the master (and is it the one losing power?) and perhaps you could supply more details and show your Node-Red specifics, like flow etc.

I am using a pair cable and a Moxa USB-to-Modbus adapter. So I think the RPi, through that, is the master?

For troubleshooting I use this somewhat simplified flow. The "Modbus Read" node is reading the values from the relay:

However, and this is just an intuitive guess as I am still learning about all these protocols, based on the little clue in your title... serial ... I suspect that you might be using RS-485?? and the communication passthrough may require each device to be functional... think old school Christmas lights wired in series.

Yes, it is RS485. I was thinking that too, but somehow it doesn't make sense that the whole bus dies if one device on it dies. The devices are wired so that the connection to every other device stays intact even when one device is not operational, so there is no "loss" of connection in this situation.

If I physically disconnect the device, the rest don't recover. The only way is to disable or remove that node from my flow; that is when the bus comes back up. This is why I have been thinking that the Modbus node somehow cannot function unless every device is operational.

Do you mean usb to serial? Or is it a specialist modbus device?

Did you read the first sentence of the help text for the modbus read node?
" If you have more than 10 nodes on one communication configuration, use the Modbus-Flex-Getter or think about multiple connections to your Modbus device, please!"
You appear to have 43 nodes on it. Or am I misinterpreting something?

Ah, sorry. Yes, I mean a USB-to-serial converter. The model is UPort 1130, if I remember correctly.

I did some testing today and found that with the older package node-red-contrib-serial-modbus the bus works correctly: I could keep polling the other devices while one device on the bus was turned off. Sadly, that package is no longer maintained and doesn't have the features I need.

Inspired by this I deleted all node-red-contrib-modbus nodes from my flows and even deleted the whole package from the node-red palette, rebooted and then reinstalled the package again.

I made the following flow and tried again (these are the only modbus related nodes in my system):
image

These are two different devices on the bus and they work fine (I get messages at intervals of a few seconds) until I cut the power to one of them. That causes a lot of stutter for the other one, or a complete communication breakdown. It seems that when the node tries to query a "dead" device, the whole bus somehow gets broken, overloaded or heavily delayed.

Any ideas on what I should check?

Could you export that flow here please?

See this post for details of how to do that if you don't know - How to share code or flow json

Have you asked the developer on the GitHub site? If anyone knows, it would be the one(s) who made it.

I suspect I know what the problem is, but I need to see the flow.

How is the termination on your bus set up?

As you know, RS485 uses a daisy-chain topology, where a single pair of wires goes from each device to the next. Some devices have an IN and an OUT port; on others you just wire the cables between them.

The most important thing is that, to operate correctly, the bus must be terminated at both ends (and only at both ends; remove all intermediate termination). The termination is typically a 120 ohm resistor at each end.

The other thing is that every device on the bus must have a unique ID. Are you sure you do not have any duplicate IDs?

Craig

Here is the code I have been testing with now:

[{"id":"bdc7638d.9337f","type":"debug","z":"54bc74df.bb994c","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":1390,"y":400,"wires":[]},{"id":"87186745.9fbb98","type":"modbus-flex-getter","z":"54bc74df.bb994c","name":"","showStatusActivities":false,"showErrors":false,"logIOActivities":false,"server":"b500f1a.e0fdb1","useIOFile":false,"ioFile":"","useIOForPayload":false,"emptyMsgOnFail":false,"keepMsgProperties":false,"x":1210,"y":400,"wires":[["bdc7638d.9337f"],[]]},{"id":"ee6cad7b.ba46f","type":"function","z":"54bc74df.bb994c","name":"","func":"msg.payload = { value: msg.payload, 'fc': 3, 'unitid': 14, 'address': 0 , 'quantity': 7 }\nreturn msg","outputs":1,"noerr":0,"initialize":"","finalize":"","x":1000,"y":400,"wires":[["87186745.9fbb98"]]},{"id":"96f51396.8f0c8","type":"inject","z":"54bc74df.bb994c","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"3","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":850,"y":400,"wires":[["ee6cad7b.ba46f"]]},{"id":"6118c7c8.c8e818","type":"debug","z":"54bc74df.bb994c","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":1390,"y":300,"wires":[]},{"id":"200861a4.300a8e","type":"modbus-flex-getter","z":"54bc74df.bb994c","name":"","showStatusActivities":false,"showErrors":false,"logIOActivities":false,"server":"b500f1a.e0fdb1","useIOFile":false,"ioFile":"","useIOForPayload":false,"emptyMsgOnFail":false,"keepMsgProperties":false,"x":1210,"y":300,"wires":[["6118c7c8.c8e818"],[]]},{"id":"cf64791d.1a9b68","type":"function","z":"54bc74df.bb994c","name":"","func":"msg.payload = { value: msg.payload, 'fc': 3, 'unitid': 10, 'address': 0 , 'quantity': 4 }\nreturn msg","outputs":1,"noerr":0,"initialize":"","finalize":"","x":1020,"y":300,"wires":[["200861a4.300a8e"]]},{"id":"eded1805.150158","type":"inject","z":"54bc74df.bb994c","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":840,"y":300,"wires":[["cf64791d.1a9b68"]]},{"id":"b500f1a.e0fdb1","type":"modbus-client","z":"","name":"MOXA","clienttype":"simpleser","bufferCommands":true,"stateLogEnabled":false,"queueLogEnabled":false,"tcpHost":"127.0.0.1","tcpPort":"502","tcpType":"DEFAULT","serialPort":"/dev/ttyUSB0","serialType":"RTU-BUFFERD","serialBaudrate":"9600","serialDatabits":"8","serialStopbits":"1","serialParity":"none","serialConnectionDelay":"100","unit_id":"","commandDelay":"1","clientTimeout":"1000","reconnectOnTimeout":true,"reconnectTimeout":"2000","parallelUnitIdsAllowed":true}]

There is a certain amount of guesswork in the following, but I am sure it is close to describing what is happening.
The first thing to remember is that all of the comms is happening down one pair of wires and that only one device can be polled at a time. All the comms passes through the one modbus-client config node.
In normal operation one of the Flex Getter nodes receives a request and passes it on to the config node, which sends the request to the device and waits for a reply. When it gets the reply it passes it back to the Getter node, which sends it on to the rest of the flow. Whilst that is happening the other Getter node might receive a request. This will be queued till the first transaction is complete and then the new one will be handled, and so on.
However, if one of the devices is offline, then when the config node sends its request there is no reply. You have specified a 1 second timeout, so after 1 second the request times out. You have specified Reconnect on timeout, so the next thing that happens is that it attempts to reconnect. You have specified a 2 second timeout on this, so the node waits two seconds before giving up. Note that all of this has taken three seconds, during which no further comms could proceed, so the whole bus has been held up for three seconds. During this time further requests may well have been received, so the other device will probably then get a go, which should be successful. Then if the first one is polled again it will again hold the bus up for three seconds. The reason all the nodes show reconnecting is that the config node is reconnecting, so all nodes using that config node show the status.
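
To put rough numbers on this, here is a minimal sketch of the timing. It is not the package's actual code. It assumes the config node handles requests strictly one at a time and that a request to a dead device costs the timeout plus the reconnect wait, as guessed above. The function name simulateBus and the 30 ms reply time for a healthy device are made up for illustration.

```javascript
// Rough model of one serial Modbus connection: requests are handled one at a time.
// Assumption: a healthy device answers in replyMs; a dead one costs
// clientTimeout (+ reconnectTimeout if "Reconnect on timeout" is ticked).
function simulateBus(requests, opts) {
  let clock = 0; // ms since the first request was started
  const log = [];
  for (const unitId of requests) {
    const dead = opts.deadUnits.includes(unitId);
    const cost = dead
      ? opts.clientTimeout + (opts.reconnectOnTimeout ? opts.reconnectTimeout : 0)
      : opts.replyMs;
    clock += cost;
    log.push({ unitId, ok: !dead, finishedAt: clock });
  }
  return log;
}

// Unit 14 is powered off, unit 10 is fine. Timeouts as in the exported flow.
const original = simulateBus([14, 10, 14, 10], {
  deadUnits: [14], replyMs: 30,
  clientTimeout: 1000, reconnectOnTimeout: true, reconnectTimeout: 2000,
});
// For comparison: a 100 ms timeout and no reconnect on timeout.
const tuned = simulateBus([14, 10, 14, 10], {
  deadUnits: [14], replyMs: 30,
  clientTimeout: 100, reconnectOnTimeout: false, reconnectTimeout: 2000,
});

console.log(original.map((r) => r.finishedAt)); // [ 3000, 3030, 6030, 6060 ]
console.log(tuned.map((r) => r.finishedAt));    // [ 100, 130, 230, 260 ]
```

In this model every poll of the dead unit delays everything queued behind it by the full three seconds, which would match the stutter seen on the healthy device.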

I suggest that you look at how quickly your devices respond and you may well be able to reduce the timeouts to 100 or even 50 milliseconds, which should make the whole thing run much more smoothly. In fact I suspect you don't need the reconnect on timeout setting at all, so try de-selecting that. To further improve it you should add some logic that notices when a device is offline and slows the polling on that device down to a slow rate, maybe once a minute for example, so that it does not affect the other devices.
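
The "slow down polling of an offline device" logic could look something like the sketch below. It is plain JavaScript that could be adapted for a function node, not a tested Node-RED flow. The names createTracker, shouldPoll and record, the threshold of 3 failures, and the one-minute retry interval are all assumptions for illustration. How a failure is detected (for example a Catch node or a watchdog timer) is left open.

```javascript
// Hypothetical helper: decide whether a unit should be polled right now.
// A unit that failed FAIL_LIMIT times in a row is only retried once per SLOW_MS.
const FAIL_LIMIT = 3;      // assumed threshold
const SLOW_MS = 60 * 1000; // retry offline units once a minute

function createTracker() {
  const units = new Map(); // unitId -> { fails, lastTry }
  const get = (id) => {
    if (!units.has(id)) units.set(id, { fails: 0, lastTry: -Infinity });
    return units.get(id);
  };
  return {
    // Call before sending a request; returns true if the poll should go out.
    shouldPoll(id, now) {
      const u = get(id);
      const offline = u.fails >= FAIL_LIMIT;
      if (offline && now - u.lastTry < SLOW_MS) return false;
      u.lastTry = now;
      return true;
    },
    // Call with ok=true on a reply, ok=false on a timeout or error.
    record(id, ok) {
      const u = get(id);
      u.fails = ok ? 0 : u.fails + 1;
    },
  };
}

// Unit 14 fails three polls in a row, one second apart.
const tracker = createTracker();
for (let t = 0; t < 3000; t += 1000) {
  tracker.shouldPoll(14, t);
  tracker.record(14, false);
}
console.log(tracker.shouldPoll(14, 3000));  // false: unit 14 is treated as offline
console.log(tracker.shouldPoll(10, 3000));  // true: other units are unaffected
console.log(tracker.shouldPoll(14, 62000)); // true: one slow retry per minute
```

A single successful reply resets the failure count, so a device that comes back is polled at the normal rate again.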


Thanks for the help! This seemed to be my problem.

I changed the Timeout (ms) to 100 ms and unticked Reconnect on timeout. Now it seems to keep the bus up even though one device dies. And I think I now also understand what went wrong there.

I have one more question regarding this. What is the best/proper way to poll the devices? If I poll each of my 9 devices with a Flex Getter like in the picture (with a 1 second inject interval), the answers start to stutter and messages come in very irregularly. It works correctly, but it is not the behaviour I want. I would like an even flow of messages from the devices.

image

Why will they?

Your best bet is to wire in series.

Couldn't he use the

from @BartButenaers, to just handle round-robin type scheduling?

Craig

Well, he could, but wiring in series is somewhat simpler (imho).

I'm not sure; is it something to do with the messages being polled at the same time? I just found it very unbalanced if I use several inject nodes with the same interval (1 s). It seems that if I vary the interval a little between the nodes it gets smoother, but somehow I don't think that is the best idea.

This looks good. However, I started to wonder whether the msg goes through the Flex Getter if it doesn't get a response. I'm just worried that in the series setup one device will block the whole series.
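
A back-of-the-envelope check may help here. The sketch below estimates how long one fc 3 read occupies the wire at the serial settings in the exported flow (9600 baud, 8 data bits, no parity, 1 stop bit), and computes evenly spaced start offsets as a more deliberate alternative to slightly different intervals. It ignores the device's own turnaround time and the gaps between frames, which are not known from this thread. The function names wireTimeMs and staggerOffsetsMs are made up for illustration.

```javascript
// Wire time of one "read holding registers" (fc 3) transaction on Modbus RTU.
// Request frame: 8 bytes. Response frame: 5 + 2 * quantity bytes.
function wireTimeMs(quantity, baud, bitsPerChar) {
  const chars = 8 + (5 + 2 * quantity);
  return (chars * bitsPerChar * 1000) / baud;
}

// Evenly spaced start offsets for n pollers sharing one period.
function staggerOffsetsMs(n, periodMs) {
  return Array.from({ length: n }, (_, i) => Math.round((i * periodMs) / n));
}

// 9600 baud, 8N1 => 1 start + 8 data + 1 stop = 10 bits per character.
const perPoll = wireTimeMs(7, 9600, 10);
console.log(perPoll.toFixed(1)); // 28.1 (ms on the wire for a 7-register read)

console.log(staggerOffsetsMs(9, 1000));
// [ 0, 111, 222, 333, 444, 556, 667, 778, 889 ]
```

So nine such reads need roughly a quarter of a second of pure wire time per round. A 1 second cycle should be feasible if all devices answer promptly, but nine injects firing at the same instant will still queue up and come back in a burst. Spreading the start times about 111 ms apart (for instance with each inject node's initial delay, assuming the repeat timer starts after that delay) would give each poll its own slot.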
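
Whether the Flex Getter passes a message on when there is no response is not settled in this thread (the exported flow has emptyMsgOnFail set to false, which by its name may be relevant). One way to keep a series setup from stalling regardless is a watchdog: move on to the next device when a reply arrives or when a timeout expires, whichever comes first. Below is a minimal sketch of that idea in plain JavaScript, not a Node-RED flow. The names withTimeout, pollRound and fakeRead are hypothetical, and fakeRead only stands in for the real bus.

```javascript
// Reject if the given promise has not settled within timeoutMs.
function withTimeout(promise, timeoutMs) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('timeout')), timeoutMs);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Poll units one after another. A unit that does not answer in time is
// recorded as failed and skipped, so it cannot block the rest of the round.
// readUnit is a hypothetical function returning a Promise for one poll.
async function pollRound(unitIds, readUnit, timeoutMs) {
  const results = [];
  for (const id of unitIds) {
    try {
      const value = await withTimeout(readUnit(id), timeoutMs);
      results.push({ id, ok: true, value });
    } catch (err) {
      results.push({ id, ok: false, error: err.message });
    }
  }
  return results;
}

// Stand-in for the real bus: unit 14 never answers, others reply after 5 ms.
const fakeRead = (id) => new Promise((resolve) => {
  if (id !== 14) setTimeout(() => resolve([id, 0, 0]), 5);
});

const round = pollRound([10, 14, 11], fakeRead, 50);
round.then((results) => console.log(results.map((r) => `${r.id}:${r.ok}`)));
// [ '10:true', '14:false', '11:true' ]
```

The dead unit costs one timeout per round and nothing more, and the units after it in the series still get polled.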