It seems that the tcp request node does not always throw on error. I don't know how that's possible, but if I use only a catch node to catch its errors, it mostly works, though from time to time it fails. The workaround I found was to add a status node to watch the tcp request node's status and manually throw an error from the subflow. This works reliably, but I think the catch node alone should be enough, although it isn't. Does anyone have a hint on the possible pitfalls of catching errors?
Under what failure mode does it not throw the error?
It's either common.status.disconnected or common.status.error. Since I set up the status node to watch these two (I don't know if there are any others) and throw manually in these cases, there's no problem.
The strange thing is that since I throw manually, the errors are also thrown by the tcp node itself. I remember turning off the status node exactly because of this, as I did not want to catch the same error twice, but then it stopped working. So I re-enabled the status node and now throw a different error manually, to be able to differentiate after the catch and handle each error only once.
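To illustrate, the core of my workaround is roughly this. The status texts are what the tcp node reports in my setup, and the error prefix is just my own convention, nothing built into Node-RED; in the real flow this logic sits in a function node fed by the status node, and the non-null case calls `node.error()`:

```javascript
// Sketch of the status-watching logic from my subflow. The prefix
// "dali-tcp-status:" is my own marker, so the catch node can tell my
// manually thrown error apart from one thrown by the tcp node itself,
// and each error gets handled exactly once.
const ERROR_STATUSES = ["common.status.disconnected", "common.status.error"];

// Decide whether a status text should be turned into a manually thrown error.
function classify(statusText) {
  return ERROR_STATUSES.includes(statusText)
    ? "dali-tcp-status: " + statusText
    : null; // any other status (e.g. connected) is ignored
}

console.log(classify("common.status.disconnected"));
console.log(classify("common.status.connected"));
```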
By the way, this is on 1.2.9.
Are you saying that an error is thrown if you are monitoring the status, but is not thrown if you are not monitoring? If so then that is a bug that should be reported.
Yepp, that's what I'm saying, but it's like this:
- if I'm monitoring the status, the errors are always thrown
- if I'm NOT monitoring the status, the errors are not always thrown (or just not caught by the catch node?)
I tried to log the behaviour, but as I can only log an error that's caught, that log definitely does not help. Is there anything I can provide when submitting an issue? Otherwise it is just an unreproducible, mysterious issue.
If you are not monitoring, how do you know the error is happening?
Can you share your flow.
It may be something silly.
Well, the show does not go on then. We are controlling fixtures (lamps) via a DALI network, to which we connect via tcp opened by a so-called dali server. In this case, we are reading the power consumption of the dali drivers and logging the values. If the error is not caught, the flow gets blocked and no more measurements are taken, which can be seen in the db log table: no new measurements arrive. The actual problem is that sometimes the dali server crashes, and we expected to catch that (in order to restart the dali server process), since the tcp node should throw an error. However, it only works reliably if the tcp node is monitored by a status node.
Here is a picture of the subflow monitoring the tcp nodes (I don't think sharing the code itself would help further):
Yepp, the picture of the flow is shared in my answer to Colin.
If you don't want further help then no, you don't need to share the flow code.
I don't believe there is a bug here. I suspect it is your arrangement of nodes, the fact they are in a subflow, and the fact your server "crashes" that are the issue. You should probably also handle the possibility of no reply.
The problem is certainly that the dali server crashes, but we want to be able to restart the process even when that happens. (We are also working on fixing the crash, but no one guarantees that we won't hit another bug later.) So when we can't connect to the server, the tcp node should throw an error, which can be caught, and after that the dali server process can be restarted. Actually, all of this works, but only with the help of the status node, as I mentioned earlier.
Is this "Dali server" something you/your team have created?
If so, then you might want to consider using a standard protocol like MQTT or create a HTTP API. Far easier to handle outages.
As for "the tcp should throw an error": that is not strictly true. Some things are not considered an error.
As I said, your flow should watch for a timeout - that is the true "catch all".
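In Node-RED the usual way to do this is a trigger node set to "send a message after a delay unless reset by a new message". A minimal sketch of the same watchdog idea in plain JavaScript (the timeout value and function names are just examples, not part of any Node-RED API):

```javascript
// Watchdog sketch: arm a timer when a request goes out; if no reply
// resets it before the timeout, treat the request as failed. This
// catches a crashed server even when no error is ever thrown.
function makeWatchdog(onTimeout, timeoutMs = 5000) {
  let timer = null;
  return {
    armed: () => timer !== null,
    start() {                         // call when the tcp request is sent
      clearTimeout(timer);
      timer = setTimeout(() => { timer = null; onTimeout(); }, timeoutMs);
    },
    gotReply() {                      // call when a reply message arrives
      clearTimeout(timer);
      timer = null;
    },
  };
}
```

The timeout handler is where you would raise the error (or restart the server process), independently of whatever the tcp node itself reports.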
Ok thanks! Then our assumption that the tcp node always throws an error was wrong. The dali server is an open source project (GitHub - onitake/daliserver: Command multiplexing server for the Tridonic DALI USB adapter) which we forked, but ours is currently still the same as the original. Once we manage to fix the bug that makes it crash, the fix will be upstreamed to the original as well.
This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.