Complexity of join, simplify flow?

This is a slightly different kind of topic from me. I have something that works, but it feels a bit complex, so I'm looking for possible simplifications. I've been experimenting with joins lately to combine multiple messages into a single object and send one HTTP request (instead of several small ones).


This example is quite typical for collecting data and making statistics. It largely consists of 3 parts, regardless of what the data is (protocol, source), what you want to do with it (processing), or where to send it (HTTP, file, DB):

  1. read data (I/O input)
  2. process data
  3. write data (I/O output)

First of all, when dealing with joins, I have to make sure not to lose any messages. Luckily I already have a robust custom subflow to handle modbus requests, making sure it always produces output (even though the modbus flex getter can otherwise swallow messages silently). Then, in case something goes wrong, I send a reset message to the join node before processing the next batch. Because the join must receive all messages, I can't discard failed messages mid-stream. That's what the two lines at the bottom are for (keeping failed messages, and the reset signal).
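Roughly, the guard between the modbus subflow and the join looks like this (a simplified sketch, not the real code; names like `msg.error` and `msg.abortBatch` are placeholders I'm using here):

```javascript
// Function node with two outputs, placed between the modbus subflow and the join.
// Output 1: every reading, good or failed, so the join always receives a full set.
// Output 2: reset wire into the join node.

if (msg.abortBatch) {
    // Something went badly wrong: tell the join node to discard any partly
    // assembled group before the next batch starts (a manual-mode join
    // clears its buffer when it receives a message with msg.reset set).
    return [null, { reset: true }];
}

if (msg.error) {
    // Keep the failed reading so the join still completes, but mark it
    // so it can be dropped from the HTTP payload later.
    msg.payload = { failed: true, error: String(msg.error) };
}

return [msg, null];
```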

Then there is a prepare join function node that puts everything I need from the individual messages into the payload. It can be as simple as cloning the entire message into its own payload. Lastly, there is the prepare write node, which cleans up the joined result and formats the message according to the output requirements.
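To give an idea of what those two nodes do (simplified sketches, not the actual code; key names are made up), prepare join boils down to something like:

```javascript
// "prepare join": keep only what the join needs; the join itself keys on msg.topic.
msg.payload = {
    value: msg.payload,
    failed: msg.payload && msg.payload.failed === true,
    ts: Date.now()
};
return msg;
```

and prepare write flattens the joined key/value object into one payload for the http request node:

```javascript
// "prepare write": turn the joined object (msg.topic -> reading) into a single
// telemetry payload. The { ts, values } shape is what I understand Thingsboard's
// device telemetry API expects, so treat it as an assumption.
const values = {};
for (const [topic, reading] of Object.entries(msg.payload)) {
    if (!reading.failed) {            // skip readings marked as failed earlier
        values[topic] = reading.value;
    }
}
msg.payload = { ts: Date.now(), values: values };
return msg;
```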

I know some may think parts/join is simple. I'm not sure I agree. By all means, it may be easy when you have all the data in a small example without I/O. The difference between join and no join in this example is two additional wires (failed messages and reset) going around the main portion of the flow, a join node (which is fine, no escaping that), a prepare join node, and some additional complexity in prepare read. Perhaps I'm overthinking it and this is as simple as it gets?

What are the blue nodes?

[Edit] Also, are you doing lots of individual modbus queries, one for each value, and what database are you using?


The blue ones are buffer parsers. Yes, lots of different modbus queries for different ranges of values. The database is in Thingsboard cloud; I think it's a mix of Cassandra & Postgres. But we are not affected by that in any way, as we simply use the HTTP REST API to send data.
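For completeness, the last step before the http request node is just setting up the request, roughly like this (a sketch only; the token handling and the /api/v1/&lt;token&gt;/telemetry path are how I understand Thingsboard's device HTTP API, so check their docs before copying):

```javascript
// Build the request for the http request node. TB_DEVICE_TOKEN is a
// placeholder environment variable, not something from the real flow.
const token = env.get("TB_DEVICE_TOKEN") || "YOUR_DEVICE_TOKEN";
msg.method = "POST";
msg.url = "https://thingsboard.cloud/api/v1/" + token + "/telemetry";
msg.headers = { "Content-Type": "application/json" };
// msg.payload is already the { ts, values } object built in prepare write.
return msg;
```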

It is generally best not to try to join data from multiple reads when saving to a database. Use different measurements (or tables, or keys, depending on the database) for each set of data fetched. In this case that means that each successful modbus read will result in a record in the database. That will dramatically simplify your flow, and you will not end up with a database with holes in it from missing data.


The databases are entirely transparent to us. What is done here is to send a single HTTP POST request to Thingsboard cloud. They have limits on API usage; theoretically it is 50 msg/s, and I don't think we ever came close to that, but I have gotten a 429 (Too Many Requests) on one occasion. Sending a single message also reduces network usage, which matters on a metered connection. Regardless, I don't see any difference in Thingsboard whether we send values individually or grouped together as is done here.

Last time I worked with statistics using mysql, we used a data-warehouse-like structure with large fact tables. Each individual datapoint was a column and each row had a shared timestamp. But it doesn't look like that structure is used in Thingsboard; I think each data point is handled individually even if sent together.
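To illustrate the difference (made-up keys, and only my understanding of how Thingsboard stores things):

```javascript
// Warehouse style: one wide fact-table row per timestamp, one column per datapoint.
const factRow = { ts: 1700000000000, power_l1: 230, power_l2: 231, power_l3: 229 };

// Thingsboard style (as I understand it): each key becomes its own time series
// entry, even when all keys arrive together in a single telemetry message.
const thingsboardEntries = [
    { key: "power_l1", ts: 1700000000000, value: 230 },
    { key: "power_l2", ts: 1700000000000, value: 231 },
    { key: "power_l3", ts: 1700000000000, value: 229 }
];
```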

Finally, while joining isn't so important here, sometimes I have to work on the output of data like this to do further calculations (totals across multiple sources). Then it is a lot easier to work with the data nicely grouped together in a single object than to tie all the pieces together afterwards.
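As a simple example of what I mean (a sketch with made-up key names): with everything already joined into one object, a total across sources is a single pass:

```javascript
// Function node after the join: sum one quantity across all sources.
const values = msg.payload.values || {};
let totalPower = 0;
for (const key of ["meter_1_power", "meter_2_power", "meter_3_power"]) {
    totalPower += Number(values[key]) || 0;
}
values.total_power = totalPower;
return msg;
```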