Node-RED stream processing with unordered messages

Hello Team,
I am evaluating Node-RED for the following scenario:

Scenario

  1. We receive two message types from sensors
  2. One is the scan of the box
  3. The other is the scan of the items to be put into the box
  4. The box is scanned first and then the items, but the messages may arrive in a different order because of network connectivity
  5. Node-RED (the tool under evaluation) should wait until the box message and all item messages have been received (maybe using a 15-minute time window) and the next box scan message has been received
  6. Then it should merge the box message and the item messages together and send the result to a downstream processing engine

Non-functional requirements

  1. It should support thousands of messages per second
  2. It should support high availability
  3. We are planning to use Node-RED on the server side

I am wondering whether Node-RED is a valid solution for this scenario. I'd appreciate any help.

Do you mean that item codes may be received before the related box? If so, how do you know which box the codes belong to?

How are the scan messages read?

As to the main question, whether Node-RED is a valid solution: certainly it is. There are many other valid solutions, of course.
You would need to make sure that the hardware it runs on is capable of handling the message rate. The bottleneck would likely be getting the data into the server, which depends on the answer to the question above.

We are planning to use a time window of 5 minutes.
Each message has an event time, so we need to collect the pallet and its related items into one message using the common stationId (the station where the scan was done) and the 5-minute window.
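A minimal sketch of event-time grouping, assuming fixed (tumbling) 5-minute windows aligned to the Unix epoch, keyed by stationId, and that scanedTimestamp is in Unix seconds as in the sample messages. `windowKey` and `groupByWindow` are hypothetical helper names in plain JavaScript that could be adapted into a function node:

```javascript
// Hypothetical sketch: bucket scan messages into fixed 5-minute event-time
// windows per station, using the scanner's scanedTimestamp (Unix seconds).
const WINDOW_SECONDS = 5 * 60; // assumed window length

// Returns a key like "123456:1590981600" (station id and window start).
function windowKey(msg) {
  const windowStart =
    Math.floor(msg.scanedTimestamp / WINDOW_SECONDS) * WINDOW_SECONDS;
  return `${msg.stationId}:${windowStart}`;
}

// Groups an array of scan messages by station and window.
function groupByWindow(messages) {
  const groups = new Map();
  for (const msg of messages) {
    const key = windowKey(msg);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(msg);
  }
  return groups;
}
```

One caveat of fixed windows: a pallet whose scans straddle a window boundary would be split across two groups, and messages arriving after a window has been emitted are not handled here.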

In point 4 you say the box is scanned first, but then that the order may be different... so which is it? How do you associate items with a box?
You say thousands of messages per second, and then also (in point 5) a window of 15 minutes. Does that mean there may be one box, then 15 x 60 x 1000 items before the next box? That would be 900,000 items in a box?

Edit: or 300,000 for a 5-minute window.

Ah right, I don't think you mentioned that there were multiple scanning stations, presumably all running in parallel. However, that only adds a little to the algorithm. So each message comes in with a station id and a timestamp.

How do you know when a new box has started, if the first message is not guaranteed to be the box itself? Is it purely time-based? If so, the requirement appears to be to collect all the messages from a station until the timeout expires and then pass on the aggregated data for processing. That is not particularly difficult.
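Assuming it is purely time-based, that could be sketched as below. This is plain JavaScript rather than Node-RED's own API: time is passed in explicitly so the logic can be tested, and `StationCollector`, `add` and `flushExpired` are hypothetical names. In a function node the state would need to live in node context (`context.get`/`context.set`), with `flushExpired` driven by something periodic such as a repeating Inject node.

```javascript
// Hypothetical sketch: collect every message from a station until a timeout
// (started by that station's first message) expires, then emit the batch.
const TIMEOUT_MS = 5 * 60 * 1000; // assumed timeout

class StationCollector {
  constructor(timeoutMs = TIMEOUT_MS) {
    this.timeoutMs = timeoutMs;
    this.open = new Map(); // stationId -> { deadline, messages }
  }

  // Adds a message at time `now` (ms); the first message starts the timer.
  add(msg, now) {
    let batch = this.open.get(msg.stationId);
    if (!batch) {
      batch = { deadline: now + this.timeoutMs, messages: [] };
      this.open.set(msg.stationId, batch);
    }
    batch.messages.push(msg);
  }

  // Removes and returns every batch whose timeout has expired by `now`.
  flushExpired(now) {
    const done = [];
    for (const [stationId, batch] of this.open) {
      if (now >= batch.deadline) {
        done.push({ stationId, messages: batch.messages });
        this.open.delete(stationId);
      }
    }
    return done;
  }
}
```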

You didn't tell us how the data from the sensors comes in, unless I missed it.

And regarding scalability: if there are multiple stations, could each of them process its own data first and then just send a complete report per box?

Sorry for missing the stationId info.

Pallet scan message

msg = {
    "stationId": "123456",
    "PalletId": "pallet1",
    "scanedTimestamp": 1590981718
}

Item scan message

msg = {
    "stationId": "123456",
    "ItemId": "item1",
    "scanedTimestamp": 1590981718
}

There may be multiple stations (in reality there are multiple product lines).

Since this data comes from the scanners and we don't have control over it, we need to assume the triggers will be:

  1. When the next pallet scan message is received, the previous pallet message and the item scan messages received since then are grouped under that previous pallet
  2. If the next pallet message is not received, the grouping has to be time-bound, say 5 minutes, after which we assume that all scanned items go onto that pallet
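The two triggers can be sketched per station as follows. This assumes the pallet message arrives before its items, that a pallet scan is recognised by its `PalletId` field, and a 5-minute timeout; `PalletAggregator`, `handle`, `tick` and the `emit` callback are hypothetical names, and time is passed in explicitly so the behaviour can be checked without real timers.

```javascript
// Hypothetical sketch: per-station pallet aggregation with two triggers.
// Any message without a PalletId is treated as an item scan.
const PALLET_TIMEOUT_MS = 5 * 60 * 1000; // assumed 5-minute timeout

class PalletAggregator {
  constructor(emit, timeoutMs = PALLET_TIMEOUT_MS) {
    this.emit = emit;        // callback that receives { pallet, items }
    this.timeoutMs = timeoutMs;
    this.open = new Map();   // stationId -> { pallet, items, deadline }
  }

  // Processes one scan message at time `now` (ms). Returns false for an
  // item that arrives while no pallet is open at its station.
  handle(msg, now) {
    const current = this.open.get(msg.stationId);
    if (msg.PalletId !== undefined) {
      // Trigger 1: the next pallet scan closes the previous pallet.
      if (current) this.close(msg.stationId);
      this.open.set(msg.stationId, {
        pallet: msg,
        items: [],
        deadline: now + this.timeoutMs,
      });
      return true;
    }
    if (!current) return false;
    current.items.push(msg);
    return true;
  }

  // Trigger 2: closes every pallet whose timeout has expired by `now`.
  tick(now) {
    for (const [stationId, entry] of this.open) {
      if (now >= entry.deadline) this.close(stationId);
    }
  }

  // Removes the open pallet for a station and emits it with its items.
  close(stationId) {
    const { pallet, items } = this.open.get(stationId);
    this.open.delete(stationId);
    this.emit({ pallet, items });
  }
}
```

What should happen to an item that arrives when no pallet is open is not specified in the thread; here `handle` simply returns false so the caller can decide.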

Is a pallet the same as a box?

Adding the criterion that the pallet message will always come first makes it a lot easier; it looks like a fairly easy task now.

If you have control of the messages, it would be a little simpler if the topic were used to identify the message type (item or pallet) rather than having to interpret the data, but that is a minor issue. It can easily be worked out, but having the logic that builds the message set it might be a bit more efficient.
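A minimal sketch of that check, assuming the scan fields sit in `msg.payload` (the usual Node-RED convention) and that, when the sender sets it, `msg.topic` is `"pallet"` or `"item"`. These topic values and the `messageType` name are assumptions:

```javascript
// Hypothetical sketch: classify a message as a pallet or item scan.
// Prefers msg.topic if the sender set it, otherwise inspects the payload.
function messageType(msg) {
  if (msg.topic === "pallet" || msg.topic === "item") return msg.topic;
  const p = msg.payload || {};
  if (p.PalletId !== undefined) return "pallet";
  if (p.ItemId !== undefined) return "item";
  return "unknown";
}
```

If `msg.topic` is always set, a Switch node routing on `msg.topic` could do the same without a function node.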

The aggregated message sent on could consist of the station id, the pallet id and timestamp, and an array of item ids and timestamps.
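One possible shape for that aggregated message, sketched with hypothetical output field names (`palletId`, `timestamp`, `items`) and assuming the original pallet and item messages have already been collected. Items are sorted by their own scan time because they may arrive out of order:

```javascript
// Hypothetical sketch: build the aggregated message from one pallet scan
// and the item scans collected for it. Output field names are illustrative.
function buildAggregate(pallet, items) {
  return {
    stationId: pallet.stationId,
    palletId: pallet.PalletId,
    timestamp: pallet.scanedTimestamp,
    items: items
      .map((i) => ({ itemId: i.ItemId, timestamp: i.scanedTimestamp }))
      .sort((a, b) => a.timestamp - b.timestamp), // restore scan order
  };
}
```

In a function node the result would typically be assigned to `msg.payload` before returning `msg`.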
