Grouping in Jsonata

Hi folks,

In my daily job I need to do some calculations on a json, so I thought that Jsonata might perhaps be the solution for my problem.

I have a (very long) json, but here is a small part of it:

{
  "indices" : {
    "cas-acc-2020.08.09" : {
      "uuid" : "sN1jvqrUTfmue5s-EXldoQ",
      "primaries" : {
        "store" : {
          "size_in_bytes" : 1584455
        }
      },
      "total" : {
        "store" : {
          "size_in_bytes" : 3097060
        }
      }
    },
    "mdm-acc-2020.03.19" : {
      "uuid" : "JUG3LKd9RAKSHpNq_46WTA",
      "primaries" : {
        "store" : {
          "size_in_bytes" : 539462
        }
      },
      "total" : {
        "store" : {
          "size_in_bytes" : 1134716
        }
      }
    },
    "cas-acc-2020.08.08" : {
      "uuid" : "ZQyH-_6HSSmZsfrZ81RZCw",
      "primaries" : {
        "store" : {
          "size_in_bytes" : 1250445
        }
      },
      "total" : {
        "store" : {
          "size_in_bytes" : 2319444
        }
      }
    },
    "mdm-acc-2020.03.17" : {
      "uuid" : "vC0WuIhXT9WKXLUfGPuSFg",
      "primaries" : {
        "store" : {
          "size_in_bytes" : 625743
        }
      },
      "total" : {
        "store" : {
          "size_in_bytes" : 1278747
        }
      }
    }

If I need to calculate the total size in bytes, I can do it like this:

$sum(indices.*.total.store.size_in_bytes)

However I would like to have the sum (of size in bytes) per group of items, based on wildcards:
For example I want to have the following two numbers in my output

  • total size of "cas-acc-*" = 5416504 (sum of 3097060 and 2319444)
  • total size of "mdm-acc-*" = 2413463 (sum of 1134716 and 1278747)

That way I can see the sum of total size of each group...

I only found this, but that is still far from what I want to achieve.

Does anybody know how to do this?

Thanks!
Bart

You could use the following expression (assuming it arrives as msg.payload)

{
"cas":$sum(payload.indices.$sift(function($v, $k) {$k ~> /^cas/}).*.total.store.size_in_bytes),
"mdm":$sum(payload.indices.$sift(function($v, $k) {$k ~> /^mdm/}).*.total.store.size_in_bytes)
}

Output:

{"payload":{"cas":5416504,"mdm":2413463}}
1 Like

Hey @bakman2,
Thanks. How on earth did you figure out those expressions...
I'm going to try these tomorrow morning at work.
You have become my personal lifesaver this week :wink:
Bart

jsonata is nuts yet extremely powerful :wink:

The documentation in general is bit rough, but sift is nicely described.

1 Like

I've seen plenty of nuts Jsonata - it does my head in. I only use it for straightforward transformations. I find anything fancy is way easier to understand later if it was written in a function node (maybe even with comments, god forbid).

1 Like

Yes to be honest I also use function nodes to do this kind of stuff (for my home automation), but now it is for my daily job and using jsonata there fitted more easily in the process flow ...

Bart, how long is long? I've found JSONata to be very very slow on large operations. Like 40s vs 1s in a function node on bigish data sets.

1 Like
  1. Comments like /* ... */ are also supported in jsonata.
    I think it often might help to add comments describing the output the jsonata query must generate.
  2. Even more interesting than comments is the Test tab in the change node which allows to specify an Example message input and check immediately the output. As the example message in the Test tab is not stored in the flow (and lost after a restart), I am often copying it as a comment in the jsonata expression:

6 Likes

Hey Steve, not long enough to become slow ... The query from @bakman2 is finished within one second and works fine to solve my problem at work.

1 Like

What I find with jsonata is there are good ways and bad ways for expressions, for simple things that are straight forward speed is most likely not an issue, when the dataset grows and the expression as well, it becomes more relevant to exactly know how jsonata performs its operation, which is a mystery to me, in those cases I seek help on the slack channel. I have seen really weird expressions, but applying an "incorrect" expression can add a lot of time.

I am sure there is a object/array limit where jsonata really starts to choke.

1 Like

Hey Bart,

Here are a couple more ways to use JSONata to get the output you need...

When I see a payload that is one big object with many objects stored under different property names (e.g. cas-acc-2020.08.09), I usually start with the $spread() operator, to split up the outer object into an array of objects. Then I know there is only 1 key per object, and can modify that to allow grouping (i.e. by prefix string cas-acc):

payload.indices.$spread() {
    $keys($).$replace(/-20.*/, ""): $sum($.*.total.store.size_in_bytes)
}

Another technique is to start with a list of all the keys, and look up each object starting at the top of the payload:

payload.indices.$keys() {
    $.$replace(/-20.*/, ""): $lookup($$.payload.indices, $).total.store.size_in_bytes ~> $sum()
}

Although this syntax is probably easier to "read", it will be much less efficient, due to all of the lookups.

The main advantage to this approach is that you do not need to enumerate all of the desired groups ahead of time -- in this case, the replace function will be run against any keys that are present in the data.

3 Likes

Hey Steve,
That is indeed an advantage, because I needed to do another query in advance to get all the available groups. Thanks!

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.