Weird behaviour when processing an array of arrays having null values

I am using node-red-contrib-spreadsheet-in to read an Excel file into Node-RED.
This produces a JSON array of arrays, with null values for the cells that are not populated.

E.g. below is the debug output of the sheet-to-JSON node for a simple Excel sheet.
You can see that array[1][1] = null.

[screenshot: debug output of the sheet-to-JSON node]

FYI, the simple Excel sheet used as input:
[screenshot: input Excel sheet]

So far so good, but when I pass this through a Change node with the following JSONata expression:

payload.{
   "row" : $
}

then the "null" cell gets removed as you can see in below output:

[screenshot: Change node output with the null cell missing]

... but when I test the same payload (copy-pasted from the debug window) in the JSONata expression editor, I get the expected result with the null value:

As a workaround, I have put two JSON nodes in between (to convert the payload to a string and back).

[screenshot: flow with two JSON nodes inserted before the Change node]

With this workaround I get the expected output:

I believe JavaScript supports sparse arrays, where an element can be completely missing. I wonder whether your input array has the "null" element completely missing, and after converting to JSON and back the element is present with a null value. Or vice versa.
I believe you can test that using for...in to iterate over the array. If the element is completely missing then it will not be visited by the loop; if it is present then it will.
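A quick sketch of that test in a function node, with made-up data, would be something like:

// a hole and an explicit null look the same in the debug sidebar,
// but for...in only visits indexes that actually exist
const hole = ["a"];
hole[2] = "c";                      // index 1 is completely missing
const withNull = ["a", null, "c"];  // index 1 is present, with value null

for (const i in hole) { node.warn(`hole: index ${i}`); }          // logs 0 and 2
for (const i in withNull) { node.warn(`withNull: index ${i}`); }  // logs 0, 1 and 2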
If my guess is correct, then whether JSONata should give different answers I don't know. Probably not. Are you using the latest Node-RED? I believe there was a JSONata upgrade a while ago that fixed a number of bugs.


Thanks for the response.

I am using Node-RED version 1.0.4.

Based on your input, I have tested it with the simple JavaScript program below:

// count how many elements each loop style visits per row
let row0_for_of_count = 0;
let row0_for_in_count = 0;
let row1_for_of_count = 0;
let row1_for_in_count = 0;

// for...of iterates every index up to length, including holes
for (const v of msg.payload[0]) {
  row0_for_of_count++;
}

for (const v of msg.payload[1]) {
  row1_for_of_count++;
}

// for...in only visits indexes that actually exist, so holes are skipped
for (const i in msg.payload[0]) {
  row0_for_in_count++;
}

for (const i in msg.payload[1]) {
  row1_for_in_count++;
}

msg.row0_for_of_count = row0_for_of_count;
msg.row0_for_in_count = row0_for_in_count;
msg.row1_for_of_count = row1_for_of_count;
msg.row1_for_in_count = row1_for_in_count;

return msg;

... and indeed the for...in loop for the 2nd row only sees 2 properties.

... I have also applied the same script after passing the payload through two JSON nodes (to serialize it to a string and back), and now the for...in loop for the 2nd row sees 3 properties:

[screenshot: loop counts before and after the JSON round trip]

So that most likely explains the root cause of the issue.

I am feeling rather pleased with myself now for that inspired guesswork (the weather is miserable here today so thanks for cheering me up :slight_smile: )
I don't know whether that is a bug in the jsonata code or not. It is certainly confusing. Are you using a recent version of node-red?


Node-RED version 1.0.4:

I think this needs a jsonata expert.

I thought that person was you Colin :wink:


Absolutely not. Once it gets above payload=0 ? true : false then I am lost.


Not sure if it is a JSONata issue.

It seems that the debug tab displays "sparse" arrays in exactly the same way as arrays with null values.
I would also expect serialization and deserialization with the JSON node to return exactly the same payload, but this is not the case when the array is a "sparse" array.
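Outside of Node-RED you can see the same thing by round-tripping a sparse array through JSON.stringify/JSON.parse, which is essentially what the two JSON nodes do (a quick sketch with made-up data):

const sparse = ["a"];
sparse[2] = "c";                                   // index 1 is a hole

console.log(Object.keys(sparse));                  // [ '0', '2' ]
const roundTripped = JSON.parse(JSON.stringify(sparse));
console.log(Object.keys(roundTripped));            // [ '0', '1', '2' ]
console.log(roundTripped[1]);                      // null - the hole became a real element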

There are TWO questions then:

  1. Should JSONata behave differently for sparse and full arrays?
  2. Is the JSON node converting a sparse array to JSON correctly? And, as a corollary, if it is converted correctly, is it converting it back correctly?

I have done some experiments with interesting results. The function below creates a sparse array with element index 1 set to null and index 3 set to 7. Using forEach() with node.warn finds only those two elements and displays the values null and 7, as expected. Looping through indexes 0 to length-1, node.warn shows the empty elements as undefined and the null element as null. However, returning the array in the payload and displaying it via a debug node shows the undefined ones as null, which is arguably a bug in the debug display. Perhaps @dceejay would like to comment on this.
The function node contains

const array = []
array[1] = null   // explicit null element
array[3] = 7      // indexes 0 and 2 are holes
// forEach() only visits elements that exist, so the holes are skipped
array.forEach( function(val, i) {node.warn(`each index: ${i}  value: ${val}`)})
// indexing from 0 to length-1 reads the holes back as undefined
for (let i = 0; i < array.length; i++) {
    node.warn(`for length index: ${i}  value: ${array[i]}`)
}
msg.payload = array
return msg;
[{"id":"ad3cb23a.59648","type":"inject","z":"bdd7be38.d3b55","name":"","props":[{"p":"payload"},{"p":"topic","vt":"str"}],"repeat":"","crontab":"","once":false,"onceDelay":0.1,"topic":"","payload":"","payloadType":"date","x":140,"y":480,"wires":[["168ec116.5dd1cf"]]},{"id":"168ec116.5dd1cf","type":"function","z":"bdd7be38.d3b55","name":"sparse array tests","func":"array = []\narray[1] = null\narray[3] = 7\narray.forEach( function(val, i) {node.warn(`each index: ${i}  value: ${val}`)})\nfor (i=0; i<array.length; i++) {\n    node.warn(`for length index: ${i}  value: ${array[i]}`)\n}\nmsg.payload = array\nreturn msg;","outputs":1,"noerr":0,"initialize":"","finalize":"","x":330,"y":480,"wires":[["261e7cc8.e77e8c"]]},{"id":"261e7cc8.e77e8c","type":"debug","z":"bdd7be38.d3b55","name":"","active":true,"tosidebar":true,"console":false,"tostatus":false,"complete":"false","statusVal":"","statusType":"auto","x":530,"y":480,"wires":[]}]

At first glance, yes, it looks like the debug display is not doing the right thing... we do handle a simple msg.payload = undefined, so it should be possible to fix - I guess somewhere in the encodeObject function in util.js.

However, more generally, JSON does not have a representation for undefined, so JSON.stringify replaces undefined with null.
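For example, in plain Node.js:

console.log(JSON.stringify(["a", undefined, "c"]));   // ["a",null,"c"] - undefined becomes null in arrays
console.log(JSON.stringify({ a: 1, b: undefined }));  // {"a":1} - undefined-valued properties are dropped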


I think the underlying issue is that a JSON array requires all elements to be specified (at least as null), while JavaScript supports sparse arrays where not all elements are specified.

So not every JavaScript array can be unambiguously mapped to a JSON array.


Agreed. As @dceejay says, looking at the JSON spec it seems that a sparse array cannot be represented exactly in JSON. Null is allowed as the value of an array element, but undefined or missing elements are not. That means that converting a sparse array to JSON and back must fill in the missing elements with nulls.

That still leaves the question of whether the effect that @janvda is seeing when he transforms the sparse array with JSONata is expected behaviour or not. Googling for "jsonata sparse array" seems to yield nothing obviously relevant, which is rather surprising to me.


I think JSONata is designed to work only with JSON objects.

As I am using a sparse array (which is actually not a proper JSON object) you may say that this is not valid input for a JSONata expression.

... nevertheless, the JSONata implementation uses the JavaScript map and filter functions under the hood, which in JavaScript also skip the unspecified elements of a sparse array. So the JSONata implementation appears to be consistent with that.
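A quick check in plain Node.js (this is just the underlying array behaviour, not the actual JSONata internals):

// map() never invokes its callback for the holes of a sparse array
const sparse = [];
sparse[1] = null;
sparse[3] = 7;

let calls = 0;
const mapped = sparse.map(v => { calls++; return { row: v }; });
console.log(calls);           // 2 - only the null and the 7 were visited
console.log(mapped.length);   // 4 - but indexes 0 and 2 are still holes
console.log(0 in mapped);     // false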

The other open question from your OP is whether the spreadsheet node should represent blank cells as undefined or null, or should use an empty string as it would if the input were CSV.


True, but that does not mean that the implementation is performing according to the spec. An alternative implementation might convert to valid JSON first, so the result would be as you originally expected. Presumably there is a JSONata spec, but again it would need someone who knows JSONata well to understand it and see whether the implementation here follows the spec.

I have raised the following issue for that:

I guess that the node (GitHub - sakai-to/node-red-contrib-spreadsheet-in: Node-RED nodes to read data from spreadsheet (Excel, ODS, etc.) file) is using a node.js module that does the conversion to a sparse array.

Using sparse arrays instead of undefined or null might not be a bad choice to save memory for Excel sheets that have a lot of empty cells.

It would indeed make sense for the debug window not to show null for missing or undefined array elements.

I don't believe that the use of an array of arrays is a good model for a spreadsheet anyway. If you need a proper full model for a spreadsheet, you need to keep the data in the spreadsheet. Or maybe read it into a SQL DB.

The standard model in JavaScript for such data is an array of objects, which better models the spreadsheet schema. While sparse arrays certainly save memory, as you've seen, they don't let you retain a consistent model of the original data. You have to know the content of your data in order to ensure you don't use the wrong JavaScript function.

If using an array of objects, you no longer need to worry about sparseness since you can safely miss out any column and still process the data in a consistent way without needing to know the contents.
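For illustration, with made-up column names (this is a sketch of the model, not what the spreadsheet node actually outputs):

// each row is an object keyed by column name; empty cells are simply omitted
const rows = [
    { name: "alpha", value: 1, comment: "first" },
    { name: "beta",  value: 2 },                    // no comment cell, so no key at all
    { name: "gamma", value: 3, comment: "" }
];

// processing does not depend on which cells happened to be filled in
const withComments = rows.filter(r => r.comment);
console.log(withComments.length);  // 1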

I would like to suggest that this is a failing of the underlying library. Of course, it is also compounded by the other inconsistencies you've found on the JSONata/Node-RED side.
