For anybody following this discussion: got access via a private message to the stream to test.
I tested it with following snippet:
var request = global.get('request');
debugger;
var node = this;
node.data = "";
request({
method: 'GET',
uri: 'some_private_url',
gzip: true
},
function (error, response, body) {
console.log('Most probably the stream has been ended ...')
}
)
.on('data', function(chunkAsBuffer) {
// Convert the (decompressed) buffer to an utf8 string
var chunkAsUtf8 = chunkAsBuffer.toString('utf8');
node.data = node.data.concat(chunkAsUtf8);
var firstIndex;
// Find the index where the next xml string starts inside the data.
while ((firstIndex = node.data.indexOf("<?xml")) > -1) {
// When the next xml doesn't start yet in the current data, the entire
// xml hasn't arrived yet. So wait until the next chunk arrives ...
if (firstIndex === node.data.lastIndexOf("<?xml")) {
break;
}
// Get the previous xml from the data
var previousXml = node.data.slice(0, firstIndex-1);
// Send the previous xml
node.send( {payload: previousXml });
// Remove the previous xml from the data
node.data = node.data.slice(firstIndex);
}
})
Using a combination of indexOf
and lastIndexOf
is not really good for performance, but seems that not much data is involved in this particular stream.
When testing the stream, it seems that there is a huge time interval between the xml's (i.e. only received one since I started my test...). Only the chunks of a single xml will arrive very quickly after each other. And I also get frequently (at periodic intervals) keep-alive heartbeats:
So those should be ignored anyway ...
But this means that there is a big problem with my code snippet: I only send an xml when the START OF THE NEXT XML is detected. But since that next url will only arrive much later, the previous xml will keep waiting to be send. Which results in unacceptable delays!
So it would be required to send an xml as an output message when the END OF THE CURRENT XML is detected. However when looking at an (56K long) xml, that is impossible (since the events
closing tag doesn't arrive):
<?xml version="1.0" encoding="UTF-8"?>
<events>
<batch ...>
<event .../>
</event>
</batch>
...
<batch ...>
<event ...>"
....
</event>
</batch>
<heartbeat timestamp="1599516241221"/>
<heartbeat timestamp="1599516301227"/>
<heartbeat timestamp="1599516361234"/>
<heartbeat timestamp="1599516421240"/>
...
So I "assume" the heartbeats also imply that the xml is completed...
But that is a wild guess...
But if my assumption is correct, it can be implemented by adding this (before the 'concat' line):
if (chunkAsUtf8.trim().startsWith("<heartbeat")) {
if (node.data !== "") {
// Send the previous xml, which has probably arrived completely
node.send( {payload: node.data});
node.data = "";
}
return;
}
Have tested this and seems to work. There is only a delay of the xml (between retrieval and sending it in an output message) of 1 heartbeat interval length.
I had hoped to be able to get a reusable component from this discussion, but it seems a VERY proprietary protocol