Can Node-RED process a large HTTP (JSON) response as a stream?

I'm using Node-RED to combine/transform multiple JSON microservice APIs, some of whose responses can be very large (tens or occasionally even hundreds of megabytes):

Is there any way to avoid having the http request node read the entire response before sending it on to the next node?

In other words, is there some way it can submit the request, start streaming in the response, and (assuming the response is a JSON array) split out the array elements and send them individually to the next node while the response is still streaming in?

I'm not sure if it helps, but googling around the Node.js world I often see references to the "JSONStream" package as a popular choice for this kind of thing...
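For reference, this is the kind of usage I keep seeing in its docs, just a sketch with a made-up URL, I haven't actually tried it:

```javascript
// Plain Node.js sketch of the JSONStream pattern: parse a JSON array response
// element by element while the body is still downloading. The URL is a placeholder.
const https = require('https');
const JSONStream = require('JSONStream'); // npm install JSONStream

https.get('https://example.com/api/big-array', (res) => {
    res.pipe(JSONStream.parse('*'))            // '*' = each element of the top-level array
        .on('data', (element) => {
            // fires once per array element, before the full response has arrived
            console.log('got element', element);
        })
        .on('end', () => console.log('done'));
});
```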

Thanks,

Darren

That is not possible. Do the microservices you are "talking" to have other API connection options (other than REST), like WebSockets, Kafka, MQ, etc.? Those are made for streaming responses. Or does the REST API have options/parameters to offset the data?

As bakman2 says, that isn't possible with the core http-request node. However, it should be possible if you wanted to get your hands dirty.

You would need to import a suitable library that lets you make the request and stream the response, with the stream producing chunked output messages.

Have you waded through the 6 pages of things that mention "stream" on Flows? Library - Node-RED

Alternatively, a roll-your-own approach might use something like node-fetch - npm (npmjs.com) imported into Node-RED.
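As a very rough, untested sketch of what that might look like inside a Function node (assuming node-fetch and JSONStream have been made available to the node, e.g. via the node's Setup tab or functionGlobalContext, and with msg.url just as a placeholder input):

```javascript
// Untested sketch only: stream a JSON array response and send one message per
// array element while the download is still in progress. `fetch` (node-fetch)
// and `JSONStream` are assumed to be in scope via the Function node's module
// setup; msg.url is an illustrative input.
const res = await fetch(msg.url);        // node-fetch: res.body is a Node readable stream

res.body
    .pipe(JSONStream.parse('*'))         // '*' = each element of the top-level array
    .on('data', (element) => {
        node.send({ payload: element }); // chunked output messages, one per element
    })
    .on('end', () => node.done())
    .on('error', (err) => node.error(err, msg));

return null;                             // messages are emitted asynchronously above
```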

Thanks @bakman2

And no, these are just normal REST APIs, though some (not all) do have pagination options for requesting a specified number of records at an offset.

For those supporting pagination, I guess a flow could hit the API repeatedly and split the array that comes back for each page, to achieve something similar to what I was thinking?

Though after processing the array elements I will need to join the results back together at the end for my final result.

I should have said: I played a bit with Node-RED a couple of years ago, so I'm not a total beginner, but no expert either. I do Node.js programming for my day job, though, so I could write Function nodes or even a custom node if need be.

We already process the JSON APIs I'm referring to using Node.js code, but we are having performance challenges there too. That code doesn't use a streaming/dataflow style, so if I could achieve what I described in Node-RED, I hoped to see how its performance compared to our code (though I wish Node-RED also had some kind of web worker/multi-threaded support; it still doesn't, right?).

FWIW, here's one of the articles that made me think what I described should theoretically be possible even in Node-RED (though I haven't tried it, so maybe I'm missing something):

Anyway thanks again.

Indeed, it is the best and fastest option.

FWIW, here's one of the articles that made me think what I described should theoretically be possible even in Node-RED (though I haven't tried it, so maybe I'm missing something)

This is just a dirty workaround: writing/reading while the data is coming in. It is kind of weird to have a RESTful API that delivers such huge loads of data; it hurts the performance of the server too, I can imagine.

Thanks @TotallyInformation, I was writing my reply to @bakman2 and then saw yours.

I'm pretty new/rusty with Node-RED (I really only played with it a couple of years ago), and no, I didn't know to look at those "stream" matches in the Flows library.

But I will, that's a great idea, thanks.

No problem. For those with pagination (which is typical for things like Graph and OData APIs), the presence of the paging link is usually sufficient: you can simply loop back to the input of your http request node with the new URL substituted.

There are now many APIs that are capable of returning huge datasets. I was playing with the Microsoft Graph API just a few days ago; our Azure Active Directory is "only" just under 100,000 entries, but obviously that is still a whacking great set of data. It's not that unusual to need to fetch all of the user entries to do analysis. Paging is the only thing that makes this feasible, since the back-end processing is also paged so that the servers aren't overwhelmed.

Just remember that you might need to put in a delay node to prevent yourself from spamming the server, otherwise you might get banned. A few seconds is normally more than enough.
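Tying those last two points together, a hedged sketch of that loop-back in a Function node with two outputs (output 1 carries the page's records onwards, output 2 goes back, via a delay node, to the http-request node's input). The @odata.nextLink property is the Graph/OData convention, so adjust for your API:

```javascript
// Rough sketch of the paging loop for a Graph/OData-style API, placed after
// the http-request node. Output 1 -> rest of the flow, output 2 -> delay node
// -> back to the http-request node. Property names are illustrative.
const page = msg.payload;

// Pass this page's records on for processing
node.send([{ payload: page.value }, null]);

// If the API reports another page, loop a new request back round
if (page['@odata.nextLink']) {
    node.send([null, { url: page['@odata.nextLink'] }]);
}

node.done();
return null;
```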

That is what I was referring to. An API request means: the server does the work while the client (a)waits for the complete response. A single request returning hundreds of megabytes is neither normal nor common.

If you look at the Microsoft Graph documentation, for example, rate limits, throttling, and paging are important topics when accessing the API. If paging is available: use it.
