Automate a huge volume of GET requests

I have a SQL Server table with 115,000+ last names and first name and some other identifiers. I need to make GET requests against a REST endpoint, using these data and store the json response for parsing.

Can I use NR to " Set the URL of a request using a template" swapping values in the template?
An example:
https://Cloud:8000/Patient?name=smith (patient 1)
https://Cloud:8000/Patient?name=jones (patient 2)
...

Patient name values are read from a SQL Server table. There are approx 115.000/year. This example returns only the PATIENT FHIR resource.

There are around 17 other resources; e.g. 18 X 115,000 = a lot of HTTP requests

Yes, of course. You could do it in a function node or change node or indeed a template node.

That said, dont you have access to the SQL server? You could grab all data using node-red-contrib-mssql-plus instead of making 18 X 115,000 HTTP requests?

Yes I do. My idea is now confirmed.

Need to take a look at three examples

Thanks steve.

Who I am looking for is in my database e.g. name = John Doe and birth date., along with a lot more names.

Need to dynamically build a GET REQUEST, to see if they are in a FHIR server. Record the return code, store the response; grab the next name and birthdate
On and on….

Not quite sure I understand where the name list originates - is it from SQL? If so, a single SQL request gets all details in an object/array. You can then walk through that and issue get commands for each entry. Add a slight delay between the get's though otherwise you risk flooding your FHIR server.

Personally, this is the kind of thing I would more likely do using a PowerShell script but Node-RED can certainly do it. PowerShell being better (somewhat) at streaming data from inputs to process and output and so may be more memory efficient.

The db table will contain demographic information about a live birth. Names and date of birth for example. The FHIR server will contain information from a hospital's EHR system (FHIR). Execute a
GET request like https://example.smilecdr.com:8000/Patient?name=smith&dateBirth=09202021 etc ...
in Postman or curl. You can use this to return the FHIR resource you want. I have a lot of names, and other predicates to search with.

So which drives the query, the db table or the FHIR system?

The db table contains birth information about the birth. The final record of records. When all the records in the db have been searched for I'm done. The birth hospital is in the db table along with other stuff about the kid. The endpoint to search then becomes the hospital where the kid was born. I can then issue a GET request and get a lot of other information about the birth that is not in the SQL server table.

So I guess the sql server table is driving the process - stop executing GET requests when I'm at the EOF

Uploaded a picture of a flow that I played around with in various sandboxes.

Right, so as I said. Issue a single SQL query to get all the records (or page that if the table is REALLY big).

Then for each record you got back, you need to issue a GET against the FHIR, if you get a hit, store whatever you need from the record and the returned FHIR data into a new variable.

At the end of your walk through the db records, you can delete that var and you are left with just the records that are in both the db and the patient system.

That will be the most efficient process from the DB perspective. Not terribly good for your patient system as you have to keep bombarding it with queries. Which is why you need a slight delay at the end of each loop through the db records.

Also not terribly efficient from a memory perspective since you have to load all of the db records into memory and that is why Node-RED is not as efficient at this type of processing than other platforms. One way around that is to grab the db records in batches (pages), process the batch and go round again for the next batch until you have no more db records.

That is also why something like PowerShell is more efficient because it will stream records and process each record as it is streamed. It is possible to do that using Node.js as well but Node-RED isn't really set up for that (there are a few streaming nodes I think so you may be able to do it, not sure).

Problem is I know less about PS than NR.

115,000 births/year. Hit all the Hospital EHRs, 1/min 31+hrs total. I did not factor in the RESPONSE processing, db inserts etc. would have to be polite and pause, else they might consider me a DOS attack vector.

1 Like

Without understanding more about the actual requirements of the system you are building, it would be hard to say much more (incidentaly I am an Enterprise Architect for the UK NHS so I know a little bit about processing patient data :grinning: ).

The approach you are taking is rather brute-force and I would expect that there would be much better approaches but much depends on why you are doing this and who owns the db and the EHR.

For the UK, patient identifiable information is treated as highly sensitive and is heavily controlled and so the approach you are taking is unlikely to be allowed unless both the db and the EHR are owned by the same organisation and the processing is approved by the Information Governance and privacy teams.

You would also need to be very cautious about the security of the processing system (e.g. Node-RED and its server platform).

Wow! Am I lucky that you stopped by.

Yes , I am well aware of the sensitive nature of the PHI. This project is funded by the CDC. The state of michigan owns the dB, and each hospital owns their EHR system. EPIC seems to be predominant. Layers of security oAuth2 expiring tokens etc. these are FHIR servers lots of json.

What we are after is all the medical information in the EHR in addition to the legal information necessary to complete a legal birth

Yes, very brute force, looking for an approach that has a lighter touch.

Well, we are straying a long-way off the topic of Node-RED :grinning:

But, the way we approach this in the UK is to get the hospitals to send the data to a central repository.

The repository will have other datasets in it as well that would allow matching of data sets for a patient. This is a lot more efficient since the data is always matched on the patient identifier at a database level.

Then the repository can either be queried directly - if patient identifiable data is needed.

Or the data is further processed to pseudonomise the identifiers and so reduce the sensitivity of the data. This is useful where the data will be used for national or regional statistics.

Kind of straying, but circling back the problem is:

I need to choose one of three NR approaches and send dynamically built GET requests to a FHIR endpoint, parse the response and dynamically load new info from a NR db to build out a new GET request. X 115,000 times

I think that I've outlined an approach. The number is fairly meaningless except that it is going to take some time to run.

Thank you