Efficient Database Updates

Hi,

I have a few hundred rows in MongoDB that I need to extract on a scheduled basis, run each row through some functions, and then update the database. I grabbed the mongodb3 connector because it allows bulkWrite and updateMany, since I'd prefer to handle the updates in batches.

I can't figure out the flow, though. If I begin with the Mongo node's foreach output and pass each row through my functions, I need something on the other side to batch them up for the db update. I tried the batch/join nodes, but they appear to expect a constant flow, like MQTT-style messaging; there's no handling of the remainder if you end up with extra rows after the last full batch is sent.

Suggestions very much appreciated!

Just wanted to add: I just read about the RBE node. Maybe I can use a function node to accumulate inputs into a JSON array, output that, and let RBE decide whether my batch should be passed on. Would that be a reasonable way to accomplish this? I thought there might be something that handles this automatically, but at least this can be my Plan B.
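For what it's worth, the accumulate-then-release idea can be sketched in plain JavaScript. This is not Node-RED API code — inside a function node you'd keep the buffer in node context instead of a closure — and the batch size of 50 and the `onBatch` callback name are assumptions for illustration:

```javascript
// Sketch of a batching accumulator: releases a full batch every `size`
// items, and flush() releases whatever remainder is left at the end,
// which is exactly the case the batch node doesn't handle.
function makeBatcher(size, onBatch) {
    let buffer = [];
    return {
        push(row) {
            buffer.push(row);
            if (buffer.length >= size) {
                onBatch(buffer);
                buffer = [];
            }
        },
        // Call once after the last row so the remainder is not lost.
        flush() {
            if (buffer.length > 0) {
                onBatch(buffer);
                buffer = [];
            }
        }
    };
}

const batcher = makeBatcher(50, batch => {
    // here you would send the batch on for the bulk db update
});
```

The key design point is the explicit flush step: something has to signal "no more rows are coming" (for example, a final message flag) so the partial last batch gets released instead of waiting forever.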

Hi, some questions. Can't you do the calculations in the database itself? What sort of data do you need to manipulate?

Good question ;-). Yes, we could certainly do that, but we've created a web app to allow some customers to tweak some customizable "formulas". These guys are not super computer literate, but we're hopeful that if we start them out with some basic JavaScript functions they'll be able to make modifications on their own. The formulas are not complex (just basic math), but they do require a fair number of changes over time. They do a lot of trial-and-error experiments, and they already have an app with a bunch of fancy reports to show the results. We are just trying to supply a way to expose the formulas to them that isn't as dangerous as giving them a SQL lesson and db credentials.

It's just a POC - if we do a more complete app we may build our own formula engine.

As an ex-DB administrator, the thought of giving customers DB access makes me feel ill 🙂
I would write the modifications to a new table and then run them against a COPY of the original data to see if everything works, and then write the modifications back to the original location. I would not store data outside of the database at all, if possible, but use temporary tables instead.


I don't know about mongodb, but if you have your dataset in an array, you can feed it through a Split node to split it into individual rows, then through your functions, and on to a Join node in Automatic mode. Provided you have not lost the parts attribute of the message, the Join node will join them back into an array. However, that may not be of any use to you, I don't know.
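The Split/Join mechanism described above rides on the `msg.parts` property (the Split node sets `id`, `index`, and `count`, and the Join node in Automatic mode reassembles once it has seen `count` messages for the same `id`). A rough, simplified simulation of that round trip in plain JavaScript — the group id value and the function names here are assumptions, not Node-RED APIs:

```javascript
// Simplified simulation of what Split and Join (Automatic mode) do.
// In Node-RED this metadata lives on msg.parts.
function split(array) {
    const id = 'group-1'; // the real Split node generates a unique id per array
    return array.map((payload, index) => ({
        payload,
        parts: { id, index, count: array.length }
    }));
}

function join(msgs) {
    // Reassemble by parts.index so order is preserved even if the
    // per-row processing reordered the messages.
    const out = [];
    for (const m of msgs) out[m.parts.index] = m.payload;
    return out;
}
```

The practical takeaway is that your per-row function nodes should modify `msg.payload` only and leave `msg.parts` untouched; then the Join node can rebuild the full array for a single bulk update at the end.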

@ghayne, No worries. They won't be writing back to the input table - there will be separate input and output tables defined for their purposes. We MAY only expose subflows to them (if we can figure that out) and keep the actual mongodb connector to ourselves. Again, just a POC for something quick and dirty.

@Colin - currently I don't have an array; I am doing foreach in the MongoDB reader. In a similar product I used once, there was a table node you could use as intermediate storage, but I'm not seeing that here. RBE may work for me since it can block. I will look into that one.

If you can't figure that out, the other way would be to run user instances of Node-RED and get them to talk to a master instance.