Batch InfluxDB CLI


At the moment I'm using Node-RED to log my data directly into InfluxDB, but I want to buffer the data in batches (of 5000 measurements or more) and write those all together to InfluxDB.

If I use Grafana and request a lot of data (for example 5 years at a 5 s interval), Node-RED sometimes can't write data to InfluxDB. With a buffer I can solve that problem.

Now I was thinking of writing the data in the CLI line-protocol format and storing it in a txt file (which I have already done):
VATEN_1_P10_STEP,ActueelProd_S=Geen\ biersoort value=0 1599131750298000
VATEN_1_P20_STEP,ActueelProd_S=Geen\ biersoort value=0 1599131750298000

But does anyone have an idea of how I can send this txt file to InfluxDB as a batch and check whether InfluxDB has received it?

I was also thinking of using the Batch node in Node-RED to transmit the data as a batch, but I thought it was too time-consuming: I would have to write the data to the external file, read it back in, and push all the data through a Function node to put it into an array structure. Or is that the better way of doing this?

I hope I explained myself well enough.

Thanks in advance!

How does that problem show itself?

If we stress InfluxDB with Grafana as mentioned above, we see a gap in our graph, as you can see in the picture.

And we get some errors in Node-RED.

Rather than your suggested solution, could you just queue up messages until the server recovers? Assuming that you can catch the error (you should be able to use a Catch node), the solution in this thread might solve the problem without a major restructuring of your system: How to queue / hold the sensor values ,when there is a internet failure occurs
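The queue-until-recovery idea can be sketched in plain JavaScript (this is an illustration of the principle, not the linked flow itself; all names such as `makeWriteQueue`, `fail` and `flush` are invented). A Catch node would trigger `fail()`, a retry timer would trigger `flush()`, and `send` stands in for passing the message on to the influx node:

```javascript
// Minimal sketch of "queue messages until the server recovers".
function makeWriteQueue(send) {
  const queue = [];
  let serverUp = true;

  return {
    // New measurement arrives: pass it through, or buffer it while down.
    push(msg) {
      if (serverUp) send(msg); else queue.push(msg);
    },
    // Catch node fired: the last write failed, start buffering.
    fail(failedMsg) {
      serverUp = false;
      if (failedMsg !== undefined) queue.unshift(failedMsg); // keep the failed one
    },
    // Retry timer: assume the server is back and drain oldest-first.
    flush() {
      serverUp = true;
      while (queue.length) send(queue.shift());
    },
    size() { return queue.length; }
  };
}
```

Nothing is lost and nothing needs restructuring: while the server is healthy the queue is always empty and messages flow straight through.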

As a matter of interest, why have you got multiple influx nodes?

Hi Colin

The first reason was of course to fix the failure mentioned above. But after reading about the most efficient way to write your data to InfluxDB, the general advice is to write in batches of 5000 points at a time to reduce the workload.
And I have only just started setting up the system, so it's in a test phase and I'm trying to do it the best way possible.

And in the future we want several Node-RED PCs writing to a single InfluxDB server, so if the server is down for some time we want to buffer the data for three hours or so.

As for the multiple influx nodes: I was worried that if several inputs arrived from different nodes at the same time, the influx node couldn't handle it. But judging from your question, I take it that isn't a problem?

The first rule of optimisation is: don't bother unless you have a problem. You have said that you have a problem when reading large amounts of data, locking up the server for so long that writes fail, but do you know that you are loading the server significantly when writing?

Also note that if you write in big chunks, I guess that will delay your Grafana screen updates while the write is going on; in fact, if the write locks up the server long enough then the read may fail. In addition, your solution won't in itself solve the basic problem: you will still have to have logic that tries to do the write and, if it fails, waits a bit and tries again, which is what the flow in the link I posted does. So you could implement that bit first and see if it solves the problem, then go to the batching technique if necessary.

If you replicate the flow I suggested in multiple PCs, each one will independently buffer its data, so again it should solve the problem, even if the server is down for hours or days, so long as you have not got so much data that you run out of memory.

If you do decide to go the batching route then you could store the data in an array in persistent global context, and Node-RED will take care of saving the data to disc for you.
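The batch-accumulation part of that can be sketched as a small pure function (the names below are invented). In a Function node you would keep `buf` in persistent global context, e.g. `global.get('influxBuffer', 'file')` / `global.set('influxBuffer', buf, 'file')`, where the persistent store name (`'file'` here) must first be configured under `contextStorage` in settings.js:

```javascript
// Sketch: accumulate points and release them as a full batch.
const BATCH_SIZE = 5000;

// Add one point to the buffer; return the finished batch once the
// buffer is full (emptying it), otherwise return null.
function addPoint(buf, point, size = BATCH_SIZE) {
  buf.push(point);
  if (buf.length >= size) {
    return buf.splice(0, buf.length); // hand over the batch, buffer now empty
  }
  return null;
}
```

In the flow, a `null` return would map to "return null from the Function node" (emit nothing yet), and a full batch would become the `msg.payload` sent on to the influx node.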

In your multiple influx nodes have you got multiple entries in the Server dropdown in the nodes? If so then you could be making the situation worse by trying to do multiple writes in parallel. If you only have one Server config then I think it will queue them up rather than running them in parallel, but I am not certain of that. Either way there is, as far as I know, no advantage in having multiple influx nodes.

First of all, thanks for the quick replies!

I have one influx server config. All the data is written to the same database; I think that is what you mean by server config? But I'm going to change it to one node, which makes sense after what you said.

And I'm going to look further into the Catch node and buffering the data in an array.

Again, thanks for the replies.
Kind regards,


After reading it a second time, I think I'm already doing what you said:
instead of the InfluxDB out node I use the InfluxDB batch node, because I can automate the measurement name in a Function node. And every single write is a different measurement in the same database.

The msg.payload and msg.topic come from the Siemens node, so I only have to enter the name of the measurement once. And I send one message at a time.
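For reference, the node-red-contrib-influxdb batch node expects `msg.payload` to be an array of point objects, each with its own `measurement`, `fields`, `tags` and optional `timestamp`, so several measurements really can go out in one message. A small mapping helper could look like this (the helper name and input shape are invented; the measurement and tag names are taken from the line-protocol example above):

```javascript
// Sketch: turn topic/payload readings into the array-of-points shape
// that the influx batch node consumes.
function toBatchPoints(readings) {
  return readings.map(r => ({
    measurement: r.topic,            // e.g. 'VATEN_1_P10_STEP'
    fields: { value: r.payload },
    tags: { ActueelProd_S: r.product },
    timestamp: r.time
  }));
}
```

In a Function node you would set `msg.payload = toBatchPoints(buffered)` and return the message once the buffer is full, instead of sending one point per message.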

So after your information, I think this is a bad way of writing the data, no?

A single value over 5 years at 5 s intervals is 31,536,000 entries. By my calculation, unless you've purchased 8,213 Samsung LC49HG90DMUXEN 49" Curved Ultra Wide LED Monitors at >£800 each = £6,816,543.61, you will not be able to visualise all of that data.

The way to do this in InfluxDB is to use a continuous query to downsample data to more appropriate levels along with a retention policy that automatically trims your detailed data to a sensible timeframe.

For example, I have multiple environmental sensors recording at 1 minute intervals. These go into the detailed table and are retained for 7 days. A continuous query summarises that data automatically into my long-term table which keeps hourly data (including max, min and avg values) indefinitely.
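In InfluxQL (InfluxDB 1.x) a setup like that could look roughly as follows; the database, retention policy and measurement names here are all invented for the sketch:

```sql
-- Keep the raw 1-minute data for only 7 days:
CREATE RETENTION POLICY "detail" ON "sensors" DURATION 7d REPLICATION 1 DEFAULT
-- Keep the hourly summaries forever:
CREATE RETENTION POLICY "longterm" ON "sensors" DURATION INF REPLICATION 1
-- Every hour, roll the detailed data up into the long-term policy:
CREATE CONTINUOUS QUERY "cq_hourly" ON "sensors" BEGIN
  SELECT mean("value") AS "avg", max("value") AS "max", min("value") AS "min"
  INTO "sensors"."longterm"."environment_hourly"
  FROM "sensors"."detail"."environment"
  GROUP BY time(1h), *
END
```

The continuous query runs server-side on its own schedule, so neither Node-RED nor Grafana ever has to touch the 31 million raw rows to draw a long-range graph.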

Getting Grafana to do the summary means processing 31 million entries per value on each update which is not sensible and it is unsurprising that you are creating performance blips, especially if you were to try and run this on a server with limited resources (like a Pi or VPS). I would expect to see that you are getting spikes of paging as the indexes and data will be much larger than the available RAM.

One other point - depending on what data you are recording and why, is a 5s interval useful? I would only consider that interval for something that needed fine grained real-time control (maybe a boiler or tropical fish tank). For sure, if you are using cheap sensors like most of us do, some of them won't even be able to respond that quickly.

Maybe think about the practical difference in values you see between readings. If the % change is really small, there is probably no reason to take readings so often. If the differences are really noisy (change a lot between readings), don't assume that you need to read so often, check the specs for the sensors to see what the recommended reading interval is.

Hi TotallyInformation

The 5 years at a 5 s interval was just a stress test. And because we saw that InfluxDB wasn't responding to reads anymore, we wanted to fix that problem.

But in normal circumstances we never request that much data.