Prevent outlier reading going to Influx

Dear node-red experts
I'm having a similar issue and I'm familiar with the range node, or average functions.
Short version:
I have single outliers in my measurement and I would like to get rid of them, replacing the outlier with an average of the last three measurements or something along that line.

Longer version with background:
I collect rain water from the roof into a large tank of about 28 m³.
I use a modified Sonoff TH10 device with a HY-SRF05 ultrasonic sensor and some self-written code which sends a JSON formatted string to the MQTT server and then to node-red. There it is stored in InfluxDB and displayed with Grafana. Straightforward and simple.

Once in a while I have huge outliers in the measurements, see chart below:


These outliers can happen at a low water level or at a high water level. Therefore I can't use a range node. I do not like averaging because sometimes the outliers are quite extreme and would mess up the chart after such an event.

I'm looking for something to get rid of them before they even go into InfluxDB.
I'm open to ideas.
(And yes, fixing it at the root of the problem would be the best way, but there are a number of reasons for these outliers, falling leaves, bugs or bats flying by, power issues, .... It seems easier to fix in software.)

Kind regards,
Urs.

It is poor etiquette to hijack another topic. I have moved your post to its own topic to prevent confusing replies.

Have you looked at the filter node? It can be set to block messages on a percentage change.
If you supply an array with a set of readings including an outlier, you will probably get far more examples. (09/13 to 09/14 looks like a good data set to play with.)
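
To make the percentage-change idea concrete, here is a minimal sketch in plain JavaScript. It is not the filter node's actual implementation; `makePercentGate` and the 20% threshold are made-up names and values for illustration, and the comparison is always against the last accepted reading.

```javascript
// Hypothetical helper: builds a gate that remembers the last accepted reading
// and rejects any reading that differs from it by more than maxPercent.
function makePercentGate(maxPercent) {
    let last = null; // last accepted reading, null until the first one arrives

    return function accept(value) {
        // Reject anything that is not a finite number.
        if (typeof value !== "number" || !Number.isFinite(value)) {
            return false;
        }
        // The first reading (or a zero reference) cannot be checked, so accept it.
        if (last === null || last === 0) {
            last = value;
            return true;
        }
        const changePct = Math.abs(value - last) / Math.abs(last) * 100;
        if (changePct > maxPercent) {
            return false; // blocked; the reference value stays unchanged
        }
        last = value;
        return true;
    };
}

const accept = makePercentGate(20); // assumed threshold: 20 %
const readings = [100, 105, 300, 104];
const passed = readings.filter((r) => accept(r));
console.log(passed);
```

In a function node the state would have to live in `context` rather than in a closure, and a blocked reading would simply `return null`. Note one weakness of this approach: if the very first reading is an outlier, it becomes the reference and later good readings get blocked.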

Anyone who wants more ideas can look at this topic: Smoothing a value and check for extrems

You can take a look at one of the smoothing nodes. I think this can get the job done before storing the data in Influx.

My personal approach would be to make sure that the sensor is sending its data correctly. It's a very slowly changing signal, so averaging with trimming is a nice approach.
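
As a rough illustration of "averaging with trimming", here is a small sketch in plain JavaScript. The function name `trimmedMean` and the sample values are invented for the example; it sorts a batch of readings, drops the `trim` lowest and `trim` highest, and averages the rest.

```javascript
// Hypothetical helper: mean of the readings after discarding the `trim`
// smallest and `trim` largest values.
function trimmedMean(readings, trim) {
    if (readings.length <= 2 * trim) {
        throw new RangeError("not enough readings left after trimming");
    }
    // Copy before sorting so the caller's array is not modified.
    const sorted = [...readings].sort((a, b) => a - b);
    const kept = sorted.slice(trim, sorted.length - trim);
    const sum = kept.reduce((acc, v) => acc + v, 0);
    return sum / kept.length;
}

// Five made-up level readings with one extreme spike.
const sample = [120, 121, 500, 119, 122];
console.log(trimmedMean(sample, 1)); // the 500 and the 119 are discarded
```

With a slowly changing level, a single spike ends up at one end of the sorted batch and never reaches the average, which is the point of trimming over plain averaging.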

It doesn't feel right to me that you receive an unexplained reading so you replace it with a made-up value.
Surely better to save the bad reading in the database and exclude it from your database extract code?
Then you have the data available to correlate with other events.

I know that's not so simple (but possible) in SQL, and Influxdb's select query syntax was one reason I dumped it in favour of a more traditional RDBMS. So maybe it's not possible with Influx.

While that might be good practice for a SQL DB, it isn't so useful for a timeseries DB. It would be better to send outliers to a separate DB if you need to monitor them.

For myself, I would send all my InfluxDB outputs via a single flow and at the front of the flow, add sanity checks on the inputs. For example, have a list of acceptable max/min values for a given topic. Also don't let any strings get into your fields and no numbers into your tags.
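
A sanity check like that could look roughly like this. It is only a sketch: the topic name `tank/level`, the limits, and the helper `isSane` are assumptions made up for the example, not anything from the flow discussed here.

```javascript
// Hypothetical table of acceptable ranges per topic (units assumed to be cm).
const LIMITS = {
    "tank/level": { min: 0, max: 300 },
};

// Returns true only for finite numbers inside the range configured for the topic.
// Unknown topics and non-numeric values (including numeric strings) are rejected.
function isSane(topic, value, limits) {
    const range = limits[topic];
    if (!range) {
        return false;
    }
    if (typeof value !== "number" || !Number.isFinite(value)) {
        return false;
    }
    return value >= range.min && value <= range.max;
}

console.log(isSane("tank/level", 150, LIMITS));   // inside the range
console.log(isSane("tank/level", "150", LIMITS)); // a string, so rejected
console.log(isSane("tank/level", 999, LIMITS));   // outside the range
```

In a function node at the front of the InfluxDB flow this would be used as something like `return isSane(msg.topic, msg.payload, LIMITS) ? msg : null;` so that rejected messages never reach the database node.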

Why do you need to replace it with a guess? Why not just remove it?

Agreed, I missed that. Don't write a made up value, just don't write anything at all.
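
For the "just don't write anything" option, a sketch along these lines might do. It is one possible approach, not a tested flow: `makeOutlierDropper`, the window of 3 and the allowed deviation of 10 are all assumptions. Each reading is compared with the median of the last few accepted readings, and outliers are dropped by returning `null`, which is how a function node sends nothing.

```javascript
// Hypothetical helper: drops messages whose payload deviates too far from the
// median of the last `windowSize` accepted readings (an odd windowSize is assumed).
function makeOutlierDropper(windowSize, maxDeviation) {
    const recent = []; // last accepted readings, oldest first

    return function filter(msg) {
        const value = msg.payload;
        if (typeof value !== "number" || !Number.isFinite(value)) {
            return null; // not a usable number, write nothing
        }
        // Only start judging once the window is full.
        if (recent.length >= windowSize) {
            const sorted = [...recent].sort((a, b) => a - b);
            const median = sorted[Math.floor(sorted.length / 2)];
            if (Math.abs(value - median) > maxDeviation) {
                return null; // outlier, write nothing
            }
        }
        recent.push(value);
        if (recent.length > windowSize) {
            recent.shift();
        }
        return msg;
    };
}

const dropOutliers = makeOutlierDropper(3, 10); // assumed window and tolerance
const results = [100, 101, 102, 400, 103].map((v) => dropOutliers({ payload: v }));
console.log(results.map((m) => (m === null ? null : m.payload)));
```

Inside an actual function node the `recent` array would need to be kept with `context.get` and `context.set`, since the node's code runs again for every message.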

@E1cid : thanks for creating this new thread with my post. Fully agree.
@Colin : after thinking a bit (happens from time to time, sadly less often than needed) I will go for the removing option instead of replacing. The grafana chart will show the missing value since I prefer not to let grafana fill missing values with straight lines.
@edje11 : I'll have a look at the smoothing nodes, maybe they are helpful in other areas too.

Many thanks for the help here.
Kind regards,
Urs.
