I am doing a flow to store sensor data in a SQLite database and need assistance at the concept level. The flow should read a CSV file and only insert a row if the data in the CSV file is unique. Uniqueness can be determined by searching the db for an entry that looks the same as the data in the CSV file. Example: each CSV entry has three measured values and a serial number that is unique to the source device. If any one of the three values in the CSV data differs from the last entry in the db for the same serial number, then an entry should be made in the db.
The CSV file gets generated by an external application and will contain duplicate entries, as it logs a snapshot every 5 minutes.
Is the above the best method to do this?
If it is, how can I query the db for the entry and, if it is not found, insert it? Note that the CSV file can easily have 100 entries in it.
Uniqueness can be determined by searching within the db
Uniqueness can be enforced with unique constraints on the columns. On insert, the record will be ignored if it already exists, so there is no need to search for it first (which could be heavy on db operations).
Note: when inserting, you will need to use the INSERT OR IGNORE INTO statement to avoid constraint errors.
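For illustration, a minimal sketch of this, assuming a `readings` table with `serial` and `value` columns (names are placeholders) and the node-red-node-sqlite node in its prepared-statement mode (SQL in `msg.topic`, parameters in `msg.params`):

```javascript
// One-time table setup (run once against the db). The UNIQUE
// constraint spans the columns that together define a duplicate:
//   CREATE TABLE IF NOT EXISTS readings (
//       serial TEXT NOT NULL,
//       value  REAL NOT NULL,
//       UNIQUE (serial, value)
//   );

// Function node: build an INSERT OR IGNORE for one parsed CSV row.
// A row that matches the UNIQUE constraint is skipped silently
// instead of raising a constraint error.
msg.topic = "INSERT OR IGNORE INTO readings (serial, value) " +
            "VALUES ($serial, $value)";
msg.params = { $serial: msg.payload.serial, $value: msg.payload.value };
return msg;
```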
Thnx for the feedback bakman2. I did consider the UNIQUE keyword but it is not applicable in my situation.
I am creating a time series log.
Maybe I should elaborate by giving an example. Let's say my CSV file looks as follows and this data gets parsed the first time round:
Sensor Number, value
1234,1
1235,9
1236,8
The database should store all three CSV entries.
If the above CSV data gets parsed again, none of it should be stored because nothing has changed.
If, say, the value for sensor 1236 changes from 8 to 20, as in the following example, then only this single entry should be stored when the CSV data gets parsed:
1234,1
1235,9
1236,20 <----- Changed value
Therefore the database should have the following entries:
1234,1
1235,9
1236,8
1236,20
Thnx for looking at it.
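In other words, I imagine something like the following guarded insert (table and column names are placeholders, and I am assuming the sqlite node's prepared-statement mode as above):

```javascript
// Function node sketch: insert only when the new value differs from the
// most recent stored value for the same sensor. If no row exists yet,
// the subquery returns NULL and SQLite's "IS NOT" still evaluates true,
// so the very first reading for a sensor is always stored.
// Assumed table: readings(id INTEGER PRIMARY KEY, serial TEXT, value REAL)
msg.topic =
    "INSERT INTO readings (serial, value) " +
    "SELECT $serial, $value " +
    "WHERE (SELECT value FROM readings " +
    "        WHERE serial = $serial " +
    "        ORDER BY id DESC LIMIT 1) IS NOT $value";
msg.params = { $serial: msg.payload.serial, $value: msg.payload.value };
return msg;
```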
Honestly, I think you may be using the wrong tool for the job: you are trying to force time-series data into a relational db, and so you end up fighting the tool rather than working with it.
Even on limited-resource devices like the Pi, you can happily run InfluxDB, which is a time-series db. By simply feeding the data into InfluxDB you can have it do all the heavy lifting for you. It can also summarise data by time for you: for example, I keep per-minute data for 7 days and continuously summarise it to hourly data (with avg, min, max points), which I keep for several years.
If you are trying to avoid saving successive records that are the same as the preceding record, then handle that in node-red by comparing each record with the one that came before (by remembering the previous one), rather than fetching the previous record from the database just after you have written it.
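A minimal sketch of that approach in a function node, assuming one message per CSV row with serial and value fields in the payload:

```javascript
// Remember the last seen value per serial in node context.
// Only pass the message on when the value changed (or the serial is
// new), so only changed readings reach the sqlite node downstream.
const last = context.get("last") || {};
const { serial, value } = msg.payload;

if (last[serial] === value) {
    return null;                // unchanged: drop the message
}
last[serial] = value;           // remember for the next comparison
context.set("last", last);
return msg;                     // changed or first seen: let it through
```

One caveat: node context is held in memory by default, so after a restart the first record for each sensor will be treated as changed and stored again.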