Using Parquet files instead of CSV?

Flux will get there I'm sure, it's just a bit early. v1.8 supports both if you like, so you can use Flux where it makes sense and ignore it elsewhere.

Do yourself another favour and read up on continuous queries and retention policies.

A common mistake is letting InfluxDB accumulate detailed data forever. Eventually you will find it slowing your machine down or even crashing. Avoid this by using a continuous query to aggregate data for long-term storage, along with a retention policy that automatically trims the detailed data, keeping it to a manageable size.

For example, I may keep all of my environmental sensor data at 1-minute intervals for a month, while a continuous query aggregates the detail into hourly avg/max/min data points which I keep for 5 years. (*)
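As a rough sketch of that setup in InfluxQL, something like the following would do it. The database name ("sensors"), measurement names ("environment", "environment_hourly") and field name ("temperature") are made up for illustration - substitute your own:

```sql
-- Keep raw 1-minute detail for 30 days (made the default policy):
CREATE RETENTION POLICY "one_month" ON "sensors" DURATION 30d REPLICATION 1 DEFAULT

-- Keep the hourly rollup for ~5 years:
CREATE RETENTION POLICY "five_years" ON "sensors" DURATION 260w REPLICATION 1

-- Aggregate the detail into hourly avg/max/min points:
CREATE CONTINUOUS QUERY "cq_hourly" ON "sensors"
BEGIN
  SELECT mean("temperature") AS "avg_temp",
         max("temperature")  AS "max_temp",
         min("temperature")  AS "min_temp"
  INTO "sensors"."five_years"."environment_hourly"
  FROM "environment"
  GROUP BY time(1h), *
END
```

The `GROUP BY time(1h), *` keeps all of the tags on the aggregated points, so you can still filter the rollup by sensor or location.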

Similarly, I use Telegraf to collect system data into another db. That is a lot of data, since most of it is taken at 15-30s intervals and there are hundreds of items, so I only keep it for a week.
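That one only needs a retention policy, no continuous query. Assuming Telegraf is writing to its default database name ("telegraf" - check your own config), a one-week default policy looks like this:

```sql
CREATE RETENTION POLICY "one_week" ON "telegraf" DURATION 7d REPLICATION 1 DEFAULT
```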

That way, everything is kept manageable and tidy without any effort after the initial setup.

Terminology can also be confusing for InfluxDB beginners - it is helpful to know:

  • An InfluxDB measurement is similar to an SQL database table.
  • InfluxDB tags are like indexed columns in an SQL database.
  • InfluxDB fields are like unindexed columns in an SQL database.
  • InfluxDB points are similar to SQL rows.
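You can see all four terms in a single line of InfluxDB line protocol (the names here are invented for illustration):

```
environment,location=kitchen,sensor=bme280 temperature=21.4,humidity=48.2 1609459200000000000
```

Here `environment` is the measurement (the "table"), `location` and `sensor` are tags (indexed columns), `temperature` and `humidity` are fields (unindexed columns), and the whole line, with its timestamp, is one point (a "row").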

This thread has more details: Need more detailed information on influxdb - General - Node-RED Forum (nodered.org)


(*) Incidentally, you may be wondering about performance. I once kept all of the sensor data for about 3 years on a Raspberry Pi 2 - it was about then that I began to see performance issues on the Pi :grinning:

Bigger devices should have no problems with millions of entries.

BTW, if you do eventually need to go full enterprise mode, note that the version of InfluxDB that fully supports clustering does cost money.
