Can I ask your advice on how to store data efficiently in my JSON-based database (PostgreSQL)?
I have a number of Arduino modules that send various data at various intervals; these need to be stored.
Each Arduino module differs in its IP number, for example IP01, IP02, IP03 etc.
The way I have done this in the past is to have one database storing all data from all modules in one big table. Then when I want to display the data for a graph, I first select the IP, then get the data for that IP only.
So this is my question: do I do as before, or have a table for each IP and then retrieve data from that?
So in this example App Inventor would have a table for IP01, IP02 and IP03.
Which will be faster to access, and would either approach be significantly smaller in disk space used?
Generally, queries against a single table will be faster than multi-table queries. So that is your first decision point: how do you most commonly use the data?
Against that, you may need to consider the sizes of your table indexes. The more complex the data, the more likely you are to need multi-column indexes for efficient use, but these all take up live memory.
So there is always a balance that only you can decide, since we don't know how big and complex your data becomes over time, how much memory your server has, or exactly how your data will be used.
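To make that concrete, a single-table layout with a multi-column index might look something like this (table and column names are just placeholders, not a recommendation for your exact schema):

-- Hypothetical single table holding readings from all modules
CREATE TABLE readings (
    device      text        NOT NULL,   -- e.g. 'IP01', 'IP02', ...
    recorded_at timestamptz NOT NULL,
    payload     jsonb       NOT NULL    -- the raw JSON sent by the Arduino
);

-- One composite index covers the common "this device over this time range" query
CREATE INDEX readings_device_time_idx ON readings (device, recorded_at);

-- Typical graph query
SELECT recorded_at, payload->>'meter_reading' AS reading
FROM readings
WHERE device = 'IP01'
  AND recorded_at >= now() - interval '4 days'
ORDER BY recorded_at;

With per-IP tables you would end up creating and maintaining a similar index on every table instead.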
One other note: that data appears to be time-series data, and I have to say that, again speaking generally, a dedicated time-series database such as InfluxDB will pretty much always be more efficient at handling that kind of data for writing, reading and processing.
My needs are not huge; at most 4 pieces of data are drawn on a graph using Grafana.
Electric & gas meter readings are updated on every change of usage throughout the day, so I will guess at 1000 rows of data per day.
Each module's IP is accessed in the database to draw the Grafana graph (so 4 individual IP Arduino modules), so I would do a select for each module in the database to make this graph.
Hopefully this will give an overview of my upgraded system.
The PC running this has 8 GB RAM, running Kubuntu with an 80 GB HDD and a reasonably powerful CPU.
An example of my current system chart, which is slow to update due to running on an RPi with a very slow disk.
The data will be kept for a minimum of 1 year, possibly longer, as I want to compare energy usage.
If I understand "timespan" correctly, it's hard to say; as I mentioned, I am comparing data from one year against another year, and then by month or week.
You can store the complete JSON object, including the "IPxx", in a single table; SQLite would work as well. Make sure to use the jsonb datatype (not json) in Postgres, as it will be stored/indexed as binary. If you expect 1000 rows per day, that's 365k rows per year, nothing crazy and extremely fast. A dedicated time-series database may be more efficient over longer time spans.
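Roughly like this, as a sketch (field names invented, adjust to your payload):

CREATE TABLE sensor_data (
    id   bigserial   PRIMARY KEY,
    ts   timestamptz NOT NULL DEFAULT now(),
    data jsonb       NOT NULL          -- the whole payload, including the "ip" field
);

INSERT INTO sensor_data (data)
VALUES ('{"ip": "IP01", "meter_reading": 1234.5, "iso": "2024-05-01T10:00:00"}');

-- A GIN index lets you filter on any key inside the jsonb
CREATE INDEX sensor_data_gin_idx ON sensor_data USING GIN (data);

SELECT data->>'meter_reading'
FROM sensor_data
WHERE data @> '{"ip": "IP01"}';

The GIN index is optional at your volumes, but it keeps lookups on the embedded "IPxx" fast as the table grows.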
Hi
Yes, I have been using SQLite for a year but gradually it's slowing down, so that's why I chose a fresh, more modern PC and a better database. I have just watched a video before reading your reply, and it was about using jsonb instead of json.
Don't forget also that a dedicated time-series database (Influx being the default, but there are others) will also give you more options down the track to automatically downsample the data, maintaining a sane size whilst still providing overviews of the data.
I did originally try Influx, but couldn't get it working (I just did not understand how to get it working).
Then I tried PostgreSQL and that worked from within Node-RED, so I used that with Grafana.
My aim is to have graphs in Grafana, and I think that's where I had the problem; something to do with a bucket, I seem to remember?
Any ideas? At the moment I am still in test mode with my data and Grafana, so I can change if I know how.
SQLite with JSON should be fast as well, but when querying the data, make sure to use function expressions (i.e. functions that are specifically for JSON, like json_extract etc.). I read that the most recent version of SQLite (since January 2024) also supports jsonb for binary blobs, which makes it even faster.
InfluxDB 2.0 lost its way because of the Flux language (the bucket things); they will be re-introducing SQL in 3.0, not sure what they were thinking. Time-series DBs have a major advantage in that they can aggregate data over timeframes much better. But still, a million records or so should not be a problem at all for SQLite.
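For example, with a recent SQLite (3.45+) you could store the payload as a binary JSONB blob and still query it with the ordinary json functions; a minimal sketch with invented names:

-- Store the payload as binary JSONB (SQLite 3.45+)
CREATE TABLE readings (device TEXT NOT NULL, data BLOB NOT NULL);

INSERT INTO readings (device, data)
VALUES ('IP01', jsonb('{"meter_reading": 1234.5, "iso": "2024-05-01T10:00:00"}'));

-- json_extract accepts both text JSON and JSONB blobs
SELECT json_extract(data, '$.meter_reading')
FROM readings
WHERE device = 'IP01';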
It has auto-trim and auto-aggregate features that let you move from detailed data to long-term data. For example, I keep 1-minute data for 7 days. It is automatically aggregated to hourly data (with max/min and avg values), which is kept for a few years. This was working fine on a Pi 3.
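If you do stay on plain PostgreSQL, you can approximate that roll-up yourself with a scheduled query. This is not what Influx does internally, just the same idea done by hand (table and column names are made up, and it assumes you have created a matching readings_hourly table):

-- Roll detailed rows older than 7 days up into hourly min/max/avg
INSERT INTO readings_hourly (device, hour, min_reading, max_reading, avg_reading)
SELECT device,
       date_trunc('hour', recorded_at) AS hour,
       min((payload->>'meter_reading')::float),
       max((payload->>'meter_reading')::float),
       avg((payload->>'meter_reading')::float)
FROM readings
WHERE recorded_at < now() - interval '7 days'
GROUP BY device, date_trunc('hour', recorded_at);

-- Then drop the detailed rows that have been rolled up
DELETE FROM readings WHERE recorded_at < now() - interval '7 days';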
It integrates well with Grafana; indeed, the easiest way to build InfluxDB queries is via Grafana, which gives an easy point-and-click interface. To bakman2's point though, that uses the older format of queries. Influx were going to deprecate those, but I believe they've walked back from that now after the outcry? So just make sure you use the older-style queries rather than the new ones.
I think that there is a post in the forum FAQs category where I outlined the terminology and advisable IoT data structures for InfluxDB a few years ago.
I had settled on PostgreSQL, and now you mention that SQLite has JSONB and is also fast!
Also TickTock DB looks interesting.
I also quickly looked at InfluxDB 3 again, but it seems to be in transition; at the moment it's called InfluxDB Edge, then it will become InfluxDB Community.
Bakman2, here is the query I used in Grafana when I was using SQLite; is this correct?
SELECT json_extract(DATA, '$.central_heating_set_to') AS central_heating_set_to,
       json_extract(DATA, '$.iso') AS time
FROM Sensors
WHERE DEVICE = 'IP12'
  AND json_extract(DATA, '$.central_heating_set_to') >= 0;
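I was also wondering whether an index on the extracted value would help speed this up; something like this is what I had in mind, though I am only guessing at the syntax:

CREATE INDEX sensors_device_heating_idx
    ON Sensors (DEVICE, json_extract(DATA, '$.central_heating_set_to'));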
It seems everyone has their preference; not sure which way to go?
Stick with PostgreSQL as I have that working?
The Grafana query I use with PostgreSQL:
SELECT (energy.device_data->>'meter_reading')::float AS Meter,
       (energy.device_data->>'timestamp')::int AS Time,
       (energy.device_data->>'iso')::text AS ISO
FROM energy
WHERE device = 'IP22';
I'm doing something similar monitoring freezers. I use MySQL (MariaDB) and Chart.js.
Your four-day graph appears to be what the data will be mostly used for?
With some once-per-year/month review for cost research?
So optimize for the most used application, i.e. the graph.
Set up your index to get a quick response for what you use it for most of the time,
i.e. DateTime of the recorded value, and device.
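With the PostgreSQL table you showed above, that could be something along these lines (adjust to your real columns, and note that Grafana's time filter would have to use the same expression for the index to be used):

-- Index on device plus the extracted epoch timestamp inside the jsonb
CREATE INDEX energy_device_ts_idx
    ON energy (device, ((device_data->>'timestamp')::int));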
In my case a sproc queries the most recent data and writes the average reading by device into a temp table. I summarize into 10-minute periods. That takes about 0.26 seconds.
I return the x axis (time) and the y axis values for each device shown on the graph. That adds about 0.26 seconds per device. The MySQL Node-RED node returns an array with the result of each query as an array element. It's trivial to push it into the graph.
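The bucketing itself is just date arithmetic; a simplified version of what the sproc does, with made-up names (MariaDB flavour), is:

-- Average readings per device into 10-minute buckets
SELECT device,
       FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(recorded_at) / 600) * 600) AS bucket,
       AVG(reading) AS avg_reading
FROM freezer_readings
WHERE recorded_at >= NOW() - INTERVAL 4 HOUR
GROUP BY device, bucket
ORDER BY bucket;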
I added a periodically called sproc to delete data that has aged off, and update the table statistics.
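The clean-up sproc is essentially just this (the retention period here is only an example):

-- Remove aged-off rows, then refresh the optimizer statistics
DELETE FROM freezer_readings WHERE recorded_at < NOW() - INTERVAL 30 DAY;
ANALYZE TABLE freezer_readings;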