Timeseries are an increasingly important topic, and not just in PostgreSQL. Recently I gave a presentation about timeseries at AGIT in Salzburg, in which I demonstrated some very simple examples. The presentation was well received, so I decided to share the material as a blog post, so that more people can learn about window functions and SQL in general. A link to the video is available at the end of the post, so you can also watch the original talk in German.
To show how data can be loaded, I compiled a basic dataset which can be found on my website. Here is how it works:
```
test=# CREATE TABLE t_oil (
           region      text,
           country     text,
           year        int,
           production  int,
           consumption int
       );
CREATE TABLE
test=# COPY t_oil FROM PROGRAM 'curl /secret/oil_ext.txt';
COPY 644
```
The cool thing is that if you happen to be a superuser, you can load the data directly from the web. COPY FROM PROGRAM executes a command on the database server and pipes its standard output straight into the table. Keep in mind that, for security reasons, this only works for superusers (and, since PostgreSQL 11, for members of the pg_execute_server_program role), because the command runs with the operating-system privileges of the database server.
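Because the COPY statement above uses no options, the file must be in COPY's default text format: one row per line, columns separated by tabs, and NULL written as `\N`. The following Python sketch parses that format and loads the rows into an in-memory SQLite table that stands in for t_oil. The sample line is hypothetical, and the parser is simplified: it does not decode backslash escapes other than `\N`.

```python
import sqlite3
from typing import Iterable, List, Optional, Tuple

Row = Tuple[Optional[str], Optional[str], Optional[int], Optional[int], Optional[int]]

def _field(raw: str) -> Optional[str]:
    # COPY's text format writes NULL as the two characters \N
    return None if raw == "\\N" else raw

def _int(raw: Optional[str]) -> Optional[int]:
    return None if raw is None else int(raw)

def parse_copy_text(lines: Iterable[str]) -> List[Row]:
    """Parse lines in COPY's default text format (tab-delimited, \\N = NULL).
    Simplification: other backslash escape sequences are not decoded."""
    rows: List[Row] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        region, country, year, production, consumption = (
            _field(f) for f in line.split("\t")
        )
        rows.append((region, country, _int(year), _int(production), _int(consumption)))
    return rows

def load(rows: List[Row]) -> sqlite3.Connection:
    # In-memory SQLite stand-in for the PostgreSQL table, with the same columns
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t_oil (region text, country text, year int, "
                "production int, consumption int)")
    con.executemany("INSERT INTO t_oil VALUES (?, ?, ?, ?, ?)", rows)
    return con

# Hypothetical input line; consumption is NULL (\N) here
sample = ["North America\tUSA\t1965\t9014\t\\N\n"]
rows = parse_copy_text(sample)
con = load(rows)
print(con.execute("SELECT count(*) FROM t_oil").fetchone()[0])
```

With real data you would feed the lines of the downloaded file into `parse_copy_text` instead of the hypothetical sample.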
If you are dealing with timeseries, calculating the difference to the previous period is a very common task. Fortunately, SQL window functions make that pretty easy. Here is how it works:
```
test=# SELECT year, production,
              lag(production, 1) OVER (ORDER BY year)
       FROM   t_oil
       WHERE  country = 'USA'
       ORDER  BY year
       LIMIT  5;
 year | production |  lag
------+------------+-------
 1965 |       9014 |
 1966 |       9579 |  9014
 1967 |      10219 |  9579
 1968 |      10600 | 10219
 1969 |      10828 | 10600
(5 rows)
```
The lag function takes up to three parameters. The first one is the column (or expression) whose value you want to fetch. The second one, the offset, is optional: if you skip it, it defaults to 1, so lag(production) is equivalent to lag(production, 1). The third, also optional, is the value returned when no such row exists; by default that is NULL, which is why the lag column is empty for 1965. In my example the lag column is shifted by one row, but you can use any integer to move data up or down, given the order defined in the OVER clause.
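To make the semantics of the three parameters concrete, here is a small Python sketch that mimics lag(value, offset, default) on a list that is already sorted the way the OVER clause would sort it. The function name and signature are hypothetical, not a PostgreSQL API; in this sketch a negative offset looks forward instead of backward.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")

def lag(values: Sequence[T], offset: int = 1,
        default: Optional[T] = None) -> List[Optional[T]]:
    """Mimic lag(value, offset, default) over an already ordered partition:
    row i gets the value of row i - offset, or `default` if that row does
    not exist. A negative offset therefore looks forward."""
    n = len(values)
    result: List[Optional[T]] = []
    for i in range(n):
        j = i - offset
        result.append(values[j] if 0 <= j < n else default)
    return result

# USA production for 1965..1969, taken from the query output above
usa = [9014, 9579, 10219, 10600, 10828]
print(lag(usa))       # [None, 9014, 9579, 10219, 10600]
print(lag(usa, 2))    # [None, None, 9014, 9579, 10219]
print(lag(usa, -1))   # [9579, 10219, 10600, 10828, None]
```

The first line reproduces the lag column of the query, with Python's None playing the role of SQL NULL.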
What we have so far is the value of the previous period. Let us calculate the difference next:
```
test=# SELECT year, production,
              production - lag(production, 1) OVER (ORDER BY year) AS diff
       FROM   t_oil
       WHERE  country = 'USA'
       ORDER  BY year
       LIMIT  5;
 year | production | diff
------+------------+------
 1965 |       9014 |
 1966 |       9579 |  565
 1967 |      10219 |  640
 1968 |      10600 |  381
 1969 |      10828 |  228
(5 rows)
```
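That was easy. All we have to do is subtract the previous row's value, as returned by lag, from the current row's value. The first row has no predecessor, so lag returns NULL and the difference for 1965 is NULL as well.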
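If you want to play with the difference query without a PostgreSQL server, the following sketch runs the same window query against an in-memory SQLite database from Python's standard library. It assumes SQLite 3.25 or later (the first version with window functions), and the stand-in table holds only the columns and the five USA rows shown in the output above.

```python
import sqlite3

# In-memory stand-in for t_oil with only the columns this query needs
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t_oil (country text, year int, production int)")
con.executemany(
    "INSERT INTO t_oil VALUES ('USA', ?, ?)",
    [(1965, 9014), (1966, 9579), (1967, 10219), (1968, 10600), (1969, 10828)],
)

# Same query as in the listing: current value minus the previous year's value
rows = con.execute("""
    SELECT year, production,
           production - lag(production, 1) OVER (ORDER BY year) AS diff
    FROM   t_oil
    WHERE  country = 'USA'
    ORDER  BY year
""").fetchall()

for year, production, diff in rows:
    # diff is None for 1965: there is no previous row, so lag() yields NULL
    print(year, production, diff)
```

Note that the WHERE clause is applied before the window function is evaluated, so lag only ever sees USA rows.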
Window functions are far more powerful than shown here, but this example should help you get started.
You may also want to calculate the correlation between two columns. PostgreSQL offers the corr aggregate function, which computes the correlation coefficient of two columns, to do exactly that. The following listing shows a simple example:
```
test=# SELECT country, corr(production, consumption)
       FROM   t_oil
       GROUP  BY 1
       ORDER  BY 2 DESC NULLS LAST;
       country        |       corr
----------------------+-------------------
 Mexico               | 0.962790640608018
 Canada               | 0.932931452462893
 Qatar                | 0.925552359601189
 United Arab Emirates | 0.882953285119214
 Saudi Arabien        | 0.642815458284221
```
As you can see, the correlation between production and consumption is highest in Mexico and Canada.
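Here is a Python sketch of what corr computes per group: the Pearson correlation coefficient. Like the PostgreSQL aggregate, it skips pairs in which either value is NULL (None) and returns NULL if no pairs remain or if one of the series has zero variance. Such NULL results are the reason the query sorts with NULLS LAST. The function is a hypothetical illustration, not PostgreSQL's implementation, and the inputs below are toy numbers rather than values from the dataset.

```python
import math
from typing import Optional, Sequence

def corr(ys: Sequence[Optional[float]], xs: Sequence[Optional[float]]) -> Optional[float]:
    """Pearson correlation in the spirit of corr(Y, X): pairs containing a None
    are ignored; None is returned for no pairs or a zero-variance series."""
    pairs = [(y, x) for y, x in zip(ys, xs) if y is not None and x is not None]
    n = len(pairs)
    if n == 0:
        return None
    mean_y = sum(y for y, _ in pairs) / n
    mean_x = sum(x for _, x in pairs) / n
    sxx = sum((x - mean_x) ** 2 for _, x in pairs)
    syy = sum((y - mean_y) ** 2 for y, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for y, x in pairs)
    if sxx == 0 or syy == 0:
        return None  # correlation is undefined for a constant series
    return sxy / math.sqrt(sxx * syy)

print(corr([2, 4, 6], [1, 2, 3]))   # 1.0: perfectly linear, same direction
print(corr([6, 4, 2], [1, 2, 3]))   # -1.0: perfectly linear, opposite direction
print(corr([5, 5, 5], [1, 2, 3]))   # None: zero variance
```

A value close to 1, as for Mexico and Canada, means that production and consumption tend to rise and fall together.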
In the past we presented other examples related to timeseries and data analysis in general. One of the most interesting posts can be found here.
If you want to watch the entire (short) presentation in German, check out the following video.