After being on the road to do PostgreSQL consulting for CYBERTEC for over a decade I noticed that there are a couple of ways to kill indexing entirely. One of the most favored ways is to apply functions or expressions on the column people want to filter on. It is a sure way to kill indexing entirely.
Table of Contents
To help people facing this problem I decided to compile a small blog post addressing this issue. Maybe it is helpful to some of you out there.
To prove my point here is some sample data:
1 2 3 4 5 |
test=# CREATE TABLE t_time AS SELECT * FROM generate_series('2014-01-01', '2014-05-01', '1 second'::interval) AS t; SELECT 10364401 |
The script generates a list of around 10 million rows. A list lasting from January 2014 to May 1st, 2014 is generated (one entry per second). Note that generate_series will create a list of timestamps here. To find data quickly an index will be useful:
1 2 3 4 5 |
test=# CREATE INDEX idx_t ON t_time (t); CREATE INDEX test=# ANALYZE; ANALYZE |
Under normal conditions the index shows its value as soon as a proper WHERE clause is included in our statement:
1 2 3 4 5 6 7 8 9 10 11 12 |
test=# explain SELECT * FROM t_time WHERE t >= '2014-03-03' AND t < '2014-03-04'; QUERY PLAN ----------------------------------------------------------------------------------- Index Only Scan using idx_t on t_time (cost=0.43..3074.74 rows=86765 width=8) Index Cond: ((t >= '2014-03-03 00:00:00+01'::timestamp with time zone) AND (t < '2014-03-04 00:00:00+01'::timestamp with time zone)) Planning time: 0.083 ms (3 rows) |
However, in many cases people tend to transform the left side of the comparison when searching for data. In case of large data set this has disastrous consequences and performance will suffer badly. Here is what happens:
1 2 3 4 5 6 7 8 9 10 11 |
test=# explain SELECT * FROM t_time WHERE t::date = '2013-03-03'; QUERY PLAN --------------------------------------------------------------- Seq Scan on t_time (cost=0.00..201326.05 rows=51822 width=8) Filter: ((t)::date = '2013-03-03'::date) Planning time: 0.090 ms (3 rows) |
PostgreSQL does not transform the WHERE-clause into a range and therefore the index is not considered to be usable. To many PostgreSQL based applications this is more or less a death sentence when it comes to performance. A handful of inefficient sequential scans can ruin the performance of the entire application easily.
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Hello,
Nice Case description
The Problem is casting from timestamp to date type. so index is not used.
Thank you
I agreed with you.
t = '2013-03-03'
: Index Only Scan using idx_t on t_timet::date = '2013-03-03'
: Seq Scan on t_time