The goal of the PostgreSQL optimizer is to provide you with a plan that executes the query and returns all the data as quickly as possible. In short: the game is about the overall execution time of queries.
However, there are cases in which you are not interested in receiving the entire result as fast as possible. Just consider the following example: imagine browsing a catalog. Do you really want to read all 649 pages? Do you really want to go through 10,000 products? Not really. You open the catalog, read the first handful of products, and that's it.
So, in many cases it is not about the speed of retrieving the entire dataset but about the performance of the first couple of records.
Here is an example:
test=# CREATE TABLE t_test (id int4);
CREATE TABLE
test=# INSERT INTO t_test
           SELECT * FROM generate_series(1, 10000000)
           ORDER BY random();
INSERT 0 10000000
We generate a dataset consisting of 10 million rows. Make sure that the data is not stored in the table in sorted order (the ORDER BY random() takes care of that). Otherwise this example will not work, because a high correlation estimate would lead the optimizer to different plans.
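If you want to verify that the physical row order really is random, you can peek at the correlation statistic the planner keeps for the column. This is just a quick sketch; it requires fresh statistics, hence the ANALYZE, and the exact value will vary from run to run:

test=# ANALYZE t_test;
ANALYZE
test=# SELECT attname, correlation
       FROM   pg_stats
       WHERE  tablename = 't_test';
-- correlation should be close to 0 here,
-- meaning physical order does not match logical order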
Then we create a simple index:
test=# CREATE INDEX idx_id ON t_test (id);
CREATE INDEX
Let us run a simple query now:
test=# explain SELECT * FROM t_test ORDER BY id;
                               QUERY PLAN
-----------------------------------------------------------------------
 Sort  (cost=1717084.33..1742084.33 rows=10000000 width=4)
   Sort Key: id
   ->  Seq Scan on t_test  (cost=0.00..144248.00 rows=10000000 width=4)
(3 rows)
The point here is that PostgreSQL will sort the data. Most likely this is the fastest plan for the query as a whole. But if the data is sorted first, the first rows can only be delivered to the client once the entire dataset has been sorted.
Going back to our catalog example, it would mean: we would have to read and prepare the entire catalog just to look at a handful of rows at the beginning. Response times would be slow and, more importantly, we would not read the entire thing anyway.
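As a side note: if the application knows up front that it only needs a few rows, an explicit LIMIT already lets the planner pick the index and skip the sort. The plan is only described here in a comment, since exact costs depend on your system:

test=# explain SELECT * FROM t_test ORDER BY id LIMIT 10;
-- expected on this dataset: a Limit node on top of an
-- Index Scan using idx_id, with no Sort node at all

But often the client does not know in advance how many rows it will consume, and this is exactly where cursors come into play.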
In PostgreSQL a cursor can be used to fetch large amounts of data. Cursors are frequently used for batch processing. However, there is one more important thing to know about a cursor: by default, the planner optimizes a cursor for fetching the first 10% of the data. This behavior can be changed by setting cursor_tuple_fraction.
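You can check the current value easily; 0.1 (= 10%) is the built-in default:

test=# SHOW cursor_tuple_fraction;
 cursor_tuple_fraction
-----------------------
 0.1
(1 row)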
Here is an example showing how things can be changed:
test=# BEGIN;
BEGIN
test=# SET cursor_tuple_fraction TO 0.0001;
SET
In this case we have told the planner to expect that only a very small fraction of the data will actually be fetched from the cursor.
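If you are curious which plan the cursor will get, EXPLAIN also accepts a DECLARE statement. With the setting above and the index in place, one would expect an index scan instead of a sort; the plan is only sketched in a comment, since costs will vary:

test=# EXPLAIN DECLARE mycur CURSOR FOR
           SELECT * FROM t_test ORDER BY id;
-- expected: an Index Scan using idx_id on t_test,
-- without any Sort node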
We can declare the cursor now:
test=# \timing
Timing is on.
test=# DECLARE mycur CURSOR FOR
           SELECT * FROM t_test ORDER BY id;
DECLARE CURSOR
Time: 0.254 ms
Fetching the first row is lightning fast:
test=# FETCH NEXT FROM mycur;
 id
----
  1
(1 row)
Time: 0.249 ms
Instead of sorting all the data at once, PostgreSQL returns the first rows extremely fast. Sure, looping through the entire cursor will be slower than a single sorted query, but there is no need to sort all the data up front (saving on disk I/O), and the result is delivered without an initial delay, which can be crucial in many cases.
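One practical detail worth remembering: a cursor declared without WITH HOLD only lives inside its transaction, so clean up when you are done:

test=# CLOSE mycur;
CLOSE CURSOR
test=# COMMIT;
COMMIT

If you need the cursor to survive the COMMIT, declare it WITH HOLD, keeping in mind that PostgreSQL will then materialize the entire result set at commit time.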
If you want to know more about this topic, contact us!