CYBERTEC PostgreSQL Logo

Changing histogram sizes

10.2013 / Category: / Tags: |

Histograms are a common way for a relational database to store statistical information about data. What is this kind of statistical information good for? Well, statistics are the rocket fuel behind performance. If statistics are bad, the optimizer will come up with bad decisions and poor execution plans will be the result. The PostgreSQL optimizer is a damn sophisticated piece of software and it provides end users with good plans in general – still, without statistics even the best SQL optimizer will yield bad performance.

The good news is that when things are a little tight, the end user can have some influence on the way PostgreSQL stores statistics. The ALTER TABLE ... SET STATISTICS command will allow users to define the size of the internal histograms storing statistical information.

Before we dig into those details we create a simple table and add some data to it:

This will create a table containing one million numbers. To make sure that the optimizer will know about the content of the table, we can run ANALYZE (a command in charge of creating statistics):

How statistics is used

Whenever PostgreSQL has to plan a query, it will rely on the statistics we have made just before. Let us take a look at a very simplistic plan:

We try to find all values lower than 10.000. The planner estimates that we will find roughly 9.100 rows. Overall this is pretty good and totally sufficient to plan the query efficiently. Remember, estimating statistics is not about having a final answer right away – it is about having a reasonably well estimate.

Let us see, which statistics PostgreSQL has made. To do so we can take a look at a system view called pg_stats:

If you are interested in the content of pg_stats we recommend to take a look at the PostgreSQL documentation. It outlines nicely how things work in this area: http://www.postgresql.org/docs/9.3/static/view-pg-stats.html

Changing the size of the histogram

Sometimes it can be quite beneficial to change the size of a column's histogram. Reducing the size if the histogram will make estimates a lot less precise – but, it will reduce the time needed to plan a query. If you are dealing with a table with 1 billion rows and all you do is hitting the primary key to fetch exactly one row at a time, a narrow histogram is definitely not a big issue.

However, if you are dealing with more complex operations, keeping the default to 100 or to raise this value can be quite beneficial.

This is how you can change the size of the histogram of a specific column:

Once this has been done, we can recreate the stats:

Querying the stats table reveals what PostgreSQL knows about the column:

Note that the histogram consists of just 11 values now (which gives us 10 bars).

The impact of a change

We can toy around with this setting now and see what happens to the plan made by PostgreSQL once the statistics target has been risen to 1.000:

In our example the estimate is a little bit more precise than initially. However, this comes at a cost. Planning time has risen – the plan is still the same (because there is no other choice for the planner anyway).

Btw, keep in mind that those values should not be changed on a daily basis and in many cases it is not even necessary to change this stuff at all – it is nice to have the opportunity to do so, however, when beneficial.

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram