CYBERTEC PostgreSQL Logo

Shrink the storage footprint of PostgreSQL data

08.2014 / Category: / Tags: |

When writing data structures and table layouts, people usually don't care about the order of columns. Why should they anyway? Columns can be reordered inside a SELECT at will, so why care? Well, you should care because the order of columns might have significant impact on the size of your table. That's right: The size of the table might depend on the order of columns - even if the same data is stored.

Why is that so? Internally, there is something called CPU alignment which governs the internal size of a data structure. By cleverly aligning and ordering columns, there is some potential to optimize things. You don't believe it? Let us give it a try:

In this example there are 6 columns. 3 integer columns close to each other and 3 varchar columns also close to each other. Let us add 10 million rows to the table:

The overall size of the table is 574 MB:

Let us try to change the layout of those columns. In the next example there is a varchar field followed by an integer column. This layout is repeated three times:

If 10 million rows are added now …

The data in the table is the same - it has just been chosen in a way to demonstrate the effect. If I had used “abc” instead of “abcd” there would be no difference in size because the 4 character string made sure that the string just did not fit into a smaller space.

Conclusion

The important conclusion here is that it definitely makes sense, to pack similar data types together. And, I have found it useful to pack integer columns at the beginning of the tables. In many cases this can yield in some extra percentage points of speed - just because things are a bit smaller than if you don't make use of this little rule.

Get some great hints and tips about indexes in my post, "Find and Fix a Missing PostgreSQL Index".

3 responses to “Shrink the storage footprint of PostgreSQL data”

  1. Thanks for the great post. I'm wondering if changing the varchar's to something like char's would make a difference? My understanding with the varchar datatype is that it's dynamically resized.

    • In Postgres, no. char() is stored as a varlena, just like varchar, numeric, bytea and some other data types. So it won't make any difference.

      Do note that "char" is a single-character datatype which is not the same as char().

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram