When writing data structures and table layouts, people usually don't care about the order of columns. Why should they anyway? Columns can be reordered inside a SELECT at will, so why care? Well, you should care because the order of columns might have significant impact on the size of your table. That's right: The size of the table might depend on the order of columns - even if the same data is stored.
Why is that so? Internally, there is something called CPU alignment which governs the internal size of a data structure. By cleverly aligning and ordering columns, there is some potential to optimize things. You don't believe it? Let us give it a try:
1 2 3 4 5 6 7 8 9 |
test=# CREATE TABLE t_test ( i1 int, i2 int, i3 int, v1 varchar(100), v2 varchar(100), v3 varchar(100) ); CREATE TABLE |
In this example there are 6 columns. 3 integer columns close to each other and 3 varchar columns also close to each other. Let us add 10 million rows to the table:
1 2 3 |
test=# INSERT INTO t_test SELECT 10, 20, 30, 'abcd', 'abcd', 'abcd' FROM generate_series(1, 10000000); INSERT 0 10000000 |
The overall size of the table is 574 MB:
1 2 3 4 5 |
test=# SELECT pg_size_pretty(pg_relation_size('t_test')); pg_size_pretty ---------------- 574 MB (1 row) |
Let us try to change the layout of those columns. In the next example there is a varchar field followed by an integer column. This layout is repeated three times:
1 2 3 4 5 6 7 8 9 |
test=# CREATE TABLE t_test ( v1 varchar(100), i1 int, v2 varchar(100), i2 int, v3 varchar(100), i3 int ); CREATE TABLE |
If 10 million rows are added now …
1 2 3 4 5 6 7 8 9 10 11 |
test=# INSERT INTO t_test SELECT 'abcd', 10, 'abcd', 20, 'abcd', 30 FROM generate_series(1, 10000000); INSERT 0 10000000 … the table will be a lot larger: test=# SELECT pg_size_pretty(pg_relation_size('t_test')); pg_size_pretty ---------------- 651 MB (1 row) |
The data in the table is the same - it has just been chosen in a way to demonstrate the effect. If I had used “abc” instead of “abcd” there would be no difference in size because the 4 character string made sure that the string just did not fit into a smaller space.
The important conclusion here is that it definitely makes sense, to pack similar data types together. And, I have found it useful to pack integer columns at the beginning of the tables. In many cases this can yield in some extra percentage points of speed - just because things are a bit smaller than if you don't make use of this little rule.
Get some great hints and tips about indexes in my post, "Find and Fix a Missing PostgreSQL Index".
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
This is an important point. It's not something that jumps out as an obvious concern.
Thanks for the great post. I'm wondering if changing the varchar's to something like char's would make a difference? My understanding with the varchar datatype is that it's dynamically resized.
In Postgres, no. char() is stored as a varlena, just like varchar, numeric, bytea and some other data types. So it won't make any difference.
Do note that "char" is a single-character datatype which is not the same as char().