Shrink the storage footprint of PostgreSQL data

08.2014 | Category: How To | Tags: performance tuning, storage

08.2014

Category: How To

When writing data structures and table layouts, people usually don't care about the order of columns. Why should they anyway? Columns can be reordered inside a SELECT at will, so why care? Well, you should care because the order of columns might have significant impact on the size of your table. That's right: The size of the table might depend on the order of columns - even if the same data is stored.

Why is that so? Internally, there is something called CPU alignment which governs the internal size of a data structure. By cleverly aligning and ordering columns, there is some potential to optimize things. You don't believe it? Let us give it a try:

test=# CREATE TABLE t_test (
     i1       int,
     i2       int,
     i3       int, 
     v1       varchar(100),
     v2       varchar(100),
     v3       varchar(100)
);
CREATE TABLE

1

2

3

4

5

6

7

8

9

test=# CREATE TABLE t_test (

i1 int,

i2 int,

i3 int,

v1 varchar(100),

v2 varchar(100),

v3 varchar(100)

);

CREATE TABLE

In this example there are 6 columns. 3 integer columns close to each other and 3 varchar columns also close to each other. Let us add 10 million rows to the table:

test=# INSERT INTO t_test SELECT 10, 20, 30, 'abcd', 'abcd', 'abcd'
       FROM generate_series(1, 10000000);
INSERT 0 10000000

1

2

3

test=# INSERT INTO t_test SELECT 10, 20, 30, 'abcd', 'abcd', 'abcd'

FROM generate_series(1, 10000000);

INSERT 0 10000000

The overall size of the table is 574 MB:

test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
 574 MB
(1 row)

1

2

3

4

5

test=# SELECT pg_size_pretty(pg_relation_size('t_test'));

pg_size_pretty

----------------

574 MB

(1 row)

Let us try to change the layout of those columns. In the next example there is a varchar field followed by an integer column. This layout is repeated three times:

test=# CREATE TABLE t_test (
     v1       varchar(100),
     i1       int,
     v2       varchar(100),
     i2       int,
     v3       varchar(100),
     i3       int
);
CREATE TABLE

1

2

3

4

5

6

7

8

9

test=# CREATE TABLE t_test (

v1 varchar(100),

i1 int,

v2 varchar(100),

i2 int,

v3 varchar(100),

i3 int

);

CREATE TABLE

If 10 million rows are added now …

test=# INSERT INTO t_test SELECT 'abcd', 10, 'abcd', 20, 'abcd', 30
       FROM generate_series(1, 10000000);
INSERT 0 10000000

… the table will be a lot larger:

test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
 651 MB
(1 row)

1

2

3

4

5

6

7

8

9

10

11

test=# INSERT INTO t_test SELECT 'abcd', 10, 'abcd', 20, 'abcd', 30

FROM generate_series(1, 10000000);

INSERT 0 10000000

… the table will be a lot larger:

test=# SELECT pg_size_pretty(pg_relation_size('t_test'));

pg_size_pretty

----------------

651 MB

(1 row)

The data in the table is the same - it has just been chosen in a way to demonstrate the effect. If I had used “abc” instead of “abcd” there would be no difference in size because the 4 character string made sure that the string just did not fit into a smaller space.

Conclusion

The important conclusion here is that it definitely makes sense, to pack similar data types together. And, I have found it useful to pack integer columns at the beginning of the tables. In many cases this can yield in some extra percentage points of speed - just because things are a bit smaller than if you don't make use of this little rule.

Get some great hints and tips about indexes in my post, "Find and Fix a Missing PostgreSQL Index".

3 responses to “Shrink the storage footprint of PostgreSQL data”

Shaun Thomas says:

September 4, 2014 at 4:12 pm

This is an important point. It's not something that jumps out as an obvious concern.

Reply
Sean Hs says:

September 4, 2014 at 11:11 pm

Thanks for the great post. I'm wondering if changing the varchar's to something like char's would make a difference? My understanding with the varchar datatype is that it's dynamically resized.

Reply
- Jim Nasby says:
  
  September 5, 2014 at 11:30 pm
  
  In Postgres, no. char() is stored as a varlena, just like varchar, numeric, bytea and some other data types. So it won't make any difference.
  
  Do note that "char" is a single-character datatype which is not the same as char().
  
  Reply

Shrink the storage footprint of PostgreSQL data

Conclusion

3 responses to “Shrink the storage footprint of PostgreSQL data”

Leave a Reply Cancel reply

Hans-Jürgen Schönig

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts