CYBERTEC PostgreSQL Logo

Binary data performance in PostgreSQL

05.2020 / Category: / Tags: |
binary data performance nightmare
© Laurenz Albe 2020

 

A frequently asked question in this big data world is whether it is better to store binary data inside or outside of a PostgreSQL database. Also, since PostgreSQL has two ways of storing binary data, which one is better?

I decided to benchmark the available options to have some data points next time somebody asks me, and I thought this might be interesting for others as well.

Alternatives for storing binary data

Storing the data outside the database

For that, you store the binary data in files outside the database and store the path of the file in the database.

The obvious downside is that consistency is not guaranteed that way. With all data inside the database, PostgreSQL can guarantee the atomicity of transactions. So it will make the architecture and the implementation somewhat more complicated to store the data outside the database.

One approach to consistency is to always add a file before storing its metadata in the database. When a file's metadata are deleted, you simply leave the file on the file system. This way, there can be no path in the database that does not exist in the file system. You can clean up the filesystem with offline reorganization runs that get rid of orphaned files.

There are two big advantages with this approach:

  • It keeps the database small, which makes maintenance easier. For example, it is easier to perform an incremental backup of a file system than of a database.
  • The performance of reading a file directly from the file system must be better. After all, the database is also stored on files, and there must be a certain overhead.

Storing the data in Large Objects

PostgreSQL Large Objects are the “old way” of storing binary data in PostgreSQL. The system assigns an oid (a 4-byte unsigned integer) to the Large Object, splits it up in chunks of 2kB and stores it in the pg_largeobject catalog table.

You refer to the Large Object by its oid, but there is no dependency between an oid stored in a table and the associated Large Object. If you delete the table row, you have to delete the Large Object explicitly (or use a trigger).

Large Objects are cumbersome, because the code using them has to use a special Large Object API. The SQL standard does not cover that, and not all client APIs have support for it.

There are two advantages of Large Objects:

  • you can store arbitrary large data with Large Objects
  • the Large Object API has support for streaming, that is, reading and writing Large Objects in chunks

Storing the data as bytea

bytea (short for “byte array”) is the “new way” is storing binary data in PostgreSQL. It uses TOAST (The Oversized-Attribute Storage Technique, proudly called “the best thing since sliced bread” by the PostgreSQL community) to transparently store data out of line.

A bytea is stored directly in the database table and vanishes when you delete the table row. No special maintenance is necessary.

The main disadvantages of bytea are:

  • like all TOASTed data types, there is an absolute length limit of 1GB
  • when you read or write a bytea, all data have to be stored in memory (no streaming support)

 

Important TOAST considerations

If you choose bytea, you should be aware of how TOAST works:

  • for a new table row that would exceed 2000 bytes, variable length data types are compressed, if possible
  • if the compressed data would still exceed 2000 bytes, PostgreSQL splits variable length data types in chunks and stores them out of line in a special “TOAST table”

Now for already compressed data, the first step is unnecessary and even harmful. After compressing the data, PostgreSQL will realize that the compressed data have actually grown (because PostgreSQL uses a fast compression algorithm) and discard them. That is an unnecessary waste of CPU time.

Moreover, if you retrieve only of a substring of a TOASTed value, PostgreSQL still has to retrieve all chunks that are required to decompress the value.

Fortunately, PostgreSQL allows you to specify how TOAST should handle a column. The default EXTENDED storage type works as described above. If we choose EXTERNAL instead, values will be stored out of line, but not compressed. This saves CPU time. It also allows operations that need only a substring of the data to access only those chunks that contain the actual data.

So you should always change the storage type for compressed binary data to EXTERNAL. This also allows us to implement streaming, at least for read operations, using the substr function (see below).

The bytea table that I use in this benchmark is defined like

Benchmarking the different approaches

I chose to write my little benchmark in Java, which is frequently used for application code. I wrote an interface for the code that reads the binary data, so that it is easy to test the different implementations with the same code. This also makes it easier to compare the implementations:

CHUNK_SIZE is the unit in which the data will be read.

Code to read binary data from the file system

In the constructor, the database is queried to get the path of the file. That file is opened for reading; the chunks are read in getNextBytes.

Code to read binary data from a Large Object

The Large Object is opened in the constructor. Note that all read operations must take place in the same database transaction that opened the large object.

Since Large Objects are not covered by the SQL or JDBC standard, we have to use the PostgreSQL-specific extensions of the JDBC driver. That makes the code not portable to other database systems.

However, since Large Objects specifically support streaming, the code is simpler than for the other options.

Code to read binary data from a bytea

The constructor retrieves the length of the value and prepares a statement that fetches chunks of the binary data.

Note that the code is more complicated that in the other examples, because I had to implement streaming myself.

With this approach, I don't need to read all chunks in a single transaction, but I do so to keep the examples as similar as possible.

Benchmark results for reading binary data

I performed the benchmark the code on a laptop with and Intel® Core™ i7-8565U CPU and SSD storage. The PostgreSQL version used was 12.2. Data were cached in RAM, so the results don't reflect disk I/O overhead. The database connection used the loopback interface to reduce the network impact to a minimum.

This code was used to run the test:

I ran each test multiple times in a tight loop, both with a large file (350 MB) and a small file (4.5 MB). The files were compressed binary data.

350 MB data 4.5 MB data
file system 46 ms 1 ms
Large Object 950 ms 8 ms
bytea 590 ms 6 ms

Summary

In my benchmark, retrieving binary objects from the database is roughly ten times slower that reading them from files in a file system. Surprisingly, streaming from a bytea with EXTERNAL storage is measurably faster than streaming from a Large Object. Since Large Objects specifically support streaming, I would have expected the opposite.

To sum it up, here are the advantages and disadvantages of each method:

Binary data in the file system:

Advantages:

  • by far the fastest method
  • small database for ease of maintenance, e.g. backups

Disadvantages:

  • more difficult to guarantee consistency
  • more complicated architecture

Binary data as Large Objects:

Advantages:

  • constistence automatically guaranteed
  • API supports streaming

Disadvantages:

  • very bad performance
  • non-standard API
  • requires special maintenance like DELETE triggers in the database
  • database becomes large and unwieldy

Binary data as bytea:

Advantages:

  • consistence automatically guaranteed
  • works with standard SQL

Disadvantages:

  • bad performance
  • writes cannot be streamed (need lots of RAM)
  • database becomes large and unwieldy

Storing large binary data in the database is only a good idea on a small scale, when a simple architecture and ease of coding are the prime goals and high performance is not important. Then the increased database size also won't be a big problem.

Large Objects should only be used if the data exceed 1GB or streaming writes to the database is important.

If you need help with the performance and architecture of your PostgreSQL database, don't hesitate to ask us.

15 responses to “Binary data performance in PostgreSQL”

  1. Thank you for an article. But one question remains.
    What if we change storage strategy for pg_largeobject.data column to external?

  2. Streaming writes to bytea (from a client perspective) is theoretically
    possible using UPDATE and concatenation. I guess with a lot of write
    amplification though.

    • From a client perspective it looks that way, but really it isn't. If you
      UPDATE tab SET bincol = bincol || 'xdeadbeef';
      the complete value will by read from and written to storage, and you may run out of memory on the server.

  3. maybe LO is slow, because native protocol support is not used. Probably using lo_export should be much faster.

    • I am not sure what you mean by the native protocol.
      The JDBC driver uses the fast-path API to call the large object server functions.

      • libpq allows to use lo_import function. https://www.postgresql.org/docs/9.0/lo-interfaces.html. But probably JDBC driver doesn't use libpq. Another question is impact of Java implementation. Can you check a performance of lo_import from psql? When I look on your examples, there is not too much reasons why LO is too much slower than byte.

        There are another argument for LO. Import, export LO typically needs significantly less more RAM (client side, server side) than bytea. I don't know how are PHP limits today, but 15 years ago it was important factor

  4. Hi,

    You say above:

    he performance of reading a file directly from the file system must be better. After all, the database is also stored on files, and there must be a certain overhead.

    However, there is this from SQLite:

    SQLite reads and writes small blobs (for example, thumbnail images) 35% faster¹ than the same blobs can be read from or written to individual files on disk using fread() or fwrite().

    Now, I know that they are both different systems and that tipping points (if any) may be (very) different, but your statement may not hold globally?

    Has this been benchmarked at all? Would be an interesting one?

    Pól...

  5. Really good article!
    One question: if I change the TOAST strategy from extended to external on existing column in existing table by using "ALTER TABLE bins ALTER COLUMN data SET STORAGE EXTERNAL", what will happen to the existing data in that column? The existing data will be decompressed and then stored out-of-line?

    • No, existing data won't be affected.
      But if they are compressed binary data, PostgreSQL won't have compressed them anyway (it will have attempted compression and then realized that the data grew, as described in my article).

      • Thank you for your prompt response.
        The tricky part in my case is that my data is binary data (from binary file) but not compressed. I'm not sure if PostgreSQL has compressed them (or all of them in different rows) because I guess there is the probability that PostgreSQL's compression algorithm might make binary data in some rows smaller but not for binary data in other rows. This might result in that in some rows binary data is compressed while in other rows binary data is not compressed. I'm concerned that if I change to "EXTERNAL", there will be some compatible issues, e.g. PostgreSQL cannot read the existing compressed data

        • I see.
          You don't have to worry about this, because it is stored in the column header if the data are compressed or not. So regardless of the column setting, PostgreSQL will decompress data that were stored compressed.

  6. Hi! Maybe a thing od two to mention about bytea types. I have a 10GB big database, a table 7.5GB big in it with bytea column. It takes 12 hours to do backup and when I was troubleshooting slowness I found that pg_dump did 240GB of network traffic while backup speed was ~45Mbit on average. Furthur troubleshooting discovered that bytea type in that big table was the reason for backup to take so long. Unfortunately, I still didn't find a solution to fix but based on my research I should migrate data to hex field type. Any inputs on that?

    • You should definitely use bytea and store the data as they are.
      I do not have a ready explanation for the slowness or the amount of traffic; this would need further investigation.
      Perhaps pg_dumpbinary can improve the situation for you.

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram