A frequently asked question in this big data world is whether it is better to store binary data inside or outside of a PostgreSQL database. Also, since PostgreSQL has two ways of storing binary data, which one is better?
I decided to benchmark the available options to have some data points next time somebody asks me, and I thought this might be interesting for others as well.
The first option is to store the binary data in files outside the database and to store only the path of the file in the database.

The obvious downside is that consistency is not guaranteed that way. With all data inside the database, PostgreSQL can guarantee the atomicity of transactions. Storing the data outside the database therefore makes the architecture and the implementation somewhat more complicated.
One approach to consistency is to always add a file before storing its metadata in the database. When a file's metadata are deleted, you simply leave the file on the file system. This way, there can be no path in the database that does not exist in the file system. You can clean up the filesystem with offline reorganization runs that get rid of orphaned files.
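To illustrate the idea, here is a rough sketch in Java of such a “file first, metadata second” writer. The lobs table with its id and path columns is the one queried by the FileStreamer below; the class and method names and the dir parameter are made up, and I assume that auto-commit is disabled, as in the benchmark code further down:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class FileFirstWriter {
    /* Write the file first, then commit the metadata.
       If the INSERT fails, only an orphaned file is left behind. */
    public static void store(Connection conn, long id, Path dir, byte[] data)
            throws IOException, SQLException {
        Path file = dir.resolve(Long.toString(id));
        Files.write(file, data);

        PreparedStatement stmt = conn.prepareStatement(
            "INSERT INTO lobs (id, path) VALUES (?, ?)");
        stmt.setLong(1, id);
        stmt.setString(2, file.toString());
        stmt.executeUpdate();
        stmt.close();
        conn.commit();
    }

    /* Delete only the metadata; the file itself is removed later
       by an offline cleanup run that looks for orphaned files. */
    public static void remove(Connection conn, long id) throws SQLException {
        PreparedStatement stmt = conn.prepareStatement(
            "DELETE FROM lobs WHERE id = ?");
        stmt.setLong(1, id);
        stmt.executeUpdate();
        stmt.close();
        conn.commit();
    }
}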
There are two big advantages with this approach:

- the database stays small
- reading a file from the file system is much faster than reading binary data from the database (see the benchmark below)
PostgreSQL Large Objects are the “old way” of storing binary data in PostgreSQL. The system assigns an oid (a 4-byte unsigned integer) to the Large Object, splits it up into chunks of 2kB and stores them in the pg_largeobject catalog table.
You refer to the Large Object by its oid, but there is no dependency between an oid stored in a table and the associated Large Object. If you delete the table row, you have to delete the Large Object explicitly (or use a trigger).
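For example, a trigger along the lines of the following sketch could unlink the Large Object together with the row that references it. The table docs and the function name unlink_lo are made up for illustration; lo_unlink is the standard server-side function:

-- hypothetical table that references a Large Object by its oid
CREATE TABLE docs (
   id   bigint PRIMARY KEY,
   data oid NOT NULL
);

CREATE FUNCTION unlink_lo() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   -- remove the Large Object the deleted row pointed to
   PERFORM lo_unlink(OLD.data);
   RETURN OLD;
END;$$;

CREATE TRIGGER unlink_lo AFTER DELETE ON docs
   FOR EACH ROW EXECUTE FUNCTION unlink_lo();

Alternatively, the lo contrib extension ships a ready-made lo_manage trigger function for exactly this purpose.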
Large Objects are cumbersome, because the code using them has to use a special Large Object API. The SQL standard does not cover that, and not all client APIs have support for it.
There are two advantages of Large Objects:

- they can hold data bigger than 1GB (up to 4TB)
- they support streaming, so that clients can read and write them in chunks
bytea

bytea (short for “byte array”) is the “new way” of storing binary data in PostgreSQL. It uses TOAST (The Oversized-Attribute Storage Technique, proudly called “the best thing since sliced bread” by the PostgreSQL community) to transparently store data out of line.
A bytea is stored directly in the database table and vanishes when you delete the table row. No special maintenance is necessary.
The main disadvantages of bytea are:

- like any other value stored in a table, a bytea is limited to 1GB in size
- when reading or writing a bytea, all data have to be stored in memory (no streaming support)
If you choose bytea, you should be aware of how TOAST works:

- for a new table row that would exceed a threshold of roughly 2000 bytes, PostgreSQL first compresses the oversized column values
- if the row is still too big after compression, the oversized values are split into chunks of slightly less than 2kB that are stored out of line in a separate TOAST table
Now for already compressed data, the first step is unnecessary and even harmful. After compressing the data, PostgreSQL will realize that the compressed data have actually grown (because PostgreSQL uses a fast compression algorithm) and discard them. That is an unnecessary waste of CPU time.
Moreover, if you retrieve only a substring of a TOASTed value, PostgreSQL still has to retrieve all the chunks that are required to decompress the value.
Fortunately, PostgreSQL allows you to specify how TOAST should handle a column. The default EXTENDED storage type works as described above. If we choose EXTERNAL instead, values will be stored out of line, but not compressed. This saves CPU time. It also allows operations that need only a substring of the data to access only those chunks that contain the actual data.

So you should always change the storage type for compressed binary data to EXTERNAL. This also allows us to implement streaming, at least for read operations, using the substr function (see below).
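For example, with EXTERNAL storage a query like the following (against the bins table defined below; the id value is arbitrary) only has to fetch the TOAST chunks that cover the requested byte range:

-- read the first megabyte of the value; with EXTERNAL storage
-- only the chunks covering bytes 1 to 1048576 are fetched
SELECT substr(data, 1, 1048576) FROM bins WHERE id = 42;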
The bytea table that I use in this benchmark is defined like this:
CREATE TABLE bins (
   id bigint PRIMARY KEY,
   data bytea NOT NULL
);

ALTER TABLE bins ALTER COLUMN data SET STORAGE EXTERNAL;
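If you want to double-check the setting, one possibility (just an example, not needed for the benchmark) is to look at pg_attribute:

-- 'e' means EXTERNAL, 'x' means EXTENDED (the default)
SELECT attstorage
FROM pg_attribute
WHERE attrelid = 'bins'::regclass
  AND attname = 'data';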
I chose to write my little benchmark in Java, which is frequently used for application code. I wrote an interface for the code that reads the binary data, so that the different implementations can be tested and compared with the same code:
import java.io.EOFException;
import java.io.IOException;
import java.sql.SQLException;

public interface LOBStreamer {
    public final static int CHUNK_SIZE = 1048576;

    public int getNextBytes(byte[] buf)
        throws EOFException, IOException, SQLException;

    public void close() throws IOException, SQLException;
}
CHUNK_SIZE is the unit in which the data will be read.
In the constructor, the database is queried to get the path of the file. That file is opened for reading; the chunks are read in getNextBytes.
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;

public class FileStreamer implements LOBStreamer {
    private FileInputStream file;

    public FileStreamer(java.sql.Connection conn, long objectID)
            throws IOException, SQLException {
        PreparedStatement stmt = conn.prepareStatement(
            "SELECT path FROM lobs WHERE id = ?");
        stmt.setLong(1, objectID);
        ResultSet rs = stmt.executeQuery();
        rs.next();
        String path = rs.getString(1);
        this.file = new FileInputStream(new File(path));
        rs.close();
        stmt.close();
    }

    @Override
    public int getNextBytes(byte[] buf)
            throws EOFException, IOException {
        int result = file.read(buf);
        if (result == -1)
            throw new EOFException();
        return result;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
The Large Object is opened in the constructor. Note that all read operations must take place in the same database transaction that opened the large object.
Since Large Objects are not covered by the SQL or JDBC standard, we have to use the PostgreSQL-specific extensions of the JDBC driver. That makes the code not portable to other database systems.
However, since Large Objects specifically support streaming, the code is simpler than for the other options.
import java.io.EOFException;
import java.io.IOException;
import java.sql.Connection;
import java.sql.SQLException;

import org.postgresql.PGConnection;
import org.postgresql.largeobject.LargeObject;
import org.postgresql.largeobject.LargeObjectManager;

public class LargeObjectStreamer implements LOBStreamer {
    private LargeObject lob;

    public LargeObjectStreamer(Connection conn, long objectID)
            throws SQLException {
        PGConnection pgconn = conn.unwrap(PGConnection.class);
        this.lob = pgconn.getLargeObjectAPI().open(
            objectID, LargeObjectManager.READ);
    }

    @Override
    public int getNextBytes(byte[] buf)
            throws EOFException, SQLException {
        int result = lob.read(buf, 0, buf.length);
        if (result == 0)
            throw new EOFException();
        return result;
    }

    @Override
    public void close() throws IOException, SQLException {
        lob.close();
    }
}
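Streaming writes work along the same lines. Here is a rough sketch of how a Large Object could be created and written in chunks with the same pgjdbc API; the class LargeObjectWriter and its method writeStream are made-up names:

import java.io.IOException;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.SQLException;

import org.postgresql.PGConnection;
import org.postgresql.largeobject.LargeObject;
import org.postgresql.largeobject.LargeObjectManager;

public class LargeObjectWriter {
    /* creates a new Large Object, copies the input stream into it
       in CHUNK_SIZE pieces and returns the new object's oid */
    public static long writeStream(Connection conn, InputStream in)
            throws IOException, SQLException {
        LargeObjectManager lom =
            conn.unwrap(PGConnection.class).getLargeObjectAPI();

        long oid = lom.createLO(LargeObjectManager.READWRITE);
        LargeObject lob = lom.open(oid, LargeObjectManager.WRITE);

        byte[] buf = new byte[LOBStreamer.CHUNK_SIZE];
        int count;
        while ((count = in.read(buf)) != -1)
            lob.write(buf, 0, count);

        lob.close();
        return oid;   // commit the surrounding transaction afterwards
    }
}

As with reading, all of this has to happen inside a single database transaction.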
bytea
The constructor retrieves the length of the value and prepares a statement that fetches chunks of the binary data.
Note that the code is more complicated than in the other examples, because I had to implement streaming myself.
With this approach, I don't need to read all chunks in a single transaction, but I do so to keep the examples as similar as possible.
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ByteaStreamer implements LOBStreamer {
    private PreparedStatement stmt;
    private Connection conn;
    private int position = 1, size;

    public ByteaStreamer(Connection conn, long objectID)
            throws SQLException {
        PreparedStatement len_stmt = conn.prepareStatement(
            "SELECT length(data) FROM bins WHERE id = ?");
        len_stmt.setLong(1, objectID);
        ResultSet rs = len_stmt.executeQuery();
        if (!rs.next())
            throw new SQLException("no data found", "P0002");
        size = rs.getInt(1);
        rs.close();
        len_stmt.close();

        this.conn = conn;
        this.stmt = conn.prepareStatement(
            "SELECT substr(data, ?, ?) FROM bins WHERE id = ?");
        this.stmt.setLong(3, objectID);
    }

    @Override
    public int getNextBytes(byte[] buf)
            throws EOFException, IOException, SQLException {
        int result = (position > size + 1 - buf.length)
                ? (size - position + 1) : buf.length;

        if (result == 0)
            throw new EOFException();

        this.stmt.setInt(1, position);
        this.stmt.setInt(2, result);
        ResultSet rs = this.stmt.executeQuery();
        rs.next();
        InputStream is = rs.getBinaryStream(1);
        is.read(buf);
        is.close();
        rs.close();

        position += result;
        return result;
    }

    @Override
    public void close() throws SQLException {
        this.stmt.close();
    }
}
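For comparison, writing a bytea needs no special machinery; a plain parameterized INSERT into the bins table is enough. The following sketch (the class ByteaWriter and its parameters are just examples) also shows the downside: the whole value has to be held in client memory:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class ByteaWriter {
    /* inserts the contents of a file as a single bytea value;
       the complete value is kept in memory on the client */
    public static void write(Connection conn, long id, String path)
            throws IOException, SQLException {
        byte[] data = Files.readAllBytes(Paths.get(path));

        try (PreparedStatement stmt = conn.prepareStatement(
                "INSERT INTO bins (id, data) VALUES (?, ?)")) {
            stmt.setLong(1, id);
            stmt.setBytes(2, data);
            stmt.executeUpdate();
        }
        conn.commit();
    }
}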
I ran the benchmark on a laptop with an Intel® Core™ i7-8565U CPU and SSD storage. The PostgreSQL version used was 12.2. Data were cached in RAM, so the results don't reflect disk I/O overhead. The database connection used the loopback interface to reduce the network impact to a minimum.
This code was used to run the test:
Class.forName("org.postgresql.Driver");

Connection conn = DriverManager.getConnection(
    "jdbc:postgresql:test?user=laurenz&password=...");

// important for large objects
conn.setAutoCommit(false);

byte[] buf = new byte[LOBStreamer.CHUNK_SIZE];

long start = System.currentTimeMillis();

for (int i = 0; i < LOBStreamTester.ITERATIONS; ++i) {
    // set LOBStreamer implementation and object ID as appropriate
    LOBStreamer s = new LargeObjectStreamer(conn, 62409);

    try {
        while (true)
            s.getNextBytes(buf);
    } catch (EOFException e) {
        s.close();
    }

    conn.commit();
}

System.out.println(
    "Average duration: "
    + (double) (System.currentTimeMillis() - start)
        / LOBStreamTester.ITERATIONS);

conn.close();
I ran each test multiple times in a tight loop, both with a large file (350 MB) and a small file (4.5 MB). The files were compressed binary data.
|              | 350 MB data | 4.5 MB data |
|--------------|-------------|-------------|
| file system  | 46 ms       | 1 ms        |
| Large Object | 950 ms      | 8 ms        |
| bytea        | 590 ms      | 6 ms        |
In my benchmark, retrieving binary objects from the database is roughly ten times slower than reading them from files in a file system. Surprisingly, streaming from a bytea with EXTERNAL storage is measurably faster than streaming from a Large Object. Since Large Objects specifically support streaming, I would have expected the opposite.
To sum it up, here are the advantages and disadvantages of each method:

Storing the data in the file system:

Advantages:

- best read performance
- the database stays small

Disadvantages:

- consistency between database and file system is not guaranteed automatically
- more complicated architecture and implementation (extra bookkeeping and cleanup of orphaned files)

Large Objects:

Advantages:

- streaming support
- can hold data bigger than 1GB

Disadvantages:

- requires a special, non-standard client API
- Large Objects have to be removed explicitly, e.g. with DELETE triggers in the database
- slowest option in the benchmark

bytea:

Advantages:

- stored like a normal column value and deleted automatically with the table row
- no special client API required

Disadvantages:

- size limit of 1GB
- no streaming support (all data have to be held in memory)
- slower than reading from the file system
Storing large binary data in the database is only a good idea on a small scale, when a simple architecture and ease of coding are the prime goals and high performance is not important. Then the increased database size also won't be a big problem.
Large Objects should only be used if the data exceed 1GB or if streaming writes to the database are important.
If you need help with the performance and architecture of your PostgreSQL database, don't hesitate to ask us.
Thank you for the article. But one question remains.

What if we change the storage strategy for the pg_largeobject.data column to EXTERNAL?
Streaming writes to bytea (from a client perspective) is theoretically possible using UPDATE and concatenation. I guess with a lot of write amplification though.
From a client perspective it looks that way, but really it isn't. If you run

UPDATE tab SET bincol = bincol || '\xdeadbeef';

the complete value will be read from and written to storage, and you may run out of memory on the server.
Maybe LO is slow because native protocol support is not used. Probably using lo_export should be much faster.
I am not sure what you mean by the native protocol.
The JDBC driver uses the fast-path API to call the large object server functions.
libpq allows you to use the lo_import function (https://www.postgresql.org/docs/9.0/lo-interfaces.html), but the JDBC driver probably doesn't use libpq. Another question is the impact of the Java implementation. Can you check the performance of lo_import from psql? When I look at your examples, there is not much reason why LO should be so much slower than bytea.
There is another argument for LO: importing and exporting a LO typically needs significantly less RAM (client side and server side) than bytea. I don't know what the PHP limits are today, but 15 years ago that was an important factor.
dag, i luv this article on all things BLOB in PG
Hi,
You say above:
However, there is this from SQLite:
Now, I know that they are both different systems and that tipping points (if any) may be (very) different, but your statement may not hold globally?
Has this been benchmarked at all? Would be an interesting one?
Pól...
Really good article!
One question: if I change the TOAST strategy from EXTENDED to EXTERNAL on an existing column in an existing table by using "ALTER TABLE bins ALTER COLUMN data SET STORAGE EXTERNAL", what will happen to the existing data in that column? Will the existing data be decompressed and then stored out of line?
No, existing data won't be affected.
But if they are compressed binary data, PostgreSQL won't have compressed them anyway (it will have attempted compression and then realized that the data grew, as described in my article).
Thank you for your prompt response.
The tricky part in my case is that my data are binary data (from binary files) but not compressed. I'm not sure if PostgreSQL has compressed them, because its compression algorithm might make the binary data in some rows smaller, but not in other rows. That could mean that the data in some rows are compressed while the data in other rows are not. I'm concerned that if I change to "EXTERNAL", there will be some compatibility issues, e.g. PostgreSQL cannot read the existing compressed data.
I see.
You don't have to worry about this, because whether or not a value is compressed is stored with the value itself. So regardless of the column setting, PostgreSQL will decompress data that were stored compressed.
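If you want to check whether an individual value was actually stored compressed, you can compare its stored size with its logical length, for example (using the bins table from the article; the id is just an example):

-- pg_column_size reports the stored (possibly compressed) size,
-- octet_length the uncompressed length of the bytea
SELECT pg_column_size(data) AS stored_size,
       octet_length(data) AS raw_size
FROM bins
WHERE id = 42;

If stored_size is clearly smaller than raw_size, that value was stored compressed.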
Thank you for your answer. Really appreciate it
Hi! Maybe a thing or two to mention about bytea types. I have a 10GB database with a 7.5GB table in it that has a bytea column. It takes 12 hours to do a backup, and when I was troubleshooting the slowness I found that pg_dump generated 240GB of network traffic while the backup speed was ~45Mbit on average. Further troubleshooting showed that the bytea type in that big table was the reason for the backup taking so long. Unfortunately, I still haven't found a solution, but based on my research I should migrate the data to a hex field type. Any inputs on that?
You should definitely use bytea and store the data as they are. I do not have a ready explanation for the slowness or the amount of traffic; this would need further investigation.
Perhaps pg_dumpbinary can improve the situation for you.