When talking to customers, I sometimes get the question: how should PostgreSQL installations deal with huge pages and large memory allocations? Experienced Oracle DBAs in particular are interested in the details behind PostgreSQL and huge pages on Linux, so I'll try to explain them in more detail in the following blog post.
For people not familiar with the meaning of huge pages, here is a short overview of what they are and which problems they try to solve.
On any modern system, applications don't use physical memory directly. Instead, they use a virtual memory address model, which makes memory allocation easier to handle and avoids the complexity of computing and mapping physical addresses into the application's memory space. The virtual memory model, however, requires the computer to translate virtual addresses into their corresponding physical memory addresses as soon as an application reads from or writes to memory. This mapping is organized in the so-called page table, a hierarchically organized lookup structure.
Memory allocations on any common operating system are done by allocating single or multiple pages of available memory. These pages have a specific size, mostly depending on the operating system. On Linux, the default page size used for memory allocation is 4kB. So one can imagine that allocating very large amounts of memory requires a high number of pages to be held in the page table, in order to map them to their corresponding physical addresses. Looking up virtual addresses requires multiple memory accesses, which are very costly compared to CPU speed. Thus, modern CPUs maintain a so-called Translation Lookaside Buffer (TLB), which is basically a cache that speeds up translation between virtual and physical memory.
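The base page size is easy to verify yourself. As a quick sanity check (not part of the huge page setup itself), the following prints the kernel's default page size in bytes, typically 4096 on x86 Linux:

```shell
# Print the base page size the kernel uses for normal memory allocations.
# On x86 Linux this is typically 4096 bytes (4 kB).
getconf PAGESIZE
```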
The size of a TLB is constrained to a few thousand entries, and especially on systems with memory-intensive workloads it very soon becomes a bottleneck. If, for example, a database like PostgreSQL uses a shared buffer pool to implement Inter Process Communication (IPC), block layer access, and other functionality over it, this very quickly puts a high amount of pressure on the TLB, since the number of virtual address mappings it can hold is limited. Especially when using dozens of gigabytes for the shared buffer pool, such instances will suffer performance hits because of this, depending on their workload.
On modern operating systems like Linux, this problem can be mitigated by using huge pages. Since page tables are organized hierarchically, they enable us to summarize allocations in much larger pages than the default. The size of huge pages is architecture-dependent: on x86 systems we can usually expect 2MB or 1GB sizes, while IBM POWER allows 64kB, 16MB and 16GB.
Lookups for specific virtual addresses of memory allocations are then much faster and hopefully more independent of the entries found in the TLB.
On x86, the 2MB page size is the default when configuring huge pages. You can easily get your system's current settings via /proc/meminfo:
```
% cat /proc/meminfo | grep -i HugePage
AnonHugePages:    161792 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
```
HugePages_Total says no huge pages are configured at the moment; accordingly, HugePages_Rsvd, HugePages_Free and HugePages_Surp are equal to zero. Hugepagesize shows the currently configured default size for huge pages in the kernel. We will revisit the value of these settings when configuring huge pages later.
PostgreSQL uses huge page allocations only for its shared buffer pool. Dynamic memory allocation for e.g. work_mem
, maintenance_work_mem
, autovacuum_work_mem
doesn't use them. Also, dynamic shared memory segments used to transfer data and communication between parallel query workers don't request them automatically, with one exception, which I'll describe in the next paragraph about configuring PostgreSQL for huge pages on Linux. It's also necessary to have shared buffers allocated in so-called mmap'ed
segments, which are controlled by the shared_memory_type
parameter. This is the default setting on Linux. System V shared memory is only used by PostgreSQL to initialize a very small shared memory area; the buffer pool itself is allocated via mmap
.
Huge Pages aren't configured on recent Linux Distributions out of the box. There are usually two ways to teach your Linux system to provide them to your applications:
- Transparent Huge Pages (THP)
- Explicitly configured pools of huge pages for applications
The first possibility, THP, isn't recommended for PostgreSQL on Linux, since the kernel doesn't allocate huge pages directly but provides them to applications as they need them. THP are swappable and considered a slower option compared to explicitly configured huge pages. Other operating systems like FreeBSD provide better support in this case.
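Whether THP is currently active can be checked via sysfs; the bracketed value in the output is the active mode. The path below is standard on most distributions, though it may be absent in minimal containers:

```shell
# Show the current Transparent Huge Pages mode, e.g. "always madvise [never]".
# Falls back to a message if the sysfs entry doesn't exist.
cat /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null || echo "THP not available"
```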
So we are going to focus on explicit huge pages support for PostgreSQL. In the following sections, examples are all done on a Rocky Linux 9 system, but should apply to all other Linux systems accordingly. To configure explicit huge pages, we need to know the computed size of shared memory and additional shared allocations in advance. As of PostgreSQL 15, this is very easy (but unfortunately requires a PostgreSQL instance to be down). With the PostgreSQL 15 Community RPM installation from yum.postgresql.org it looks as follows:
```
% su - postgres
% /usr/pgsql-15/bin/postgres --shared-buffers=20GB -D $PGDATA -C shared_memory_size_in_huge_pages
10475
```
Again, the above command doesn't start PostgreSQL, but calculates the value of shared_memory_size_in_huge_pages and prints the result to the terminal. This also gives us the flexibility to calculate the value for specific configuration settings, which we can provide to the postgres binary as command line options. The example above specifies --shared-buffers as a command line option, overriding any other setting in the configuration files (like postgresql.conf).
If PostgreSQL cannot allocate the requested amount of shared buffers with huge pages, it falls back to normal page size allocations. This is controlled by the configuration parameter huge_pages
, which can have the following values:
- try: try to allocate huge pages and fall back to normal page allocation if that's not possible (currently the default)
- off: never request huge page allocations
- on: force shared buffer pool allocations with huge pages

Once we have settled on the settings to start with, we need to reserve the required pool of huge pages in the kernel. As user root and using the default huge page size, this can be done at runtime with the following command:
```
% echo 10475 > /proc/sys/vm/nr_hugepages
```
The memory settings can then be used to confirm that we got what we asked for:
```
% grep -i hugepage /proc/meminfo
AnonHugePages:    124928 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:   10475
HugePages_Free:    10475
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
```
Pools of specific huge pages must be reserved as a contiguous memory area, so the kernel needs a memory area that is available for the reservation. If that's not possible, far less memory might be available to the pool; in this case, the HugePages_Total number is lower than we requested. Experimenting with values that are too large can also drive a system into an out-of-memory situation, rendering it unavailable in the worst case. So I recommend carefully checking memory usage if the system hosts other applications or users.
Setting the number of pages like this, though, is not persistent across system reboots. To survive them, the easiest way is to put a configuration into /etc/sysctl.d/
with the right settings and reboot:
```
% cat /etc/sysctl.d/01-nr-hugepages.conf
vm.nr_hugepages=10475
```
PostgreSQL is then configured to request huge pages. In this example, we adjust the postgresql.conf
configuration file to force PostgreSQL to perform allocations accordingly:
```
shared_buffers = 20GB
huge_pages = on
huge_page_size = 0   # use default kernel setting
```
PostgreSQL must be started (or restarted) and should now allocate its shared buffer pool via the default huge page size. The figures in /proc/meminfo change accordingly and look like this:
```
% grep -i hugepage /proc/meminfo
AnonHugePages:    176128 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:   10475
HugePages_Free:    10264
HugePages_Rsvd:    10264
HugePages_Surp:        0
Hugepagesize:       2048 kB
```
Note the HugePages_Rsvd figure: it reflects the number of pages reserved for the requested allocations. You can use the pmap tool with the PID of the postmaster process to display the allocated memory-mapped buffer pool segment:
```
% pmap $(head -n1 /var/lib/pgsql/15/data/postmaster.pid) | grep huge
00007f11a5400000 21452800K rw-s- anon_hugepage (deleted)
```
This output shows the address where the memory-mapped buffer pool area is mapped, its size and its permissions. The (deleted) marker indicates that this memory-mapped allocation is already unlinked, which is a common trick to make it disappear as soon as the process owning it exits.
In the section on how PostgreSQL uses huge pages, we still owe an explanation for the exception mentioned there. In general, dynamic shared memory allocations happen on demand, for example when parallel query workers process a parallelized query. By default, these dynamically allocated memory areas aren't created via mmap; instead, PostgreSQL uses the POSIX API via shm_open() on Linux. In this case, allocations don't use huge pages. However, with parallel queries, memory pressure can arise there as well, especially when large amounts of data need to be processed.
Starting with version 14, PostgreSQL provides the configuration parameter min_dynamic_shared_memory. When set, the specified value defines additional space in the shared memory created at server startup, to be used by parallel query workers. Since it is allocated along with the buffer pool, it also uses huge pages. We can extend the example above and recalculate the setting with 2GB of additional shared memory for parallel queries:
```
% /usr/pgsql-15/bin/postgres -D $PGDATA --min-dynamic-shared-memory=2GB --shared-buffers=20GB -C shared_memory_size_in_huge_pages
11499
```
Again, the PostgreSQL instance needs to be stopped for the command to execute successfully. Compared to the previous example, the number of huge pages to be reserved has increased by 1024, which corresponds to the additional 2GB (1024 × 2MB).
PostgreSQL versions before 15 are a little more difficult to handle. In particular, calculating how many huge pages need to be reserved is much trickier. The best approach so far is simply to try starting PostgreSQL manually with huge_pages=on and some debugging parameters. In that case, PostgreSQL prints the following error message if it is not able to allocate the requested shared memory size:
```
FATAL:  could not map anonymous shared memory: Cannot allocate memory
HINT:  This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory, swap space, or huge pages. To reduce the request size (currently 21965570048 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.
LOG:  database system is shut down
```
The interesting part is the HINT in the message, which tells us that PostgreSQL wants to allocate 21965570048 bytes of shared memory. This value can be used to calculate the number of huge pages ourselves. Note that this is more than we requested when setting shared buffers to 20GB: PostgreSQL also needs additional shared memory for other internals, influenced by settings like wal_buffers or max_locks_per_transaction. Extensions might require additional shared memory, too. Internally, PostgreSQL uses a formula that corresponds approximately to the following:
```
((shm size + (BLKSZ - (shm size % BLKSZ))) / (huge page size * 1024)) + 1 = num huge pages
```
Standard PostgreSQL installations are always compiled with BLKSZ equal to 8192; plugging in the requested size and the huge page size then gives the following result:
```
((21965570048 + (8192 - (21965570048 % 8192))) / (2048 * 1024)) + 1 = 10475
```
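The same arithmetic can be reproduced with plain shell integer math; the numbers are the ones from the HINT message above:

```shell
# Recompute the number of 2 MB huge pages from the shared memory size
# printed in PostgreSQL's HINT, using BLKSZ = 8192.
SHM=21965570048            # bytes requested, taken from the HINT message
BLKSZ=8192                 # standard PostgreSQL block size
HP=$((2048 * 1024))        # huge page size in bytes (2 MB)
echo $(( (SHM + (BLKSZ - SHM % BLKSZ)) / HP + 1 ))   # prints 10475
```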
Things get even more complicated if you want PostgreSQL to use a page size of 1GB. Since this is not the default on Linux, configuration requires more steps and more careful thought in such a case. This is left to the next blog post, where I'll show you how to deal with such a requirement.
Find out more about memory allocation in Hans-Jürgen Schönig's blog post on fixing track_activity_query_size in postgresql.conf.
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.
Good explanation, some bits for Ubuntu:

```
postgres@qdec:/usr/lib/postgresql/15/bin$ ./postgres --shared-buffers=24GB -D /var/lib/postgresql/15/main -C shared_memory_size_in_huge_pages --config-file=/etc/postgresql/15/main/postgresql.conf
12564
```

huges_pages = on # here you have a typo, "huge_pages" is the right one.