In a recent wrestling match with the Linux “out-of-memory killer” for a CYBERTEC customer, I got acquainted with Linux control groups (“cgroups”). I want to give you a short introduction to how they can be used with PostgreSQL and discuss their usefulness.
Warning: This was done on my RedHat Fedora 27 system running Linux 4.16.5 with cgroups v1 managed by systemd version 234. Both cgroups and systemd's handling of them seem to be undergoing changes, so your mileage may vary considerably. Still, it should be a useful starting point if you want to explore cgroups.
From the cgroups manual page:
Control groups, usually referred to as cgroups, are a Linux kernel feature which allow processes to be organized into hierarchical groups whose usage of various types of resources can then be limited and monitored.
cgroups are managed with special commands that start with “cg”, but can also be managed through a special cgroups file system and systemd.
Now a running PostgreSQL cluster is a group of processes, so that's a perfect fit.
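There are several subsystems defined (also called “controllers” in cgroups terminology). Of these, the following are interesting for PostgreSQL: memory (limits memory usage), blkio (limits I/O bandwidth), cpu (limits CPU time) and cpuset (binds processes to certain CPUs and memory nodes).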
During system startup, cgroups are created as defined in the /etc/cgconfig.conf configuration file.
Let's create a cgroup to build a cage for a PostgreSQL cluster:
    group db_cage {
        # user and group 'postgres' can manage these cgroups
        perm {
            task {
                uid = postgres;
                gid = postgres;
                fperm = 774;
            }
            admin {
                uid = postgres;
                gid = postgres;
                dperm = 775;
                fperm = 774;
            }
        }

        # limit memory to 1 GB and disable swap
        memory {
            memory.limit_in_bytes = 1G;
            memory.memsw.limit_in_bytes = 1G;
        }

        # limit read and write I/O to 10MB/s each on device 8:0
        blkio {
            blkio.throttle.read_bps_device = '8:0 10485760';
            blkio.throttle.write_bps_device = '8:0 10485760';
        }

        # limit CPU time to 0.25 seconds out of each second
        cpu {
            cpu.cfs_period_us = 1000000;
            cpu.cfs_quota_us = 250000;
        }

        # only CPUs 0-3 and memory node 0 can be used
        cpuset {
            cpuset.cpus = 0-3;
            cpuset.mems = 0;
        }
    }
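The numbers in the configuration above are easy to get wrong by a factor of 1000 or 1024, so here is a small sketch that re-derives them. The function names cpu_percent and mib_to_bytes are made up for this illustration and are not part of any cgroups tool.

```shell
#!/bin/sh
# Sanity-check the numbers used in the db_cage definition.
# Both helper names are hypothetical; they only do integer arithmetic.

# CPU share in percent granted by a CFS quota/period pair (both in microseconds).
cpu_percent() {
    quota_us=$1
    period_us=$2
    echo $(( $quota_us * 100 / $period_us ))
}

# Convert a rate in MiB per second to the bytes-per-second value that
# blkio.throttle.*_bps_device expects.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

cpu_percent 250000 1000000   # prints 25
mib_to_bytes 10              # prints 10485760
```

So a quota of 250000 out of a period of 1000000 microseconds is a quarter of one CPU, and 10485760 is exactly 10 MiB.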
To activate it, run the following as root:
    # /usr/sbin/cgconfigparser -l /etc/cgconfig.conf -s 1664
To have that done automatically at server start, I tell systemd to enable the cgconfig service:
    # systemctl enable cgconfig
    # systemctl start cgconfig
To start PostgreSQL in the cgroups we defined above, use the cgexec executable (you may have to install an operating system package called libcgroup or libcgroup-tools for that):
    $ cgexec -g cpu,memory,blkio:db_cage \
          /usr/pgsql-10/bin/pg_ctl -D /var/lib/pgsql/10/data start
We can confirm that PostgreSQL is running in the correct cgroup:
    $ head -1 /var/lib/pgsql/10/data/postmaster.pid
    16284

    $ cat /proc/16284/cgroup | egrep '\b(cpu|blkio|memory)\b'
    10:cpu,cpuacct:/db_cage
    9:blkio:/db_cage
    4:memory:/db_cage
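If you want to check this from a script rather than by eye, the lines of /proc/PID/cgroup can be split on colons. The following is a minimal sketch that assumes the cgroups v1 line format shown above; the function name cgroup_of is my own invention.

```shell
#!/bin/sh
# Print the cgroup path of one controller, reading /proc/<pid>/cgroup-style
# input on stdin (cgroups v1 format: "id:controller[,controller...]:path").
# cgroup_of is a hypothetical helper, not part of libcgroup.
cgroup_of() {
    controller=$1
    awk -F: -v want="$controller" '
        {
            n = split($2, names, ",")
            for (i = 1; i <= n; i++)
                if (names[i] == want) { print $3; exit }
        }'
}

# Demonstration on the sample lines from above:
printf '%s\n' '10:cpu,cpuacct:/db_cage' '9:blkio:/db_cage' '4:memory:/db_cage' |
    cgroup_of blkio   # prints /db_cage
```

For a live cluster you would feed it the real file, for example cgroup_of memory < /proc/16284/cgroup with the PID from postmaster.pid. If the controller is not listed, nothing is printed.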
To move a running process into a cgroup, you can use cgclassify (but then you have to move all running PostgreSQL processes).
systemd
systemd provides a simpler interface to Linux cgroups, so you don't have to do any of the above. systemd can create cgroups “on the fly” for the services it starts.
If your PostgreSQL service is called postgresql-10, simply create a file /etc/systemd/system/postgresql-10.service like this:
    # include the original service file rather than editing it
    # so that changes don't get lost during an upgrade
    .include /usr/lib/systemd/system/postgresql-10.service

    [Service]
    # limit memory to 1GB
    # sets 'memory.limit_in_bytes'
    MemoryMax=1G
    # limit memory + swap space to 1GB
    # this should set 'memory.memsw.limit_in_bytes' but it only
    # works with cgroups v2 ...
    # MemorySwapMax=1G

    # limit read I/O on block device 8:0 to 10MB per second
    # sets 'blkio.throttle.read_bps_device'
    IOReadBandwidthMax=/dev/block/8:0 10M
    # limit write I/O on block device 8:0 to 10MB per second
    # sets 'blkio.throttle.write_bps_device'
    IOWriteBandwidthMax=/dev/block/8:0 10M

    # limit CPU time to a quarter of the available
    # sets 'cpu.cfs_quota_us'
    CPUQuota=25%

    # there are no settings to control 'cpuset' cgroups
Now you have to tell systemd that you changed the configuration and restart the service:
    # systemctl daemon-reload
    # systemctl restart postgresql-10
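After the restart it is worth checking that the limits actually arrived in the cgroup file system. Here is a rough sketch; the helper name show_limits is hypothetical, and the directories are my assumption about where systemd with cgroups v1 places a service called postgresql-10, so adjust them for your system.

```shell
#!/bin/sh
# Print "file = value" for each requested limit file in a cgroup directory.
# show_limits is a hypothetical helper for illustration.
show_limits() {
    dir=$1
    shift
    for f in "$@"; do
        if [ -r "$dir/$f" ]; then
            printf '%s = %s\n' "$f" "$(cat "$dir/$f")"
        else
            printf '%s = (not available)\n' "$f"
        fi
    done
}

# Assumed cgroups v1 locations for the systemd service postgresql-10:
show_limits /sys/fs/cgroup/memory/system.slice/postgresql-10.service \
    memory.limit_in_bytes
show_limits /sys/fs/cgroup/cpu/system.slice/postgresql-10.service \
    cpu.cfs_quota_us cpu.cfs_period_us
```

With MemoryMax=1G I would expect memory.limit_in_bytes to show 1073741824. A file reported as "(not available)" usually means the path is different on your system or the controller is not enabled for the service.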
As you can see, not all cgroup settings are available with systemd. As a workaround, you can define cgroups in /etc/cgconfig.conf and use cgexec to start the service.
Are cgroups useful for PostgreSQL? I would say that it depends on the subsystem.
memory
At first glance, it sounds interesting to limit memory usage with cgroups. But there are several drawbacks:
None of this is very appealing: there is no option to make malloc fail so that PostgreSQL can handle the problem.
I think that it is better to use the traditional way of limiting PostgreSQL's memory footprint by setting shared_buffers, work_mem and max_connections so that PostgreSQL won't use too much memory.
That also has the advantage that all PostgreSQL clusters on the machine can share the file system cache, so that clusters that need it can get more of that resource, while no cluster can become completely memory starved (everybody is guaranteed shared_buffers).
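To make the traditional sizing approach a bit more concrete, here is a back-of-the-envelope sketch. It assumes that every connection uses work_mem exactly once at the same time, which is only a rough guide: a single query can use several multiples of work_mem, and maintenance operations are ignored. The function name and the example settings are made up.

```shell
#!/bin/sh
# Rough estimate in MB: shared_buffers + max_connections * work_mem.
# Assumption: each backend uses work_mem once; real queries may use more.
# pg_memory_estimate_mb is a hypothetical helper for illustration.
pg_memory_estimate_mb() {
    shared_buffers_mb=$1
    work_mem_mb=$2
    max_connections=$3
    echo $(( $shared_buffers_mb + $work_mem_mb * $max_connections ))
}

# Hypothetical settings: shared_buffers = 4GB, work_mem = 16MB,
# max_connections = 100
pg_memory_estimate_mb 4096 16 100   # prints 5696
```

If the result is comfortably below the RAM you are willing to give the cluster, the remainder stays available as shared file system cache.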
I think that cgroups are a very useful way of limiting I/O bandwidth for PostgreSQL.
The only drawback may be that PostgreSQL cannot use more than its allotted quota even if the I/O system is idle.
cgroups are also a good way of limiting CPU usage by a PostgreSQL cluster.
Again, it would be nice if PostgreSQL were allowed to exceed its quota if the CPUs are idle.
The cpuset subsystem is only useful on big machines with a NUMA architecture. On such machines, binding PostgreSQL to the CPUs and memory of one NUMA node will make sure that all memory access is local to that node and consequently fast.
You can thus partition your NUMA machine between several PostgreSQL clusters.
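When you split a machine like that, it helps to know how many CPUs a given cpuset list such as the 0-3 used earlier actually covers. This is a small sketch with a hypothetical helper name; on Linux, the same list format is what you should find in files like /sys/devices/system/node/node0/cpulist, which tell you which CPUs belong to a NUMA node.

```shell
#!/bin/sh
# Count the CPUs named by a cpuset-style list such as "0-3" or "0-3,8,10-11".
# count_cpus is a hypothetical helper for illustration.
count_cpus() {
    echo "$1" | tr ',' '\n' | awk -F- '
        NF == 1 { total += 1 }              # single CPU, e.g. "8"
        NF == 2 { total += $2 - $1 + 1 }    # range, e.g. "0-3"
        END     { print total + 0 }'
}

count_cpus 0-3           # prints 4
count_cpus 0-3,8,10-11   # prints 7
```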