kill -9 explained for PostgreSQL

11.2023 / Category: How To / Tags: administration | under the hood

Do you want to kill a database connection? Or maybe you want to use kill -9? On your PostgreSQL database server? Well, there are things you should be aware of before you do that. The question we want to answer in this blog therefore is: How do I kill PostgreSQL processes properly?

Table of Contents

Killing PostgreSQL processes

Before we discuss process killing, it makes sense to first take a look at a simple PostgreSQL instance / cluster on the operating system level:

1200474 ?        Ss     0:00 /usr/pgsql-15/bin/postgres -D sample_db
1200475 ?        Ss     0:00  _ postgres: logger 
1200476 ?        Ss     0:00  _ postgres: checkpointer 
1200477 ?        Ss     0:00  _ postgres: background writer 
1200479 ?        Ss     0:00  _ postgres: walwriter 
1200480 ?        Ss     0:00  _ postgres: autovacuum launcher 
1200481 ?        Ss     0:00  _ postgres: logical replication launcher

1

2

3

4

5

6

7

1200474 ? Ss 0:00 /usr/pgsql-15/bin/postgres -D sample_db

1200475 ? Ss 0:00 _ postgres: logger

1200476 ? Ss 0:00 _ postgres: checkpointer

1200477 ? Ss 0:00 _ postgres: background writer

1200479 ? Ss 0:00 _ postgres: walwriter

1200480 ? Ss 0:00 _ postgres: autovacuum launcher

1200481 ? Ss 0:00 _ postgres: logical replication launcher

What we see here is the postmaster process and its children which together form the PostgreSQL core system infrastructure. Most processes in this infrastructure can be killed nicely without any problems:
[sql gutter="false"]
[hs@sample ~]$ kill 1200480

What we have just killed is the autovacuum launcher. That's not a problem at all, because the postmaster will instantly launch a new one: we can judge by the new process ID, which can be seen in the process table.

PostgreSQL will basically ensure that this thing is always running:

1200474 ?        Ss     0:00 /usr/pgsql-15/bin/postgres -D sample_db
1200475 ?        Ss     0:00  _ postgres: logger 
1200476 ?        Ss     0:00  _ postgres: checkpointer 
1200477 ?        Ss     0:00  _ postgres: background writer 
1200479 ?        Ss     0:00  _ postgres: walwriter 
1200481 ?        Ss     0:00  _ postgres: logical replication launcher 
1200524 ?        Ss     0:00  _ postgres: autovacuum launcher

1

2

3

4

5

6

7

1200474 ? Ss 0:00 /usr/pgsql-15/bin/postgres -D sample_db

1200475 ? Ss 0:00 _ postgres: logger

1200476 ? Ss 0:00 _ postgres: checkpointer

1200477 ? Ss 0:00 _ postgres: background writer

1200479 ? Ss 0:00 _ postgres: walwriter

1200481 ? Ss 0:00 _ postgres: logical replication launcher

1200524 ? Ss 0:00 _ postgres: autovacuum launcher

To understand this better, we have to understand what kill does. It does not actually kill anything, it sends a signal to an operating system process. The name "kill" is still appropriate, since any signal that the target process does not explicitly handle will lead to its immediate termination.

SIGTERM signals vs. kill -9

Keep in mind that we have used a "normal" kill here (which sends a SIGTERM signal). However, not all kills are equal. A "normal" kill can be caught and handled by the application. What has basically happened is that the autovacuum launcher process has caught the SIGTERM signal and terminated itself properly. "Properly" means that all shared resources are left in a consistent, clean state and that no leftovers could make their way through the system to cause issues at some random place. However, a normal kill is not what kill -9 does. The problem is: "-9" sends the SIGKILL signal which the target process cannot catch and handle; it is simply brutally destroyed by the operating system, potentially leaving shared memory and other resources in an inconsistent state. This is an extremely important difference which must be kept in mind when killing processes.

Killing a database connection the “`-9`” way

But what happens if we brutally kill a database connection? The last process is a standard connection to PostgreSQL. User “hs” is connected to the postgres database via a local IPv6 connection. The following listing has the details:

1200474 ?        Ss     0:00 /usr/pgsql-15/bin/postgres -D sample_db
1200475 ?        Ss     0:00  _ postgres: logger 
1200476 ?        Ss     0:00  _ postgres: checkpointer 
1200477 ?        Ss     0:00  _ postgres: background writer 
1200479 ?        Ss     0:00  _ postgres: walwriter 
1200481 ?        Ss     0:00  _ postgres: logical replication launcher 
1200524 ?        Ss     0:00  _ postgres: autovacuum launcher 
1200623 ?        Ss     0:00  _ postgres: hs postgres ::1(40498) idle

1

2

3

4

5

6

7

8

1200474 ? Ss 0:00 /usr/pgsql-15/bin/postgres -D sample_db

1200475 ? Ss 0:00 _ postgres: logger

1200476 ? Ss 0:00 _ postgres: checkpointer

1200477 ? Ss 0:00 _ postgres: background writer

1200479 ? Ss 0:00 _ postgres: walwriter

1200481 ? Ss 0:00 _ postgres: logical replication launcher

1200524 ? Ss 0:00 _ postgres: autovacuum launcher

1200623 ? Ss 0:00 _ postgres: hs postgres ::1(40498) idle

Time to kill some processes.

The user connection is brutally eliminated, but look carefully and see what has happened:

[hs@sample ~]$ kill -9 1200623
[hs@sample ~]$ ps axf | grep post
1200474 ?        Ss     0:00 /usr/pgsql-15/bin/postgres -D sample_db
1200475 ?        Ss     0:00  _ postgres: logger 
1200628 ?        Ss     0:00  _ postgres: checkpointer 
1200629 ?        Ss     0:00  _ postgres: background writer 
1200630 ?        Ss     0:00  _ postgres: walwriter 
1200631 ?        Ss     0:00  _ postgres: autovacuum launcher 
1200632 ?        Ss     0:00  _ postgres: logical replication launcher

1

2

3

4

5

6

7

8

9

[hs@sample ~]$ kill -9 1200623

[hs@sample ~]$ ps axf | grep post

1200474 ? Ss 0:00 /usr/pgsql-15/bin/postgres -D sample_db

1200475 ? Ss 0:00 _ postgres: logger

1200628 ? Ss 0:00 _ postgres: checkpointer

1200629 ? Ss 0:00 _ postgres: background writer

1200630 ? Ss 0:00 _ postgres: walwriter

1200631 ? Ss 0:00 _ postgres: autovacuum launcher

1200632 ? Ss 0:00 _ postgres: logical replication launcher

Most processes are gone and have been restarted using a new process ID. Why is that?

Previously, I mentioned that a “-9” sends a non-catchable signal which implies that the process cannot react to the kill. Therefore, the process might terminate at any point, which in turn means that we cannot rely on the fact that it left the database in a consistent state. The process might have left us in any state. Operations might have been terminated during a change in shared memory, or on disk, or anywhere else. That means that we cannot trust the content of the shared memory anymore. Some random blocks somewhere inside shared buffers might be corrupted, or something else might have happened which could lead to potential corruption further down the road.

What does PostgreSQL actually do after a kill -9?

The way PostgreSQL copes with this situation is to immediately drop all connections, clear out memory and initiate WAL recovery from the latest checkpoint. Basically, PostgreSQL does the same thing as it does during a “normal database crash”.

The main advantage is that you might lose a couple of database connections and you won’t face database corruption. However, if you “kill -9” on a busy system, you will potentially eliminate thousands of database connections unintentionally, and that might have unforeseen consequences. Therefore, it's usually not a good idea to run kill -9 in the first place. If you do, it's important to understand what's actually happening behind the scenes and to understand the implications.

kill -9 explained for PostgreSQL

Killing PostgreSQL processes

SIGTERM signals vs. kill -9

Killing a database connection the “`-9`” way

Time to kill some processes.

Most processes are gone and have been restarted using a new process ID. Why is that?

What does PostgreSQL actually do after a kill -9?

Finally …

Leave a Reply Cancel reply

Hans-Jürgen Schönig

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts

kill -9 explained for PostgreSQL

Killing PostgreSQL processes

SIGTERM signals vs. kill -9

Killing a database connection the “-9” way

Time to kill some processes.

Most processes are gone and have been restarted using a new process ID. Why is that?

What does PostgreSQL actually do after a kill -9?

Finally …

Leave a Reply Cancel reply

Hans-Jürgen Schönig

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts

Killing a database connection the “`-9`” way