Core development for absolute beginners.
Prerequisites: basic C, Git, GNU Make, and SQL knowledge, and comfort working in a Linux terminal.
This article is an aggregation of information that was already available at the time of writing. My goal is not to duplicate that information, but to lead you through it step by step. It is important to follow along as you read.
First, learn to compile and install PostgreSQL.
Go to git.postgresql.org, find postgresql.git, and git clone that repository. A build system is used to compile PostgreSQL; it is responsible for finding the dependencies/libraries on disk and for enabling/disabling features. PostgreSQL versions 16 and newer use the Meson build system, 15 and older rely on GNU Autoconf.
Different build systems require different commands to compile the source code. A simple online search will turn up this page showing you how to compile PostgreSQL. Here is a quick Meson and Autoconf command reference.
You will have to gather stack traces a lot, so you must compile PostgreSQL with the -Og -g flags to instruct the C compiler to add debug symbols and not to optimize the code beyond what still allows comfortable debugging: meson setup -Ddebug=true -Dbuildtype=debugoptimized .... These are the defaults, but it is nice to be aware of them.
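For reference, a typical out-of-source build could look roughly like the sketch below; the install prefix is only an example, adjust it to your own layout.

```sh
# Meson (PostgreSQL 16 and newer): configure, build, install.
meson setup build --prefix="$HOME/.local/usr/psql/16" -Dbuildtype=debugoptimized
ninja -C build
ninja -C build install

# GNU Autoconf (PostgreSQL 15 and older): same idea, different commands.
./configure --prefix="$HOME/.local/usr/psql/15" CFLAGS="-Og -g"
make -j"$(nproc)"
make install
```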
To be able to work with multiple major versions simultaneously, create a git worktree for each major version. This allows you to work with multiple branches at the same time: with worktrees there is no need to commit/stash changes, resolve conflicts, or lose build artifacts just to switch branches.
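A sketch of what that could look like; the directory names are my own choice, the branch names are the upstream stable branches.

```sh
# From a clone of postgresql.git: one worktree (and build tree) per major version.
git worktree add ../rel15 REL_15_STABLE
git worktree add ../rel16 REL_16_STABLE
git worktree list
```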
Visit all the URLs and make sure you understand the commands you have to enter. Compile and install all the currently supported major PostgreSQL versions, satisfying the dependencies if needed. Install under $HOME, not /usr, to avoid sudo. Take a quick walk through the source tree: cd to the root folder of the repository and study the tree -d -L 5 | less output. Have a look at git tag and git branch --remote. Use online search to figure out how to run unit tests and where to find the entire compilation output.
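If your search comes up short, the usual entry points are sketched below; the Meson build directory name is assumed to be build.

```sh
# Meson builds: run the test suite from the build directory.
meson test -C build

# Autoconf builds: the equivalent target, run from the configured source tree.
make check
```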
Read about PostgreSQL versioning policy. You will notice they support five major versions at a time.
This is what the folder structure looks like when you install multiple PostgreSQL major versions:
```
├── 13
│   ├── bin
│   ├── include
│   ├── lib
│   └── share
├── 15
│   ├── bin
│   ├── include
│   ├── lib
│   └── share
└── 16
    ├── bin
    ├── include
    ├── lib64
    └── share
```
To be able to execute PostgreSQL binaries by name, you have to add the appropriate bin folder to your $PATH. For development purposes it is also necessary to switch bin directories quickly, without typing export PATH="$PATH:/whatever/path" by hand every time. Here is a shell function to serve that particular purpose:
```sh
# Choose a PostgreSQL bin directory and prepend it to your $PATH.
pgenv() {
    # Major version.
    _pg_ver="$1"

    # Full PostgreSQL `bin` directory path.
    _pg_path="$HOME/.local/usr/psql/$_pg_ver/bin"

    # Freak out if such a directory does not exist.
    if [ ! -d "$_pg_path" ]; then
        echo "$_pg_path does not exist"
        return 1
    fi

    # Remove all paths from $PATH that have `psql` directory in them.
    _clean_path="$(echo -n "$PATH" | tr ':' '\n' | grep -v psql | tr '\n' ':')"

    # Export new `bin` directory.
    export PATH="${_pg_path}:${_clean_path}"
}

# Call immediately to set a default version.
pgenv 16
```
Append the above function to your ~/.bashrc or ~/.zshrc, and reload your shell config. If you want to work with PostgreSQL 17, run pgenv 17 in your terminal window. To change the version, just call it again with a different number. To verify it works, invoke which pg_config to see which one is going to be executed.
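A quick sanity check might look like this; the paths follow the install layout shown above and will differ on your machine.

```sh
$ pgenv 17
$ which pg_config
/home/username/.local/usr/psql/17/bin/pg_config

$ pgenv 16
$ which postgres
/home/username/.local/usr/psql/16/bin/postgres
```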
pg_ctl -D /path/to/cluster/dir start is usually used to start PostgreSQL. To run the server in the foreground and see the logs, start the server binary directly: postgres -D /path/to/cluster/dir. Specifying -D gets annoying, as does telling psql which database to connect to. Define PGDATA, PGDATABASE, and the other environment variables to bypass that. I have the following appended to the pgenv function body:
```sh
# Set environment for PostgreSQL binaries.
export PGHOST="localhost"
export PGPORT="5432"
export PGDATA="$HOME/.local/share/psql/$_pg_ver/main"
export PGUSER="$USER"
export PGDATABASE="postgres"
```
Now it is postgres instead of postgres -D /path/to/cluster, and psql instead of psql database_name.
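With those variables exported by pgenv, a fresh cluster can be created and started without repeating any paths; a minimal sketch, assuming the PGDATA directory from the snippet above:

```sh
# PGDATA is already set, so no -D is needed anywhere.
initdb            # create a new cluster under $PGDATA
pg_ctl start      # start it in the background (or run `postgres` in the foreground)
psql -c 'SELECT version();'
```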
Before you start contributing code to PostgreSQL itself, it is easier to begin with extension development in C. Let us create an extension that defines a function returning a string:
```
postgres=# select postgres_func();
      postgres_func
--------------------------
 Hello from my extension!
(1 row)

postgres=#
```
To create such an extension we need to write four files: my_extension.c, my_extension--1.0.sql, my_extension.control, and a Makefile.
Save as my_extension.c:
```c
/* Supplied with PostgreSQL, contain the following macro definitions. */
#include "postgres.h"
#include "fmgr.h"

/* Linker magic, each PostgreSQL extension must have it. */
PG_MODULE_MAGIC;

/* C function prototypes. */
static const char *c_func(void);
Datum postgres_func(PG_FUNCTION_ARGS);

/* Internal C function, declare them as usual. */
const char *
c_func(void)
{
    const char *msg = "Hello from my extension!";
    return msg;
}

PG_FUNCTION_INFO_V1(postgres_func);

/* Functions defined like the one below can be called from PostgreSQL shell
 * and must be preceded by the above magic linker call. */
Datum
postgres_func(PG_FUNCTION_ARGS)
{
    /* Yes, I could have supplied the string literal
     * directly to that macro call. */
    PG_RETURN_CSTRING(c_func());
}
```
Save as my_extension--1.0.sql:
```sql
-- It is a dummy SQL function that invokes our real C function
-- with that same name.
CREATE FUNCTION postgres_func()
RETURNS cstring        -- data type of the return value.
AS 'MODULE_PATHNAME'   -- expands to `module_pathname` value in control file.
LANGUAGE C;

-- You can insert arbitrary SQL code in here,
-- views, table definitions, whatever.
```
Save as my_extension.control:
```
comment = 'Briefly describe your extension here'
# This number corresponds to the above SQL file name.
default_version = '1.0'
# `$libdir` expands to `lib` directory path
# under your PostgreSQL install (--prefix) directory.
module_pathname = '$libdir/my_extension'
```
Here are all the possible values.
Run the pg_config binary and observe the output:
```
$ pg_config
BINDIR = /home/username/.local/usr/psql/16/bin
DOCDIR = /home/username/.local/usr/psql/16/share/doc/postgresql
HTMLDIR = /home/username/.local/usr/psql/16/share/doc/postgresql/html
INCLUDEDIR = /home/username/.local/usr/psql/16/include
PKGINCLUDEDIR = /home/username/.local/usr/psql/16/include/postgresql
INCLUDEDIR-SERVER = /home/username/.local/usr/psql/16/include/postgresql/server
LIBDIR = /home/username/.local/usr/psql/16/lib64
PKGLIBDIR = /home/username/.local/usr/psql/16/lib64/postgresql
LOCALEDIR = /home/username/.local/usr/psql/16/share/locale
MANDIR = /home/username/.local/usr/psql/16/share/man
SHAREDIR = /home/username/.local/usr/psql/16/share/postgresql
SYSCONFDIR = /home/username/.local/usr/psql/16/etc/postgresql
PGXS = /home/username/.local/usr/psql/16/lib64/postgresql/pgxs/src/makefiles/pgxs.mk
CONFIGURE =
CC = ccache cc
CPPFLAGS = -D_GNU_SOURCE
CFLAGS = -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wmissing-prototypes -Wpointer-arith -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -Wdeclaration-after-statement -Wno-format-truncation -Wno-stringop-truncation
CFLAGS_SL = -fPIC
LDFLAGS = -Wl,--as-needed
LDFLAGS_EX =
LDFLAGS_SL =
LIBS =
VERSION = PostgreSQL 16.1
```
As you can see, it knows all about our PostgreSQL installation. You can ask pg_config to print out only the information you want:
```
$ pg_config --libdir --bindir
/home/username/.local/usr/psql/16/lib64
/home/username/.local/usr/psql/16/bin
```
Notice that the --libdir value is what $libdir in the control file will expand to. Have a look at the --pgxs value: it is a path to a Makefile. Open it and read the first few lines:
```
# This file contains generic rules to build many kinds of simple
# extension modules. You only need to set a few variables and include
# this file, the rest will be done here.
```
The above tells us we do not have to write recipes in order to compile a PostgreSQL extension. Instead, we have to define some variables and include this file.
Look further down to see how to include this file:
```
# Use the following layout for your Makefile:
#
#   [variable assignments, see below]
#
#   PG_CONFIG = pg_config
#   PGXS := $(shell $(PG_CONFIG) --pgxs)
#   include $(PGXS)
#
#   [custom rules, rarely necessary]
```
Basically, the above asks pg_config where the --pgxs file is located and includes it. The PG_CONFIG variable is defined so that a custom path to the pg_config binary can be supplied.
Look further down to see the variables you can define within the Makefile you are going to write:
```
# Set one of these three variables to specify what is built:
#
#   MODULES -- list of shared-library objects to be built from source files
#     with same stem (do not include library suffixes in this list)
#   MODULE_big -- a shared library to build from multiple source files
#     (list object files in OBJS)
#   PROGRAM -- an executable program to build (list object files in OBJS)
#
# The following variables can also be set:
#
#   EXTENSION -- name of extension (there must be a $EXTENSION.control file)
#   MODULEDIR -- subdirectory of $PREFIX/share into which DATA and DOCS files
#     should be installed (if not set, default is "extension" if EXTENSION
#     is set, or "contrib" if not)
#   DATA -- random files to install into $PREFIX/share/$MODULEDIR
...
```
I will not include the full list; go see for yourself. So, now that we know how to write the Makefile for our extension, let us actually write it:
```make
# `my_extension.control` must exist.
EXTENSION = my_extension
# We want a single shared library.
MODULE_big = my_extension
# A single shared library from the following object files.
# As we have a single source file, we have a single object file.
OBJS = my_extension.o
# Files to install along with the extension.
DATA = my_extension--1.0.sql

PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
```
Save all four files, invoke make to build the extension and make install to install it. (Re)start the server, connect to it with psql, then run CREATE EXTENSION my_extension;. Now you can call the function from the extension with SELECT postgres_func();
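For illustration, the whole build-and-install cycle from the extension directory could look roughly like this (the restart step assumes the pgenv environment from earlier):

```
$ make
$ make install
$ pg_ctl restart
$ psql
postgres=# CREATE EXTENSION my_extension;
CREATE EXTENSION
postgres=# SELECT postgres_func();
      postgres_func
--------------------------
 Hello from my extension!
(1 row)
```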
At this point you should look around, investigate, and make sure you understand everything you have just read. Most importantly, try to break things, observe the errors, then fix them. Look at other extensions to see what else you can do; try to install and use them, and notice the differences in the process: some of them require more than CREATE EXTENSION ...;. Try to break other extensions and see what happens if you do not follow all the steps outlined in the installation requirements. Download this extension template I created and study it; it has things I will not mention in this article.
There is no outlined procedure for getting into PostgreSQL core development, and there is no precise definition of prerequisites either. There can only be an everlasting list of advice to follow.
A good way to get your feet wet is to participate in reviewing code during commit fests.
Read these PostgreSQL Wiki pages:
The first thing you absolutely have to do is read this book; it is the greatest overview of PostgreSQL internals out there. The book itself is brief and to the point, and feels more like a series of articles.
Read the Git commit messages that introduced the code snippets you are working with; they are usually the best source of information.
PostgreSQL Doxygen is pointless.
The third source of useful information is the README files scattered across the source tree; use find to locate them:
```
$ find /path/to/postgresql/repository -iname readme
./src/backend/utils/mb/README
./src/backend/utils/misc/README
./src/backend/access/gist/README
./src/backend/access/brin/README
./src/backend/access/rmgrdesc/README
./src/backend/access/transam/README
./src/backend/access/hash/README
./src/backend/access/gin/README
./src/backend/access/spgist/README
./src/backend/access/nbtree/README
./src/backend/regex/README
./src/backend/snowball/README
./src/backend/optimizer/README
./src/backend/optimizer/plan/README
...
```
Your own effort is all good, but it is crucial to reach out to other people for help. Join the mailing lists, attend conferences, and connect with other developers by whatever means. I would describe the PostgreSQL community as civilized, I have never seen hostility, so do not hesitate to ask.
The following is not a strict list; feel free to adjust it to your preferences/environment.
Working with the PostgreSQL source code, you will have to navigate a large Git repository. You may find git bisect useful for finding the particular commit that introduced a given code snippet, though it does not always work. git log -S'code_line' /path/to/file sometimes also helps to identify commits. git blame will help you identify the latest commit to edit a particular line. git cherry-pick will help you re-apply a series of commits (a patch) to a different branch. I like cherry-pick because it never just fails: it asks you to manually resolve conflicts along the way rather than aborting the whole thing.
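To make that concrete, here is a small sketch of those commands; the file path, line range, and search string are placeholders for whatever you are digging into, not a prescribed workflow.

```sh
# Which commits added or removed this exact string in the file?
git log -S'ExecSeqScan' --oneline -- src/backend/executor/nodeSeqscan.c

# Which commit last touched lines 100-120 of this file?
git blame -L 100,120 src/backend/executor/nodeSeqscan.c

# Re-apply a range of commits onto the currently checked-out branch.
git cherry-pick <first-commit>^..<last-commit>
```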
Have a look at a number of random PostgreSQL project function bodies:
```c
Datum
hashhandler(PG_FUNCTION_ARGS)
{
    /* ... */
}

IndexBuildResult *
hashbuild(Relation heap, Relation index, IndexInfo *indexInfo)
{
    /* ... */
}

void
hashbuildempty(Relation index)
{
    /* ... */
}
```
Notice the code style: each function name starts at the beginning of a line and is immediately followed by the ( character. That means you can find any function's body by name across the entire source tree:
```
$ grep -rni --include='*.c' ^hashbuild\(
src/backend/access/hash/hash.c:115:hashbuild(Relation heap, Relation index, IndexInfo *indexInfo)
```
grep tells us the file path and the line number, and this works with any function. As Neovim allows specifying the line number to jump to, I can reach that function directly with nvim +115 src/backend/access/hash/hash.c. In fact, this can even be scripted:
```bash
#!/bin/bash

# Find a given function within a source tree and open the file at the exact
# location in $EDITOR.

# Terminate the script execution on any non-zero exit code.
set -e

_func_name=''

# There are function names with underscores, like `init_custom_variable(...)`.
# I would rather type `init custom variable` than `init_custom_variable`, and
# then let the script substitute all space characters with underscores.
#
# Join arguments with '_'.
for _arg in "$@"
do
    _func_name="${_func_name}_${_arg}"
done

# Remove the leading '_'.
_func_name="${_func_name#_}"

# Find that function.
_location=$(grep -rni --include='*.c' "^$_func_name(" | head -n 1)

# Parse its location and line number from grep output.
IFS=':' read -r -a _lparts <<< "$_location"

# Open the file.
$EDITOR +"${_lparts[1]}" "${_lparts[0]}"
```
There is a function named init_custom_variable(...) somewhere within the source tree. To immediately open its body in my text editor I just have to type ffunc init custom variable. The ffunc script will grep for the function, parse its location, and open it in Neovim.
Code comments atop function definitions are extremely helpful, which is why you must configure your code editor to be able to quickly jump to them:
```c
/*
 * execCurrentOf
 *
 * Given a CURRENT OF expression and the OID of a table, determine which row
 * of the table is currently being scanned by the cursor named by CURRENT OF,
 * and return the row's TID into *current_tid.
 *
 * Returns true if a row was identified.  Returns false if the cursor is valid
 * for the table but is not currently scanning a row of the table (this is a
 * legal situation in inheritance cases).  Raises error if cursor is not a
 * valid updatable scan of the specified table.
 */
bool
execCurrentOf(CurrentOfExpr *cexpr,
              ExprContext *econtext,
              Oid table_oid,
              ItemPointer current_tid)
{
```
Meson builds of PostgreSQL yield a compile_commands.json file; take a peek at it:
```json
[
  {
    "directory": "/home/uninstance/dev/c/rel16/build",
    "command": "ccache cc -Isrc/port/libpgport_srv_crc.a.p -Isrc/include -I../src/include -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -O2 -g -fno-strict-aliasing -fwrapv -fexcess-precision=standard -D_GNU_SOURCE -Wmissing-prototypes -Wpointer-arith -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -Wdeclaration-after-statement -Wno-format-truncation -Wno-stringop-truncation -fPIC -DBUILDING_DLL -msse4.2 -MD -MQ src/port/libpgport_srv_crc.a.p/pg_crc32c_sse42.c.o -MF src/port/libpgport_srv_crc.a.p/pg_crc32c_sse42.c.o.d -o src/port/libpgport_srv_crc.a.p/pg_crc32c_sse42.c.o -c ../src/port/pg_crc32c_sse42.c",
    "file": "../src/port/pg_crc32c_sse42.c",
    "output": "src/port/libpgport_srv_crc.a.p/pg_crc32c_sse42.c.o"
  },
  {
    "directory": "/home/uninstance/dev/c/rel16/build",
    "command": "ccache cc -Isrc/port/libpgport_srv.a.p -Isrc/include -I../src/include -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -O2 -g -fno-strict-aliasing -fwrapv -fexcess-precision=standard -D_GNU_SOURCE -Wmissing-prototypes -Wpointer-arith -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -Wdeclaration-after-statement -Wno-format-truncation -Wno-stringop-truncation -fPIC -DBUILDING_DLL -MD -MQ src/port/libpgport_srv.a.p/bsearch_arg.c.o -MF src/port/libpgport_srv.a.p/bsearch_arg.c.o.d -o src/port/libpgport_srv.a.p/bsearch_arg.c.o -c ../src/port/bsearch_arg.c",
    "file": "../src/port/bsearch_arg.c",
    "output": "src/port/libpgport_srv.a.p/bsearch_arg.c.o"
  },
  {
    "directory": "/home/uninstance/dev/c/rel16/build",
```
It is literally the entire sequence of compilation commands, and it helps our code editor navigate the source tree. Microsoft released the Language Server Protocol specification, a standard for code completion/navigation. For any text editor to provide code completion and advanced features like code actions or jumping to definitions in any language, it only has to speak LSP. Text editors support LSP either natively or via plugins; Neovim, for instance, supports it natively. The Clang toolchain provides clangd, the language server for C and C++. This particular language server requires the aforementioned compile_commands.json to work.
In summary, for code completion and navigation you need:
- a text editor that speaks LSP, natively or via a plugin;
- clangd, the C/C++ language server;
- compile_commands.json, generated by Meson.
compile_commands.json gets generated by Meson for PostgreSQL builds, but it does not work with extensions at the moment. And, obviously, it will not be generated for PostgreSQL versions 15 and older, as they do not use Meson. Bear can generate compile commands for arbitrary projects, including PostgreSQL extensions; simply start the build like this: bear -- make.
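As a sketch, generating a compilation database for the extension we wrote above might look like this; run it from the extension's directory, and clangd will pick the file up when you open the sources.

```sh
$ cd /path/to/my_extension
$ make clean        # force a full rebuild so bear sees every compile command
$ bear -- make
$ ls compile_commands.json
compile_commands.json
```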
Here is LSP in action; it allows me to see all the references of the portal variable.
Tree-sitter is super helpful in navigating the code. It is a generic language parser and syntax tree generator, and it integrates well with text/code editors. Among many things, it provides proper syntax highlighting based on the syntax tree (not regular expressions) and allows navigating the project's symbols.
Now you are familiar with the basics of PostgreSQL development. Being a good database administrator is not necessary for being a good database developer. Also, nobody knows everything about PostgreSQL, neither developers nor administrators. Developers in particular quite often specialize in certain areas of PostgreSQL.