Our team is proud to introduce a new pg_timetable v4.4 release!
This time we focused on implementing a couple of new features, as well as improving performance.
I want to remind you that pg_timetable is a community project. So, please, don’t hesitate to ask any questions, to report bugs, to star the pg_timetable project, and to tell the world about it.
The first cool new feature in the pg_timetable v4.4 release is a web server that provides a REST API. Right now it serves only two endpoints: `/liveness` and `/readiness`.
`GET /liveness` always returns HTTP status code 200, which only indicates that pg_timetable is running, e.g.

```
$ curl -i localhost:8080/liveness
HTTP/1.1 200 OK
Date: Wed, 09 Feb 2022 15:11:02 GMT
Content-Length: 0
```
`GET /readiness` returns HTTP status code 200 when pg_timetable is running and the scheduler is in the main loop processing chains, e.g.

```
$ curl -i localhost:8080/readiness
HTTP/1.1 200 OK
Date: Wed, 09 Feb 2022 15:30:25 GMT
Content-Length: 0
```
While the scheduler is still connecting to the database, creating the database schema, or upgrading it, the endpoint returns HTTP status code 503, e.g.

```
$ curl -i localhost:8080/readiness
HTTP/1.1 503 Service Unavailable
Date: Wed, 09 Feb 2022 15:10:48 GMT
Content-Length: 0
```
This is useful for monitoring purposes; for example, to perform HTTP health checks. We are planning to add more endpoints to start, stop, reinitialize, restart, and reload the scheduler, and to provide extended monitoring statistics.
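As an illustration of such a health check, here is a minimal Python sketch that polls `/readiness` until it answers 200. The function names `probe` and `wait_until_ready`, the retry count, and the delay are assumptions made for this example; only the two endpoints and their status codes come from the release itself.

```python
import time
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code (e.g. 503) is still the answer we want.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and so on.
        return 0


def wait_until_ready(base_url: str, attempts: int = 30, delay: float = 1.0,
                     probe_fn=probe) -> bool:
    """Poll <base_url>/readiness until it returns 200 or the attempts run out.

    probe_fn is injectable (hypothetical parameter) so the loop can be exercised
    without a running scheduler.
    """
    for _ in range(attempts):
        if probe_fn(base_url + "/readiness") == 200:
            return True
        time.sleep(delay)
    return False
```

With a scheduler started with `--rest-port=8080`, calling `wait_until_ready("http://localhost:8080")` would return `True` once the scheduler has entered its main loop, while `probe("http://localhost:8080/liveness")` only tells you that the process is up.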
The REST API server is disabled by default. Use the `--rest-port` command-line parameter to activate it:

```
$ ./pg_timetable --rest-port=8080 --clientname=loader2 postgresql://scheduler@localhost/timetable
2022-02-09 16:36:08.593 [INFO] [port:8080] Starting REST API server...
2022-02-09 16:36:08.867 [INFO] Database connection established
2022-02-09 16:36:08.875 [INFO] Accepting asynchronous chains execution requests...
2022-02-09 16:36:08.880 [INFO] [count:0] Retrieve scheduled chains to run @reboot
2022-02-09 16:36:08.884 [INFO] [count:2] Retrieve interval chains to run
...
```
For debugging and monitoring purposes, we've added detailed version output in pg_timetable v4.4. Use the `-v, --version` command-line argument to make pg_timetable print its version information:

```
$ pg_timetable.exe -v
pg_timetable:
  Version: 4.4.0
  DB Schema: 00381
  Git Commit: 52e12177d0025b9b01c737cea06048fc350315f5
  Built: 2022-02-07T14:06:57Z
```
The `Version` line is the version of the binary itself, or the name of the branch if this is a development build. For example, the `latest` tag of our cybertecpostgresql/pg_timetable Docker image is always built against the `master` branch, thus the output will be slightly different:

```
$ docker run --rm cybertecpostgresql/pg_timetable:latest -v
pg_timetable:
  Version: master
  DB Schema: 00381
  Git Commit: e67c6872ab9aa91a262aab5b75fb76ea51e050b8
  Built: 2022-02-07T16:01:25+01:00
```
⚠️ Since the `latest` tag tracks the `master` branch, you probably want to use the most recent stable version tag in production.
The `DB Schema` line in the output indicates the version of the latest database migration applied. We use the ID of the GitHub issue that caused these changes as an identifier. That helps to quickly locate the history connected with the schema change, e.g. Issue #381.
The `Git Commit` line is the commit the binary was built from, and the last line gives the precise build time.
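If you want to feed this information into a monitoring script, the output is easy to parse. The following Python sketch assumes the "key: value" layout shown in the listings above; the helper name `parse_version_output` is made up for this example, and the exact whitespace of the real output may differ.

```python
def parse_version_output(text: str) -> dict[str, str]:
    """Turn 'Key: value' lines into a dict, skipping lines without a value."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so timestamps such as
        # 2022-02-07T14:06:57Z stay intact in the value.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


# Sample text copied from the release build output shown in this post.
sample = """pg_timetable:
  Version: 4.4.0
  DB Schema: 00381
  Git Commit: 52e12177d0025b9b01c737cea06048fc350315f5
  Built: 2022-02-07T14:06:57Z
"""

info = parse_version_output(sample)
```

Here `info["Version"]` holds the release number (or a branch name such as `master` for development builds), and `info["DB Schema"]` holds the migration identifier.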
It turns out that on highly loaded systems, the scheduler inserts too many rows into the system table `run_status`: one row for each chain start and one for each finish. Over time, the table may accumulate so many rows that internal functions lag by about 2-3 seconds per call, and resource usage grows along with it.
The whole idea behind `run_status` was to track active chains so the scheduler won't run new chains if the active number exceeds `max_instances`.
In fact, we don't need such a detailed table, because the `log` and `execution_log` tables already store every step of chain execution.
Also, the `run_status` table was designed in a very complicated way, although that allowed it to hold many details. On the other hand, active (running) chains can be managed in the same way we manage active sessions; from the logical point of view, the two are the same. So in the new version, instead of maintaining the complicated `run_status` table, we switched to a new `active_chain` table. The idea behind `active_chain` is the same as behind the `active_session` table that we already use for sessions.
The idea itself can be described in 3 steps:
1. make the table UNLOGGED to save space and avoid generating WAL
2. add a row to the `active_chain` table when a chain starts
3. delete a row from the `active_chain` table when a chain has finished or failed.
In this way, we can handle several thousand parallel jobs without visible degradation, at least in the test environment.
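To make the bookkeeping concrete, here is a small Python sketch of the insert-on-start, delete-on-finish pattern. It is an analogy only, not pg_timetable's actual SQL: it uses an in-memory SQLite database (which has no counterpart to PostgreSQL's UNLOGGED), the column names are hypothetical, and it assumes `max_instances` is a per-chain limit.

```python
import sqlite3


def open_tracker() -> sqlite3.Connection:
    """Create an in-memory stand-in for the active_chain table (hypothetical columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE active_chain ("
        " chain_id INTEGER NOT NULL,"
        " started_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def try_start(conn: sqlite3.Connection, chain_id: int, max_instances: int) -> bool:
    """Register a running chain unless max_instances copies are already active."""
    with conn:  # commits on success, rolls back on error
        (running,) = conn.execute(
            "SELECT count(*) FROM active_chain WHERE chain_id = ?", (chain_id,)
        ).fetchone()
        if running >= max_instances:
            return False
        conn.execute("INSERT INTO active_chain (chain_id) VALUES (?)", (chain_id,))
        return True


def finish(conn: sqlite3.Connection, chain_id: int) -> None:
    """Remove one row for the chain, whether it finished or failed."""
    with conn:
        conn.execute(
            "DELETE FROM active_chain WHERE rowid = ("
            " SELECT rowid FROM active_chain WHERE chain_id = ? LIMIT 1)",
            (chain_id,),
        )
```

Because rows live only as long as the chain is running, the table stays as small as the number of concurrently active chains, which is the property the new design relies on.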
There are some more improvements. The full changelog is available on the v4.4 release page. We want to thank all contributors and users for their help.
If you want to contribute to pg_timetable and help to make it better:
In conclusion, I wish you all the best! ♥️
Please, stay safe – so we can meet in person at one of the conferences, meetups, or training sessions!