Hello, my name is Pavlo Golub, and I am a scheduler addict. That began when I implemented pg_timetable for PostgreSQL. I wrote a lot about it. In this post, I want to share the result of my investigations on the schedulers available for PostgreSQL. I gave a talk about this topic at the CERN meetup, so you may want to check it out for more details.
Table of Contents
Let's start with the complete comparison table. If you want to know more about each aspect, you'll find further explanations below.
I would like to get comments and suggestions on this table, especially from developers or users of these products. I can be biased towards my own creation, so please don't judge me too harshly. 🙂
FeatureProduct | pg_timetable | pg_cron | pgAgent | jpgAgent | pgbucket |
Architecture | |||||
---|---|---|---|---|---|
Year | 2019 | 2016 | 2008 | 2016 | 2015 |
Implementation | standalone | bgworker | standalone | standalone | standalone |
Language | Go | C | C++ | Java | C++ |
Can operate wo extension | ✔️ | ❌ | ❌ | ❌ | ✔️ |
Jobs meta stored in | database | database | database | database | file |
Remote Database Execution | ✔️ | ❌ | ✔️ | ✔️ | ✔️ |
Cross Platform | ✔️ | ✔️ | ✔️ | ✔️ | ❌ |
Functionality | |||||
SQL tasks | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
Program/Shell tasks | ✔️ | ❌ | ✔️ | ✔️ | ✔️ |
Built-in tasks | ✔️ | ❌ | ❌ | ❌ | ❌ |
Parallel Jobs | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
Parallel Jobs Limit | ✔️ | ❔ | ❔ | ❔ | ✔️ |
Concurrency protection | ✔️ | ✔️ | ❔ | ❔ | ❔ |
Task Parameters | ✔️ | ❌ | ❌ | ❌ | ❌ |
Arbitrary Role | ✔️ | ❌ | ✔️ | ✔️ | ✔️ |
On Success Task | ✔️ | ❌ | ✔️ | ❔ | ✔️ |
On Error Task | ✔️ | ❌ | ❌ | ❔ | ✔️ |
Scheduling | |||||
Standard Cron | ✔️ | ✔️ | ❌ | ❌ | ✔️ |
Interval | ✔️ | ✔️ | ❌ | ❌ | ❌ |
On Reboot | ✔️ | ❌ | ❌ | ❌ | ❌ |
Start Manually | ✔️ | ❌ | ❌ | ❌ | ✔️ |
Kill Running Job | ✔️ | ❌ | ❌ | ✔️ | ✔️ |
Job Timeout | ✔️ | ❌ | ❌ | ✔️ | ❌ |
Task Timeout | ✔️ | ❌ | ❌ | ❌ | ❌ |
Disable Job | ✔️ | ❔ | ✔️ | ✔️ | ✔️ |
Auto Job Disable | ❌ | ❌ | ❌ | ❌ | ✔️ |
Self-Destructive Jobs | ✔️ | ❌ | ❌ | ❌ | ❔ |
Logging Levels | |||||
Job | ✔️ | ❔ | ❔ | ❔ | ❔ |
Task | ✔️ | ❔ | ❔ | ❔ | ❔ |
Session | ✔️ | ❔ | ❔ | ❔ | ❔ |
Logging Destinations | |||||
stdout/stderr | ✔️ | ❔ | ❔ | ❔ | ❔ |
file | ✔️ | ❔ | ❔ | ❔ | ❔ |
database | ✔️ | ❔ | ❔ | ❔ | ❔ |
The first important aspect of each scheduler is its implementation. Either it is a standalone client or a background worker.
A standalone client can be run on any host or platform. The only requirement for such architecture is the ability to connect to the target PostgreSQL server.
On the other hand, background worker implementation requires it to be one of the PostgreSQL processes. That, in turn, means you need to change the shared_preload_libraries
configuration parameter and restart the server. But in that case, the scheduler doesn't need to connect to PostgreSQL (it can, though) and can use a unique SPI protocol.
There is no right or wrong choice; each of these choices has pros and cons. The proper solution depends on the user and the environment.
Each of the schedulers heavily relies on a database infrastructure. Specific purpose tables, functions, and views are a common thing. So it's up to the developer to decide how to organize this set of objects in a database. Database extensions are one of the obvious ways of doing so. The difference reveals itself during upgrades: either the user should run ALTER EXTENSION name UPDATE
for extension-based deployments, or the scheduler is responsible for updating itself.
Right now, only one of the five schedulers can store jobs and task descriptions from the database: pgbucket can store jobs and tasks in files.
Each database scheduler is, at a minimum, supposed to be able to execute SQL tasks. The ability to run external programs or shell commands is a big plus. For some everyday tasks, e.g., send mail, log, copy table, etc., we've implemented them as productive internal tasks in pg_timetable.
An essential aspect of scheduler operation is the capability
When it comes to execution control, I check if:
This is probably the main criteria by which people evaluate schedulers. cron syntax is the standard, de facto.
Both pgAgent and jpgAgent heavily depend on GUI, and it's impossible to use cron syntax as the input value. The user needs to use checkboxes to specify the required schedule.
Interval values specify if jobs can be executed within a time interval, e.g., every hour, every 15 minutes.
On reboot indicates that jobs will be executed after a scheduler restart, not after a PostgreSQL restart. It's almost impossible to handle server restart adequately, unless the scheduler is implemented as a background worker. In that case, a restart of PostgreSQL means restarting all processes, including the scheduling process.
Sometimes it's good to have the opportunity to start a job manually, for example, for debugging purposes or during maintenance windows. The same applies to the ability to stop a frozen or long-running job.
Job and task timeouts allow you to terminate long-running processes automatically.
When using PostgreSQL schedulers, we are interested in:
In this part, we check what, how detailed and where PostgreSQL schedulers can:
I would like to see more PostgreSQL schedulers available! Let me know if you have any suggestions or fixes, and I will update this table with the new info.
I want to remind you that pg_timetable is a community project. So, please, don't hesitate to ask any questions, to report bugs, to star pg_timetable project, and to tell the world about it.
In conclusion, I wish you all the best! ♥️
Please, stay safe - so we can meet in person at one of the conferences, meetups, or training sessions!
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Hi Pavlo,
thanks for the great overview. We currently use Quartz plus Advisory Locks, but in case this changes it is good to have an overview like this, as there is really a lot to choose from.
As you are asking for additions: Even though it's very new, there is also pg_dbms_jobs, that was in included in the PostgreSQL news recently.
Can you also include this in your comparison?
Best regards
Salek Talangi
How about https://github.com/okbob/generic-scheduler and/or https://github.com/RekGRpth/pg_task
I use timescaledb jobs