UPDATED 14.02.2023: pg_timetable is a brand-new job scheduler for PostgreSQL, implemented completely from scratch in Go. Traditional job scheduling is quite limited and does not cover everything we needed in real life. Therefore, we thought about job execution in more detail and came up with a new concept that has not been seen in the PostgreSQL world before. pg_timetable does not simply execute some simple SQL statements periodically. It has been designed to execute entire sequences of operations at a time, while offering some other important features we will discuss in this post.
You can run pg_timetable in two ways: natively, using a pre-built binary (or building from source), or inside a Docker container.
Both methods are described on our GitHub page, so it should be easy for most people to run pg_timetable and give it a try.
For the sake of simplicity, we assume that you have installed our new PostgreSQL job scheduler and that everything is up and running. So let's take a look and see how pg_timetable can be configured.
The core idea is to allow users to execute more complicated sequences of operations. Now what does that mean? Suppose you want to …
A sequence like that is quite common. Let us take a look at those steps in pg_timetable terminology: each step along the way is a “base task”. There are three types of base tasks: SQL snippets, external programs, and built-in operations (such as sending mail).
Those tasks can be combined and turned into chains. A base task can take parameters, which are added at runtime. What does that mean? Suppose you want to send an email once a day. Who is going to receive this email? This kind of information can be passed to the chains / base tasks scheduled for execution, as sketched below.
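To give you an idea of what that looks like at the SQL level, here is a minimal sketch of a parameterized one-task chain. It assumes the timetable.chain / timetable.task / timetable.parameter tables of recent pg_timetable versions and the built-in SendMail task; the exact column names and the JSON keys expected by SendMail may differ in your version, so treat this as an outline rather than copy-paste code.

-- A sketch: a chain that runs daily at 08:00 and sends one mail,
-- with the recipient passed in as a runtime parameter.
WITH new_chain AS (
    INSERT INTO timetable.chain (chain_name, run_at, live)
    VALUES ('daily-mail', '0 8 * * *', TRUE)
    RETURNING chain_id
), new_task AS (
    INSERT INTO timetable.task (chain_id, task_order, kind, command)
    SELECT chain_id, 1, 'BUILTIN', 'SendMail'
    FROM new_chain
    RETURNING task_id
)
INSERT INTO timetable.parameter (task_id, order_id, value)
SELECT task_id, 1,
       -- illustrative payload; consult the SendMail documentation for the real keys
       '{"toaddr": ["recipient@example.com"], "subject": "Daily report"}'::jsonb
FROM new_task;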
Let us take a look at some simple pg_timetable configuration examples:
-- Run public.my_func() at 00:05 every day in August:
SELECT timetable.add_job('execute-func', '5 0 * 8 *', 'SELECT public.my_func()');

-- Run VACUUM at minute 23 past every 2nd hour from 0 through 20 every day:
SELECT timetable.add_job('run-vacuum', '23 0-20/2 * * *', 'VACUUM');

-- Refresh materialized view every 2 hours:
SELECT timetable.add_job('refresh-matview', '@every 2 hours',
    'REFRESH MATERIALIZED VIEW public.mat_view');

-- Clear log table after pg_timetable restart:
SELECT timetable.add_job('clear-log', '@reboot', 'TRUNCATE public.log');

-- Reindex at midnight on Sundays with the reindexdb utility:

-- using the default database under the default user (no command line arguments)
SELECT timetable.add_job('reindex-job', '0 0 * * 7', 'reindexdb', job_kind := 'PROGRAM');

-- specifying the target database and tables, and being verbose
SELECT timetable.add_job('reindex-job', '0 0 * * 7', 'reindexdb',
    '["--table=foo", "--dbname=postgres", "--verbose"]'::jsonb, 'PROGRAM');

-- passing the password using an environment variable through the bash shell
SELECT timetable.add_job(
    job_name          := 'reindex-job',
    job_schedule      := '0 0 * * 7',
    job_command       := 'bash',
    job_max_instances := 1,
    job_exclusive     := true,
    job_parameters    := '["-c", "PGPASSWORD=5m3R7K4754p4m reindexdb -U postgres -h 192.168.0.221 -v"]'::jsonb,
    job_kind          := 'PROGRAM',
    job_self_destruct := false,
    job_ignore_errors := true,
    job_live          := true);
These are the simplest one-task chain examples, so-called jobs. A job is a chain with only one task according to our terminology.
A detailed description of the add_job() function is available in the manual.
If you want to see more complex multi-step chain examples, please visit the Samples folder in the repo and the corresponding Samples section in the manual. A rough sketch of a multi-step chain follows below.
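The following sketch shows what a two-step chain might look like, using the same hypothetical schema as the sketch above: step one refreshes a materialized view, step two records the fact in a log table (a public.log table with a timestamp and a message column is an assumption made for the example).

-- A sketch of a multi-step chain (column names may vary between versions):
WITH c AS (
    INSERT INTO timetable.chain (chain_name, run_at, live)
    VALUES ('refresh-and-log', '@every 2 hours', TRUE)
    RETURNING chain_id
), step1 AS (
    INSERT INTO timetable.task (chain_id, task_order, kind, command)
    SELECT chain_id, 1, 'SQL', 'REFRESH MATERIALIZED VIEW public.mat_view'
    FROM c
)
INSERT INTO timetable.task (chain_id, task_order, kind, command)
SELECT chain_id, 2, 'SQL', $$INSERT INTO public.log VALUES (now(), 'mat_view refreshed')$$
FROM c;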
Concurrency protection is an important aspect. Suppose you want to back up a database daily. What is going to happen if the backup takes 60 hours to complete? At some point, your server is going to die because jobs pile up, or backup files will start to overwrite each other. In short: if you want to make sure that only one backup runs at a time, pg_timetable's concurrency protection is what you want. The max_instances = 1 setting is the easiest way to do that. There is no more need for PID files, no need for anything else. pg_timetable does it all for you.
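With the add_job() function shown earlier, a protected nightly backup could look like this (a sketch; the pg_dump arguments and paths are made up for the example):

-- Only one instance of this backup may run at any point in time,
-- no matter how long a single run takes:
SELECT timetable.add_job(
    job_name          := 'nightly-backup',
    job_schedule      := '0 1 * * *',
    job_command       := 'pg_dump',
    job_parameters    := '["-Fc", "-f", "/var/backups/db.dump", "postgres"]'::jsonb,
    job_kind          := 'PROGRAM',
    job_max_instances := 1
);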
Sometimes a job should only be done once. The idea is simple: suppose somebody gets a 7-day free trial. After 7 days, you want to shut down the person's account. Obviously, shutting down somebody's account is not done daily: it happens once, and if the process was successful, that was basically it.
Self-destructive chains are especially useful for asynchronous execution. What do I mean by that? Suppose you have a website, and you press a button to launch a job that runs for 2 hours. Clearly, nobody is going to sit in front of the browser window and wait. What you do instead is tell pg_timetable to execute the job as soon as possible and to remove it once it has finished successfully. Asynchronous execution is the prime use case.
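Such a one-shot job might be created like this (a sketch: public.close_account() is a made-up function, and the schedule merely makes the job fire soon after it has been added):

-- Run soon after being added; the chain removes itself after a
-- successful run thanks to job_self_destruct := true.
SELECT timetable.add_job(
    job_name          := 'close-trial-account-42',
    job_schedule      := '@every 1 minute',
    job_command       := 'SELECT public.close_account(42)',
    job_self_destruct := true
);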
In case you're wondering how many simultaneous jobs pg_timetable can run in one session, we have two special examples for you: one simulates many parallel chains, and the other shows how to handle many tasks within a chain.
What is even better, the latter shows you how to check the result code and the output of a previous task. Checking a failed task is only possible if ignore_error = true is set for that task; otherwise, the scheduler will stop the chain. This sample shows how to calculate the number of failed, successful, and total tasks executed. Based on these values, we can calculate the success ratio:
...
[INFO] [chain:1] [task:497] [txid:499374] Starting task
[INFO] [chain:1] [task:497] [txid:499374] Task executed successfully
[INFO] [chain:1] [task:498] [txid:499374] Starting task
[ERROR] [chain:1] [task:498] [sql:SELECT 1.0 / round(random())::int4;] [args:[]] [err:ERROR: division by zero (SQLSTATE 22012)] [pid:9616] [time:505.3µs] [txid:499374] Query
[ERROR] [chain:1] [task:498] [error:ERROR: division by zero (SQLSTATE 22012)] [txid:499374] Task execution failed
[INFO] [chain:1] [task:498] [txid:499374] Ignoring task failure
[INFO] [chain:1] [task:499] [txid:499374] Starting task
[INFO] [chain:1] [task:499] [txid:499374] Task executed successfully
[INFO] [chain:1] [task:500] [txid:499374] Starting task
[ERROR] [chain:1] [task:500] [sql:SELECT 1.0 / round(random())::int4;] [args:[]] [err:ERROR: division by zero (SQLSTATE 22012)] [pid:9616] [time:2.1592ms] [txid:499374] Query
[ERROR] [chain:1] [task:500] [error:ERROR: division by zero (SQLSTATE 22012)] [txid:499374] Task execution failed
[INFO] [chain:1] [task:500] [txid:499374] Ignoring task failure
[INFO] [chain:1] [task:501] [txid:499374] Starting task
[INFO] [chain:1] [task:501] [txid:499374] Task executed successfully
[INFO] [chain:1] [task:1] [txid:499374] Starting task
[INFO] [notice:Tasks executed:500; succeeded: 274; failed: 226; ratio: 54.8000000000000000] [severity:NOTICE] Notice received
[INFO] [chain:1] [task:1] [txid:499374] Task executed successfully
[INFO] [chain:1] [txid:499374] Chain executed successfully
...
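The failing statement in this log, SELECT 1.0 / round(random())::int4;, divides by zero roughly half of the time. A task like that only keeps the chain alive because it is flagged to ignore errors; with the hypothetical schema from the sketches above, the flag could be set like this:

-- Let failures of task 498 be logged but not abort the chain
-- (column name as sketched earlier; verify against your version):
UPDATE timetable.task
   SET ignore_error = TRUE
 WHERE task_id = 498;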
To make configuration easier, we have introduced a cron-style configuration interface:
-- CRON-Style
-- * * * * * command to execute
-- ┬ ┬ ┬ ┬ ┬
-- │ │ │ │ │
-- │ │ │ │ └──── day of the week (0 - 7) (0 and 7 both mean Sunday)
-- │ │ │ └────── month (1 - 12)
-- │ │ └──────── day of the month (1 - 31)
-- │ └────────── hour (0 - 23)
-- └──────────── minute (0 - 59)
But what if you want to have enhanced control over jobs? It's not unusual that one cannot foresee the execution time of complicated operations like REFRESH MATERIALIZED VIEW, CREATE INDEX CONCURRENTLY, etc. In that case, it's wise to run the next iteration only after some period, let's say in an hour or a day. For this, pg_timetable provides a special interval format:
'@after 2 hours' will start the next chain iteration two hours after the previous run has finished (see the sketch after this list);
'@every 2 hours' will spin up a new chain run every two hours, no matter whether and when the previous run finished;
'@reboot' will run the chain at the very beginning of a pg_timetable session. Pay attention: this means a pg_timetable restart, not a server reboot! Rotating logs, installing updates and checking system status are good examples of things you might want to do during start-up.
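For instance, a materialized view refresh whose next run should only start an hour after the previous one has finished could be scheduled like this (a sketch reusing the add_job() examples from above):

-- The next run starts 1 hour after the previous run has finished,
-- so refreshes can never overlap, no matter how long one takes:
SELECT timetable.add_job('refresh-matview', '@after 1 hour',
    'REFRESH MATERIALIZED VIEW public.mat_view');

If you like pg_timetable, we look forward to receiving your feedback, or maybe even feature requests, bug reports and contributions. We also want to point out that commercial support for pg_timetable is available internationally.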
If you want to read something else right now, we can recommend our post about pg_permissions which is available here.
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.
Fantastic work team.
I really don't understand why you are implementing core system features in Go.
Why is this not done in C?
Cheers
1. Goroutines
2. Cross-platform support
3. Easy parallelism management
Thanks for your kind words!
Very nice, thanks for sharing it.
One question: why didn't you guys use the PGQ extension? I think it's very mature, and it could solve this issue as well.
Hello Rafael. Thanks for your comment.
Would you be so kind as to provide some examples of how PGQ can be used for scheduling? They are completely different, unless we missed something.
Thanks
Hi, I am new to PostgreSQL as I come from a SQL Server environment.
I could not get the Windows installer to work but managed to install pg_timetable using the git clone option.
I do like the tool but have a question regarding the scheduling.
Do I have to keep a command prompt open with the scheduler running for it to work (one per database?), or is there another option, e.g. a background worker? I don't like keeping a command prompt open and running on the LIVE environment, as it could easily be closed by someone without realising it.
No; that would be pretty useless.
You just add the job, then you can close the database connection, and the job will be started all by itself.