Most people who use PostgreSQL on a regular basis may have seen pg_stat_activity
, which is a system view. It gives you real-time information about what is happening on your database server. It has grown over the years to provide ever more information about system processes, database connections, parallelism and a lot more.
Table of Contents
However, if you are running a Citus deployment, pg_stat_activity
doesn't provide enough information to understand what's going on. While pg_stat_activity
is the perfect tool to inspect a single server, a Citus deployment usually consists of many servers. In that case, it's hard to see what all the shards are doing, and how a query is distributed.
citus_stat_activity
: Monitoring CitusThe solution to the problem is the citus_stat_activity
view. When Citus is enabled, this view is available by default and provides the counterpart to pg_stat_activity
on the cluster level. The following listing shows what the view looks like:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
postgres=# d citus_stat_activity View 'pg_catalog.citus_stat_activity' Column | Type | Collation | Nullable | Default ------------------+--------------------------+-----------+----------+--------- global_pid | bigint | | | nodeid | integer | | | is_worker_query | boolean | | | datid | oid | | | datname | name | | | pid | integer | | | leader_pid | integer | | | usesysid | oid | | | usename | name | | | application_name | text | | | client_addr | inet | | | client_hostname | text | | | client_port | integer | | | backend_start | timestamp with time zone | | | xact_start | timestamp with time zone | | | query_start | timestamp with time zone | | | state_change | timestamp with time zone | | | wait_event_type | text | | | wait_event | text | | | state | text | | | backend_xid | xid | | | backend_xmin | xid | | | query_id | bigint | | | query | text | | | backend_type | text | | | |
What you see is, to a large extent, identical to pg_stat_activity
. However, there are a couple of fields which are only available in the Citus view (such as nodeid
). Those fields will help us to see which shard is doing what, and how.
To demonstrate how citus_stat_statements
works, we can run a slow query. The problem with “normal” queries is that Citus usually executes them so quickly that it's hard to inspect. Therefore I have implemented an extremely slow query which will never end:
1 2 3 |
SELECT count(*) FROM t_col_timeseries WHERE pg_sleep(id) IS NULL; |
If we run this on the coordinator, we have enough time to check what's really going on:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
SELECT global_pid, nodeid, application_name, query FROM citus_stat_activity WHERE state = 'active'; -[ RECORD 1 ]----+------------------------------------------------------------------------------------------------------------------------------- global_pid | 10000085476 nodeid | 1 application_name | citus_run_command gpid=10000085476 query | + | SELECT coalesce(to_jsonb(array_agg(csa_from_one_node.*)), '[{}]'::JSONB) + | FROM ( + | SELECT global_pid, worker_query AS is_worker_query, pg_stat_activity.* FROM + | pg_stat_activity LEFT JOIN get_all_active_transactions() ON process_id = pid + | ) AS csa_from_one_node; + | -[ RECORD 2 ]----+------------------------------------------------------------------------------------------------------------------------------- global_pid | 10000085476 nodeid | 1 application_name | psql query | SELECT global_pid, nodeid, application_name, query FROM citus_stat_activity WHERE state = 'active'; -[ RECORD 3 ]----+------------------------------------------------------------------------------------------------------------------------------- global_pid | 10000048630 nodeid | 1 application_name | psql query | SELECT count(*) FROM t_col_timeseries WHERE pg_sleep(id) IS NULL; -[ RECORD 4 ]----+------------------------------------------------------------------------------------------------------------------------------- global_pid | 10000048630 nodeid | 2 application_name | citus_internal gpid=10000048630 query | SELECT count(*) AS count FROM public.t_col_timeseries_102224 t_col_timeseries WHERE (pg_sleep((id)::double precision) IS NULL) -[ RECORD 5 ]----+------------------------------------------------------------------------------------------------------------------------------- global_pid | 10000048630 nodeid | 2 application_name | citus_internal gpid=10000048630 query | SELECT count(*) AS count FROM public.t_col_timeseries_102220 t_col_timeseries WHERE (pg_sleep((id)::double precision) IS NULL) |
From line 3 onwards, we can see what's happening on the shards. We have multiple queries per nodeid
which operate on tables containing a postfix such as _102220
. We can clearly see the parallelism that is going on behind the scenes.
Please note that if you connect to the shard, a lot of the complexity will be hidden from the end user:
1 2 3 4 5 6 7 |
postgres=# d public.t_col_timeseries_102220 Did not find any relation named 'public.t_col_timeseries_102220'. postgres=# SELECT count(*) FROM public.t_col_timeseries_102220; count --------- 4928109 (1 row) |
psql
won’t necessarily show how things are structured. While d
will tell us that the shard is not present, we can still query it. Many of you will have noticed similar behavior. citus_stat_activity
does not show up in d
either, due to some behind-the-scenes magic.
What I like about citus_stat_activity
is that I can get all the information I need at a glance. It's not necessary to connect to each shard / node separately and gather the information one by one. A single view will show you the information at once, which makes it super easy to debug and to understand how the query is actually distributed efficiently.
See our other Citus and sharding blogs:
In case you need any assistance, please feel free to contact us.
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Leave a Reply