View Access Logging - Fix it in PostgreSQL

07.2023 | Category: How To | Tags: administration, view

07.2023

Category: How To

By Kirk Wolak

Table of Contents

Continuous Improvement is an important part of reducing technical debt. Over 30 years our active database has collected some technical debt. We wanted to rename all of our views to be consistently named. Besides that, we had a lack of code reviews for what is happening in the database. The latter situation has been rectified, and we now have strictly codified rules on object naming, style, etc.

The problem

We have over 600 tables, and 300 views, many having 2-3 different naming conventions applied to them in our database. We want to clean this up. But we cannot tell if these views are even in use today. Some views are run by managers via scripts or items slightly outside of our control. Others are accessed systematically in operations where the names of the views are often build up from pieces (e.g. vw_SYSTEM_top_{100 | 50 | 10} because we have multiple systems that do similar things against various tables and require consistent reporting. So the user can choose options, which generates the view names/dashboards.)

View access logging - the double dilemma

We simply cannot afford to rename all of our views, and hope for the best. So, we have 2 dilemmas. First, we want to rename all of our views to be consistent. PG makes this pretty easy. But we don't want to break anything. Also, we need to track who is potentially using the old view, and how often it is being used, maybe when it is being used.

There are a few challenges here. How do we inject the tracking code we need, but only do it when the old views are used? Furthermore, how do we avoid huge penalties?

Our solution

The following solution is what we chose to implement. It has a single big drawback that it changes previously read-only transactions into read-write transactions. It also does some extra logging, and if you want to know the call stack, that is even more detail/work done with the view. Finally, we want this to work even if the view returns 0 rows of data.

The approach we took to repair our view access logging was:

Rename all views to the new standards
Create "Synonym Views" with the old names that log the hit, and call the new name
Modify our many code bases, and publish these changes as close together as possible
Monitor the logging table, and find/fix whatever we missed, and clear the logging table
Allow for storing the STACK details to determine the actual internal code calling it, but keep it turned off until we are hunting for hard to find code
Only when this stabilizes in "Staging" will we move it to production
At some point in the future, drop the "Synonym Views" and the logging table/function
Via Code Reviews make sure we don't access the old names in new code!

The key takeaway should be that this will take weeks to months. Our view is that you must make at least 2 quarters to have high confidence that nothing is using something, and you must exceed 1 year to be certain. Because like many companies, we have routines that run on those schedules.

The key magic... "Synonym Views" (with the old names)

Each of these views has the exact same structure:

CREATE OR REPLACE VIEW old_name AS
    WITH qry AS MATERIALIZED (SELECT view_access_log('old_name') AS logged)
  SELECT vw.* FROM vw_new_name vw, qry WHERE qry.logged > 0;

1

2

3

CREATE OR REPLACE VIEW old_name AS

WITH qry AS MATERIALIZED (SELECT view_access_log('old_name') AS logged)

SELECT vw.* FROM vw_new_name vw, qry WHERE qry.logged > 0;

Effectively, this has the effect of materializing the single row qry.
It only returns the columns from the new view. Any filters or sorts applied to this view are translated onto that view naturally.

But now we have a function that is called. Let's see what we can do.
The first iteration of this function was simply:

CREATE SEQUENCE seq_view_consumer;
CREATE OR REPLACE FUNCTION view_access_log(text) 
RETURNS int 
LANGUAGE PLPGSQL AS
$
BEGIN
 RETURN nextval('seq_view_consumer');
END
$;

1

2

3

4

5

6

7

8

9

CREATE SEQUENCE seq_view_consumer;

CREATE OR REPLACE FUNCTION view_access_log(text)

RETURNS int

LANGUAGE PLPGSQL AS

$

BEGIN

RETURN nextval('seq_view_consumer');

END

$;

That sequence was important to validating that regardless of the number of rows, we only logged one item. Disaster would ensue if we had a view with a million rows and we tried logging every row!

Now with the concept tested and proven, we go to the next level

We create a table and log what happens. Here is the table we need in order to log what happens:

CREATE TABLE view_access_log
(
    id          bigint GENERATED ALWAYS AS IDENTITY
        PRIMARY KEY,
    dtwhen      timestamp WITH TIME ZONE DEFAULT NOW(),
    view_name   text,
    call_stack  text
);

1

2

3

4

5

6

7

8

CREATE TABLE view_access_log

(

id bigint GENERATED ALWAYS AS IDENTITY

PRIMARY KEY,

dtwhen timestamp WITH TIME ZONE DEFAULT NOW(),

view_name text,

call_stack text

);

And the function:

CREATE FUNCTION touch_view(view_str text) RETURNS integer
    LANGUAGE plpgsql
AS
$
DECLARE
    s_stack text;
BEGIN
    -- GET CURRENT DIAGNOSTICS s_stack = PG_CONTEXT;  /* Uncomment this to log the call stack.  
Too intense originally! */

    INSERT INTO test.view_access_log (view_name, call_stack) VALUES(view_str, s_stack);
    RETURN 1;
END
$;

1

2

3

4

5

6

7

8

9

10

11

12

13

14

CREATE FUNCTION touch_view(view_str text) RETURNS integer

LANGUAGE plpgsql

AS

$

DECLARE

s_stack text;

BEGIN

-- GET CURRENT DIAGNOSTICS s_stack = PG_CONTEXT; /* Uncomment this to log the call stack.

Too intense originally! */

INSERT INTO test.view_access_log (view_name, call_stack) VALUES(view_str, s_stack);

RETURN 1;

END

$;

That's it.

But how do we implement all of this?

Well, we will generate the DDL required to make this work straight from the DB. Then we have a choice to edit this or run it directly. Since it involves naming objects, we know we will have to review the DDL and edit it to make things the best we can. See Laurenz Albe's article on querying for views here. Because that portion would be too specific for this blog.

The simplicity of this solution to repair our view access logging is what makes me appreciate doing things in PG. At the same time, it is strange to some of us that SELECTs can have side effects in the database. This is clearly an example where one of those side effects works to your advantage!

View Access Logging - Fix it in PostgreSQL

The problem

View access logging - the double dilemma

Our solution

The approach we took to repair our view access logging was:

The key magic... "Synonym Views" (with the old names)

Now with the concept tested and proven, we go to the next level

But how do we implement all of this?

Leave a Reply Cancel reply

CYBERTEC Guest

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts