CYBERTEC PostgreSQL Logo

View Access Logging - Fix it in PostgreSQL

07.2023 / Category: / Tags: |

By Kirk Wolak

Continuous Improvement is an important part of reducing technical debt. Over 30 years our active database has collected some technical debt. We wanted to rename all of our views to be consistently named. Besides that, we had a lack of code reviews for what is happening in the database. The latter situation has been rectified, and we now have strictly codified rules on object naming, style, etc.

The problem

We have over 600 tables, and 300 views, many having 2-3 different naming conventions applied to them in our database. We want to clean this up. But we cannot tell if these views are even in use today. Some views are run by managers via scripts or items slightly outside of our control. Others are accessed systematically in operations where the names of the views are often build up from pieces (e.g. vw_SYSTEM_top_{100 | 50 | 10} because we have multiple systems that do similar things against various tables and require consistent reporting. So the user can choose options, which generates the view names/dashboards.)

View access logging - the double dilemma

We simply cannot afford to rename all of our views, and hope for the best. So, we have 2 dilemmas. First, we want to rename all of our views to be consistent. PG makes this pretty easy. But we don't want to break anything. Also, we need to track who is potentially using the old view, and how often it is being used, maybe when it is being used.

There are a few challenges here. How do we inject the tracking code we need, but only do it when the old views are used? Furthermore, how do we avoid huge penalties?

Our solution

The following solution is what we chose to implement. It has a single big drawback that it changes previously read-only transactions into read-write transactions. It also does some extra logging, and if you want to know the call stack, that is even more detail/work done with the view. Finally, we want this to work even if the view returns 0 rows of data.

The approach we took to repair our view access logging was:

  1. Rename all views to the new standards
  2. Create "Synonym Views" with the old names that log the hit, and call the new name
  3. Modify our many code bases, and publish these changes as close together as possible
  4. Monitor the logging table, and find/fix whatever we missed, and clear the logging table
  5. Allow for storing the STACK details to determine the actual internal code calling it, but keep it turned off until we are hunting for hard to find code
  6. Only when this stabilizes in "Staging" will we move it to production
  7. At some point in the future, drop the "Synonym Views" and the logging table/function
  8. Via Code Reviews make sure we don't access the old names in new code!

The key takeaway should be that this will take weeks to months. Our view is that you must make at least 2 quarters to have high confidence that nothing is using something, and you must exceed 1 year to be certain. Because like many companies, we have routines that run on those schedules.

The key magic... "Synonym Views" (with the old names)

Each of these views has the exact same structure:

Effectively, this has the effect of materializing the single row qry.
It only returns the columns from the new view. Any filters or sorts applied to this view are translated onto that view naturally.

But now we have a function that is called. Let's see what we can do.
The first iteration of this function was simply:

That sequence was important to validating that regardless of the number of rows, we only logged one item. Disaster would ensue if we had a view with a million rows and we tried logging every row!

Now with the concept tested and proven, we go to the next level

We create a table and log what happens. Here is the table we need in order to log what happens:

And the function:

That's it.

But how do we implement all of this?

Well, we will generate the DDL required to make this work straight from the DB. Then we have a choice to edit this or run it directly. Since it involves naming objects, we know we will have to review the DDL and edit it to make things the best we can. See Laurenz Albe's article on querying for views here. Because that portion would be too specific for this blog.

The simplicity of this solution to repair our view access logging is what makes me appreciate doing things in PG. At the same time, it is strange to some of us that SELECTs can have side effects in the database. This is clearly an example where one of those side effects works to your advantage!

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram