By Kirk Wolak
Table of Contents
Continuous Improvement is an important part of reducing technical debt. Over 30 years our active database has collected some technical debt. We wanted to rename all of our views to be consistently named. Besides that, we had a lack of code reviews for what is happening in the database. The latter situation has been rectified, and we now have strictly codified rules on object naming, style, etc.
We have over 600 tables, and 300 views, many having 2-3 different naming conventions applied to them in our database. We want to clean this up. But we cannot tell if these views are even in use today. Some views are run by managers via scripts or items slightly outside of our control. Others are accessed systematically in operations where the names of the views are often build up from pieces (e.g. vw_SYSTEM_top_{100 | 50 | 10} because we have multiple systems that do similar things against various tables and require consistent reporting. So the user can choose options, which generates the view names/dashboards.)
We simply cannot afford to rename all of our views, and hope for the best. So, we have 2 dilemmas. First, we want to rename all of our views to be consistent. PG makes this pretty easy. But we don't want to break anything. Also, we need to track who is potentially using the old view, and how often it is being used, maybe when it is being used.
There are a few challenges here. How do we inject the tracking code we need, but only do it when the old views are used? Furthermore, how do we avoid huge penalties?
The following solution is what we chose to implement. It has a single big drawback that it changes previously read-only transactions into read-write transactions. It also does some extra logging, and if you want to know the call stack, that is even more detail/work done with the view. Finally, we want this to work even if the view returns 0 rows of data.
The key takeaway should be that this will take weeks to months. Our view is that you must make at least 2 quarters to have high confidence that nothing is using something, and you must exceed 1 year to be certain. Because like many companies, we have routines that run on those schedules.
Each of these views has the exact same structure:
1 2 3 |
CREATE OR REPLACE VIEW old_name AS WITH qry AS MATERIALIZED (SELECT view_access_log('old_name') AS logged) SELECT vw.* FROM vw_new_name vw, qry WHERE qry.logged > 0; |
Effectively, this has the effect of materializing the single row qry
.
It only returns the columns from the new view. Any filters or sorts applied to this view are translated onto that view naturally.
But now we have a function that is called. Let's see what we can do.
The first iteration of this function was simply:
1 2 3 4 5 6 7 8 9 |
CREATE SEQUENCE seq_view_consumer; CREATE OR REPLACE FUNCTION view_access_log(text) RETURNS int LANGUAGE PLPGSQL AS $ BEGIN RETURN nextval('seq_view_consumer'); END $; |
That sequence was important to validating that regardless of the number of rows, we only logged one item. Disaster would ensue if we had a view with a million rows and we tried logging every row!
We create a table and log what happens. Here is the table we need in order to log what happens:
1 2 3 4 5 6 7 8 |
CREATE TABLE view_access_log ( id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY, dtwhen timestamp WITH TIME ZONE DEFAULT NOW(), view_name text, call_stack text ); |
And the function:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
CREATE FUNCTION touch_view(view_str text) RETURNS integer LANGUAGE plpgsql AS $ DECLARE s_stack text; BEGIN -- GET CURRENT DIAGNOSTICS s_stack = PG_CONTEXT; /* Uncomment this to log the call stack. Too intense originally! */ INSERT INTO test.view_access_log (view_name, call_stack) VALUES(view_str, s_stack); RETURN 1; END $; |
That's it.
Well, we will generate the DDL required to make this work straight from the DB. Then we have a choice to edit this or run it directly. Since it involves naming objects, we know we will have to review the DDL and edit it to make things the best we can. See Laurenz Albe's article on querying for views here. Because that portion would be too specific for this blog.
The simplicity of this solution to repair our view access logging is what makes me appreciate doing things in PG. At the same time, it is strange to some of us that SELECT
s can have side effects in the database. This is clearly an example where one of those side effects works to your advantage!
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Leave a Reply