After my morning session in Dublin some people asked me to share the codes of an operator class I created for demo reasons. This posting is supposed to cover those codes.
Table of Contents
In PostgreSQL an index is not a hardwired thing, capable of following just one strategy. Instead an index is more of a framework people can use to implement their own indexing strategies. What is the point of all that?
Let us consider two values stored in a database:
1 2 3 4 5 6 |
BEGIN; CREATE TABLE t_sva (sva text); INSERT INTO t_sva VALUES ('1118090878'); INSERT INTO t_sva VALUES ('2345010477'); |
At first glance those two numbers seem to be in their right order (alphabetical). However, in reality they are not. Those two odd numbers are in fact Austrian social security numbers. The first number is related to the 9th of August 1978 (09 = day of month, 08 = month, 78 = year). 1118 is just a sequential number issued by the social security system. Taking into account that 1977 was actually before 1978 the second number should be the first entry in a sorted list.
The fact that those numbers need a “strange” sort order create some challenge when it comes to indexing this stuff. This is exactly when PostgreSQL operator classes kick in. An operator allows a user to create his own handcrafted indexing strategy. As the name (= operator class) already implies, creating an indexing strategy is all about operators. The goal is to create our own operators and use them alongside a normal btree.
1 2 3 4 5 6 7 8 |
CREATE OR REPLACE FUNCTION normalize_si(text) RETURNS text AS $$ BEGIN RETURN substring($1, 9, 2) || substring($1, 7, 2) || substring($1, 5, 2) || substring($1, 1, 4); END; $$ LANGUAGE 'plpgsql' IMMUTABLE; |
First of all we create a simple stored procedure capable of transforming our strange string and turn it into something which can be sorted normally:
1 2 3 4 5 |
SELECT normalize_si('1118090878'); normalize_si -------------- 7808091118 (1 row) |
The clue here is that we can now use our own logic to compare values and turn this logic into operators. To do so we need a set of functions:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
CREATE OR REPLACE FUNCTION si_lt(text, text) RETURNS boolean AS $$ BEGIN RETURN public.normalize_si($1) < public.normalize_si($2); END; $$ LANGUAGE 'plpgsql' IMMUTABLE; -- lower equals CREATE OR REPLACE FUNCTION si_le(text, text) RETURNS boolean AS $$ BEGIN RETURN public.normalize_si($1) <= public.normalize_si($2); END; $$ LANGUAGE 'plpgsql' IMMUTABLE; |
In PostgreSQL every operator is simply based on a stored procedure. So if you have proper procedures at hand already, it is pretty simple to turn those into operators. CREATE OPERATOR does the job:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
-- greater equal CREATE OR REPLACE FUNCTION si_ge(text, text) RETURNS boolean AS $$ BEGIN RETURN public.normalize_si($1) >= public.normalize_si($2); END; $$ LANGUAGE 'plpgsql' IMMUTABLE; -- greater CREATE OR REPLACE FUNCTION si_gt(text, text) RETURNS boolean AS $$ BEGIN RETURN public.normalize_si($1) > public.normalize_si($2); END; $$ LANGUAGE 'plpgsql' IMMUTABLE; CREATE OPERATOR <# ( PROCEDURE=public.si_lt, LEFTARG=text, RIGHTARG=text); CREATE OPERATOR <=# ( PROCEDURE=public.si_le, LEFTARG=text, RIGHTARG=text); CREATE OPERATOR >=# ( PROCEDURE=public.si_ge, LEFTARG=text, RIGHTARG=text); CREATE OPERATOR ># ( PROCEDURE=public.si_gt, LEFTARG=text, RIGHTARG=text); |
If you want to create an operator you have to tell PostgreSQL which function to call. Then we have to map the left and the right argument to our two arguments passed to the function. Of course, there are more features here which can be used to make this work even better, but for the most simplistic stuff this is enough.
The names of those operators make no difference here. However, I would recommend to use something readable.
Before an index shows us how to handle a new strategy, we have to define some support functions. Depending on the type of index we will need different sets of support functions. In our case we are planning to use a simple btree so all we need is a comparison function returning a simple integer value:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
CREATE OR REPLACE FUNCTION si_same(text, text) RETURNS int AS $$ BEGIN IF public.normalize_si($1) < public.normalize_si($2) THEN RETURN -1; ELSIF public.normalize_si($1) > public.normalize_si($2) THEN RETURN +1; ELSE RETURN 0; END IF; END; $$ LANGUAGE 'plpgsql' IMMUTABLE; |
Now we have all ingredients in place to use our own indexing strategy. All it takes now is one last command:
1 2 3 4 5 6 7 8 9 10 |
CREATE OPERATOR CLASS sva_special_ops FOR TYPE text USING btree AS OPERATOR 1 <# , OPERATOR 2 <=# , OPERATOR 3 = , OPERATOR 4 >=# , OPERATOR 5 ># , FUNCTION 1 public.si_same(text, text) ; |
We have to tell the index which operators to use when and which support functions to use. In case of btrees this is really simple.
We can already create an index using our own operator class now:
1 |
CREATE INDEX idx_special ON t_sva (sva sva_special_ops); |
The optimizer is already capable of using our index:
1 2 3 4 5 6 7 8 9 |
SET enable_seqscan TO off; -- to make sure that this work with just two rows SELECT * FROM t_sva WHERE sva = '0000112276'; QUERY PLAN ------------------------------------------------------------------------------- Index Only Scan using idx_special on t_sva (cost=0.13..8.14 rows=1 width=32) Index Cond: (sva = '0000112276'::text) (2 rows) |
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.
You need to load content from reCAPTCHA to submit the form. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Leave a Reply