CYBERTEC PostgreSQL Logo

Understanding recursive queries in PostgreSQL

08.2020 / Category: / Tags: | |
recursive queries misunderstood
© Laurenz Albe 2020

 

Many people consider recursive queries a difficult topic. Still, they enable you to do things that would otherwise be impossible in SQL.

This articles gives a simple introduction with examples and shows the differences to Oracle's implementation of recursive queries.

Common table expressions (WITH clauses)

A common table expression (CTE) can be seen as a view that is only valid for a single query:

This could also be written as a subquery in FROM, but there are some advantages to using CTEs:

  • The query becomes more readable.
  • You can reference the CTE several times in the query, and it will be calculated only once.
  • You can use data modifying statements in the CTE (typically with a RETURNING clause).

Note that before v12, PostgreSQL always materialized CTEs. That means, the CTE was calculated independently from the containing query. From v12 on, CTEs can be “inlined” into the query, which provides further optimization potential.

CTEs (as well as the recursive CTEs mentioned below) are defined in the SQL standard, although PostgreSQL does not implement the SEARCH and CYCLE clause.

Syntax of recursive queries

Recursive queries are written using recursive CTEs, that are CTEs containing the RECURSIVE keyword:

How recursive queries are processed

PostgreSQL internally uses a working table to process recursive CTEs. This processing is not really recursive, but rather iterative:

First, the working table is initialized by executing the non-recursive branch of the CTE. The result of the CTE is also initialized with this result set. If the recursive CTE uses UNION rather than UNION ALL, duplicate rows are removed.

Then, PostgreSQL repeats the following until the working table is empty:

  • Evaluate the recursive branch of the CTE, replacing the reference to the CTE with the working table.
  • Add all resulting rows to the CTE result. If UNION is used to combine the branches, discard duplicate rows.
  • Replace the working table with all new rows from the previous step (excluding any removed duplicates).

Note that the self-referential branch of the CTE is not executed using the complete CTE result so far, but only the rows that are new since the previous iteration (the working table).

You have to be aware of the danger of an endless loop here:

If the iteration never ends, the query will just keep running until the result table becomes big enough to cause an error. There are two ways to deal with that:

  • Often you can avoid infinite recursion by using UNION, which removes duplicate result rows (but of course requires extra processing effort).
  • Another way is to place a LIMIT clause on the query that uses the CTE, because PostgreSQL stops processing if the recursive CTE has calculated as many rows as are fetched by the parent query. Note that this technique is not portable to other standard compliant databases.

A simple example

Let's assume a self-referential table like this:

We want to find all the subordinates of person 7566, including the person itself.

The non-recursive branch of the query will be:

The recursive branch will find all subordinates of all entries in the working table:

We can assume that the dependencies contain no cycles (nobody is his or her own manager, directly or indirectly). So we can combine the queries with UNION ALL, because no duplicates can occur.

So our complete query will be:

Adding generated columns

Sometimes you want to add more information, like the hierarchical level. You can do that by adding the starting level as a constant in the non-recursive branch. In the recursive branch you simply add 1 to the level:

If you use UNION to avoid duplicated rows in the case of circular references, you cannot use this technique. This is because adding level will make rows that were identical before different. But in that case a hierarchical level wouldn't make much sense anyway because an entry could appear on infinitely many levels.

Another frequent requirement is to collect all ancestors in a “path”:

Comparison to Oracle

Oracle database has a different syntax for recursive queries that does not conform to the SQL standard. The original example would look like this:

This syntax is more concise, but less powerful than recursive CTEs. For more complicated queries involving joins, it can become difficult and confusing.

It is always easy to translate an Oracle “hierarchical query” to a recursive CTE:

  • The non-recursive branch is the Oracle query without the CONNECT BY clause but including the START WITH clause.
  • The recursive branch is the Oracle query without the START WITH clause but including the CONNECT BY clause. You add a join with the name of the recursive CTE and replace all PRIOR columns with columns from that joined CTE.
  • If the Oracle query uses CONNECT BY NOCYCLE, use UNION, otherwise UNION ALL.

Apart from that, Oracle also supports standard compliant recursive CTEs. These also support the SEARCH and CYCLE clauses that PostgreSQL doesn't implement.

The true power of recursive queries

Without recursive CTEs, many things that can be written in procedural languages cannot be written in SQL. That is usually no problem, because SQL is made to query databases. If you need procedural code, you can always write a database function in one of the many procedural languages available in PostgreSQL.

But recursive CTEs make SQL turing complete, that is, it can perform the same calculations as any other programming language. Obviously, such a “program” would often be quite complicated and inefficient, and this is a theoretical consideration and not of much practical value. Still, the previous examples showed that recursive CTEs can do useful work that you couldn't otherwise perform in SQL.

As an example for the power of recursive queries, here is a recursive CTE that computes the first elements of the Fibonacci sequence:

This example also demonstrates how an endless loop can be avoided with a LIMIT clause on the parent query.

Conclusion

You shouldn't be afraid of recursive queries, even if the term “recursive” scares you. They are not that hard to understand, and they provide the simplest way to query databases containing hierarchies and graphs. In many cases they are also much more efficient than equivalent procedural code because they avoid repeated queries to the same tables.

Together with window functions, recursive queries are among the most powerful tools that SQL has to offer.

2 responses to “Understanding recursive queries in PostgreSQL”

  1. Probably more useful than deriving a "level" column would be a varchar breadcrumb of the IDs for each row that can be used as a sort key. Concatenating the IDs (unique attribute) with white space or underscores allows you to order a hierarchy in your results and see the tree expand out in your tabular display. Another useful convention is to derive yet another column that shows the origin or root parent for each row. From all that you can determine the root ID of any row that you are analyzing and issue a simple query (or create a table valued function) that returns the hierarchy for a specific parent (customer, employee, inventory, etc). This is all advantageous because in the real world you often have many hierarchies existing in one table/tuple and not the trivial manager/employee example that expands into a single clean tree.

  2. Another great feature with Postgres is Array column types.

    In the case of recursive queries you can use it to prevent recursion re-entry. If somebody made a mistake and two people report to each in the tree, you can ensure that it never loops forever.

    To do this, add another calculated column that is an Array. So in the first query you could do:
    SELECT empno, ename,
    ename AS path,
    ARRAY[empno] as visited
    FROM emp
    WHERE empno = 7566

    Then in the recursive 2nd clause:
    UNION ALL
    SELECT emp.empno, emp.ename,
    ctename.path || ' -> ' || emp.ename,
    ctename.visited || emp.empno
    FROM emp
    JOIN ctename ON emp.mgr = ctename.empno
    WHERE
    NOT ctename.visited && ARRAY[emp.empno]

    This will add an array that keeps appending each visited location. If you have previously visited the location, it will prevent recursion.

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram