CYBERTEC PostgreSQL Logo

pg_read_all_data - a system level "read all data" role for PostgreSQL!

04.2021 / Category: / Tags: |

By Kaarel Moppel

I don’t usually post about upcoming PostgreSQL features and rather concentrate on tools available / versions released... but this feature, pg_read_all_data, got me excited and will certainly be a huge relief for real-life usage, especially for beginners! I had almost lost hope we might see this day - but after many years of wishing for it, one of the features most requested by normal business users/application developers has landed in the repository. Kudos to the author and reviewer!

Full Git commit entry here.

Why is pg_read_all_data a “killer feature”?

In a perfect world, no one would require such convenience functionality - given, of course, that everyone knew more or less exactly how their data model was going to look like, could predict how it was going to be used, and more importantly, who was going to use it, and designed the access/privilege system accordingly. Sadly, we sometimes have to live and deal with the real world where we mostly don’t have the time and resources to work out perfect solutions — so there’s a healthy number of question marks hovering around every decent-sized project, to say the least.

Then again, it’s true that for years, we have managed to get by without this new feature. But often what we see happening is that database operators cannot foresee and nicely accommodate all future data access requirements... and in many cases, they fix the situation with the hefty price tag of handing out potentially dangerous superusers access!

If you’re new(ish) to Postgres you might think here - hmm, how bad can it be? All our data analysts have been superusers for years, and nothing bad happened? Sure, I get it, most of the time nothing bad happens... but nevertheless, you will have a constant “cloud of danger” hovering around your database with too generous superuser usage. Even if your data is nicely backed up or just plain not too important, a mean superuser can take over the whole database server by executing random commands on the OS level! And this my friends is a feature, not a bug. From there, the next step could be infiltrating the whole company-wide network and stealing all valuable assets, or locking you out of your business, to risk drawing an even darker picture…

To read more on those scary superuser side-effects I suggest reading this article.

How does pg_read_all_data work?

Well, it couldn't be much simpler - just hand out the required GRANT (pg_read_all_data or pg_write_all_data) to a trustworthy user of choice and voila! 🙂

NB! You yourself still need to be a superuser. Note that I’m only demonstrating the “read all” use-case here, as that will be the more common one.

We’re not there yet - workaround ideas until pg_read_all_data is finalized

The change was committed and is upcoming... but sadly it will still take many months before the next major version (v14) is released - so how can you implement the “read all data” / “write all data” requirement with the means currently available? There are a couple of options:

  • Build up a proper role hierarchy, and set up appropriate “default privileges” using a relatively unknown Postgres ALTER DEFAULT PRIVILEGES syntax. This is the cleanest and most highly recommended method. Basically, you could have an appdb_reader meta-role that gets a SELECT automatically granted for every table that will be created in the future, and then you could just assign this role to a trustworthy/needed real (login) user. So something along the lines of:
      • If the role hierarchy concept seems too complex to implement and the schema is static enough, we can also opt for a quick fix. Note, however, that new tables will still need a separate grant.
      • Something for the courageous: superuser access on replicas only! Here I mean that you really create a spare replication machine and block access on the real HA nodes on the pg_hba.conf level. In that way, a malicious (or hacked) user won’t pose an OPSEC threat. Note that for heavy read queries you might also need to tune some configuration parameters still, to avoid replication conflicts - see here for details in case needed.

      Further reading

      If you would like to learn more about PostgreSQL’s role and privilege system, I’d suggest looking at some articles here and here, and at the official documentation here.

      Summary

      There’s not much to conclude here about this new feature, since it’s dead simple to use (given you don’t have to worry too much about some secret tables), so I will leave you with just a small remark: with this newly added feature, you can end up with a non-optimal database design. For that reason, make sure to understand the implications of its use, and please do still try to use the good old role system for more complex projects. Having more granular access at your disposal in an enterprise context can mean the difference between handing out a simple GRANT vs expensive redesign of the whole database schema!

      pg_read_all_data: The new built-in PostgreSQL roles next to their older colleagues
      The new built-in PostgreSQL roles next to their older colleagues.

       

      I hope this feature will result in far fewer unnecessary superuser roles being handed out. All in all, it’s yet another great reason to become an “elephant herder”, if you aren’t doing that already. 🙂

      Leave a Reply

      Your email address will not be published. Required fields are marked *

      CYBERTEC Logo white
      Get the newest PostgreSQL Info & Tools


        This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

        ©
        2024
        CYBERTEC PostgreSQL International GmbH
        phone-handsetmagnifiercrosscross-circle
        linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram