CYBERTEC PostgreSQL Logo

Entity-attribute-value (EAV) design in PostgreSQL - don't do it!

11.2021 / Category: / Tags: |
good (?) reasons to use an entity-attribute-value design
© Laurenz Albe 2022

 

Customers have often asked me what I think of “Entity-attribute-value” (EAV) design. So I thought it would be a good idea to lay down my opinion in writing.

What is entity-attribute-value design?

The idea is not to create a table for each entity in the application. Rather, you store each attribute as a separate entry in an attribute table:

The name of the model is derived from the “att...” tables, which have the three columns: “entity ID”, “attribute name” and “value”.

There are several variations of the basic theme, among them:

  • omit the objects table
  • add additional tables that define “object types”, so that each type can only have certain attributes

Why would anybody consider an entity-attribute-value design?

The principal argument I hear in support of the EAV design is flexibility. You can create new entity types without having to create a database table. Taken to the extreme, each entity can have different attributes.

I suspect that another reason for people to consider such a data model is that they are more familiar with key-value stores than with relational databases.

Performance considerations of entity-attribute-value design

In my opinion, EAV database design is the worst possible design when it comes to performance. You will never get good database performance with such a data model.

The only use cases where EAV shines are when it is used as a key-value store.

INSERT

Inserting an entity will look like this:

That means that we insert four rows into three tables and have four index modifications. Also, the three statements will require three client-server round trips. You can save on the round trips by using CTEs to turn that into a single statement, or by using the new pipeline mode of libpq. Still, it will be much more expensive than inserting a single table row.

DELETE

If you use cascading delete, you could do that in a single statement:

Still, you will end up deleting four table rows and modifying four indexes. That's much more work than deleting a single table row.

UPDATE

A single-column update could actually be faster with the EAV design, because only one small table row is modified:

But if you have to modify several columns, you will need to run several UPDATE statements. That will be slower than if you only had to modify a single (albeit bigger) table row.

SELECT

Querying the attributes of an entity requires a join:

Alternatively, you could run three separate queries, one for each attribute. No matter how you do it, it will be less efficient than a single-row SELECT from a single table.

Single-column aggregates

As an example for a query that might be faster with the EAV model, consider a query that aggregates data from a single column:

With a covering index on attint(objectid, attname) INCLUDE (attval), this could be quite a bit faster than aggregating a column from a wider table.

More complicated queries

After these examples, it is clear that writing more complicated queries will be a pain with the EAV design. Imagine a simple join:

If you think that this query is hard to read, I agree with you. In a normal relational data model, the same operation could look like this:

You can guess which query will perform better.

But we need an entity-attribute-value design for flexibility!

Relational data models are not famous for their flexibility. After all, that is the drive behind the NoSQL movement. However, there are good ways to deal with variable entities.

Creating tables on the fly

Nothing keeps you from running statements like CREATE TABLE and CREATE INDEX from your application. So if there is a limited number of entity types, and each type has a certain number of attributes, you can easily model that with a traditional relational model.

Certain problems remain:

  • A data model that grows on the fly may not end up being well-designed. But that's not different in the entity-attribute-value design.
  • If the application has to create tables, it needs permission to do so. But today, when many applications create their own database tables anyway, few people will worry about that.

Creating tables on the fly will only work well if the set of attributes for each entity is well-defined. If that is not the case, we need a different approach.

Using JSON for a flexible data model

PostgreSQL has extensive JSON support that can be used to model entities with a variable number of attributes.

For that, you model the important and frequently occurring attributes as normal table columns. Then you add an additional column of type jsonb with a GIN index on it. This column contains the “rare attributes” of the entity as key-value pairs.

When using a model like this, you should take care that attributes

  • used in joins
  • on which you need a database constraint
  • that you want to use in a WHERE condition with an operator different from “=

are modeled as regular table columns.

Conclusion

Avoid entity-attribute-value designs in your relational database. EAV causes bad performance, and there are other ways to have a flexible data model in PostgreSQL.

 

Need help with your data modeling? Find out about CYBERTEC's data modeling services

3 responses to “Entity-attribute-value (EAV) design in PostgreSQL - don't do it!”

  1. Consider this: you are creating a commerce app, each product needs various custom attributes, JSON is a solid solution but how about indexing? How about creating filtering interfaces on the fly? These need to be built in the background jobs and cached. Somethimes the project and data are small enough that it is just not worth it, in the technical term you are absolutely right but in the business terms somethings are not worth the effort

    • Probably I don't get your point, but jsonb has indexing and it works quite well. Of course, you cannot migrate "all columns" into "all json", because that would be stupid. You have the chance to get advantages of both worlds, so take it. And it allows for dynamic filtering, so I don't see the problem.

  2. I've encountered this approach (luckily) only once, and it was horrible. The real problem, according to me, behind this "introspection on steroids" is that it is not tested enough: it appears to work great if you have a bunch of "objects", but quickly drives the database (and the application) unresponsive as soon as you load some real data into it.
    I had to refactor all the application and database to get rid of this crap!

    Besides, JSON(B) offers a great way to implement a part of schema-free entities easing also the application deployment against other customers (since you don't need a way to deploy schema changes because, well, there are not!).

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram