One problem many developers encounter while defining and analysing data requirements is the situation where a number of different attributes can be used to describe an object, but only few attributes actually apply to each one. One option is to create a table with a column representing each attribute; this is suitable for objects with a fixed number of attributes, where all or most attributes have values for a most objects. However, in our case we would end up with records where majority of columns would be empty, because attributes may be unknown or inapplicable.
To solve the above problem you can apply the EAV (Entity, Attribute, Value) model. This pattern is also known under several alternative names including "object-attribute-value" model and "open schema". In the EAV data model only non-empty values are stored in database, where each attribute-value (or key-value) pair describes one attribute of a given entity. EAV tables are often characterized as "long and skinny"; "long" refers to multiple rows describing entity, and "skinny" to the small number of columns used. In this article we will describe the EAV model, its implementation, and show applications of using it in the real world - including examining how Magento makes use of this pattern.
Object attributes are stored in a table with three columns: entity, attribute and value. The entity represents data item being described, for instance a product or a car. The attribute represents data that describes an entity, for instance a product will have a price, weight and many more characteristics. The value is the value of that attribute, for example our product might have an attribute price of £9.99. Additionally values can be segregated based on data type, so there would be separate EAV tables for strings, integer numbers, dates and long text. Splitting the types is done to help support indexing and let the database perform type validation checks where possible.
Sparseness of Attributes
In maths and computer science, if an object only has a few attributes from a potentially large number, we call that a "sparse matrix". When we talk about the EAV model, we use the term "sparse" to describe attributes where most have no value.
To illustrate this, lets look at a receipt from a supermarket. The supermarket has thousands of products in stock, with new ones being introduced on daily basis and others withdrawn from sale. When customer buys 5 products the receipt only lists details of items actually purchased, one product per row. The receipt does not list every product in the store that customer could have purchased, so we say the customer's receipt is sparse.
In database terms the entity is the sales receipt, with information such as transaction id, date and time, store location, etc. Each detailed line in the receipt corresponds to a record in sale line table and stores an attribute and one or more values. In this scenario an attribute is a product purchased by the customer. The values are quantity, unit price, discount and total price.
The above example illustrates the sparseness of attributes (a customer buys only a selection of the available products) and introduces us to a new term: row modelling. The table is row modelled where the series of facts describing an entity are recorded as multiple rows. Every new set of facts is stored in database as additional rows rather than addtional columns. Row modelling is a standard data modelling technique in designing databases. It should only be used where following two conditions are met:
- the data for a particular entity is sparse
- the data is vulnerable to change
Row modelling is inappropriate where sparseness and volatility don't exist, in such case traditional column modelling should be used.
Example of Row Modelling
In this example we have three entities: Product, Customer and Invoice. Both Products and Customers are standard relational tables.
As mentioned earlier in this article, there are two conditions that need to be met for row modelling to be a good choice; the data for a particular entity is sparse, and it is vulnerable to change
We know that products are constantly fluctuating with new being introduced and old withdrawn. At the same time, the invoice table cannot have a column for each product as this would be impractical.
The invoice table contains main information about the event of sale; the customer, the date and the time, and the invoice id. Each invoice is then described by rows recorded in the invoice_lines table. Each row specifies which product was purchased, the price per unit, and the quantity.
EAV vs Row Modelling
Entity-Attribute-Value design is a generalization of row modelling. It means that all types of facts are recorded in a single table across the entire database, where a row-modelled table is homogeneous in the facts that it describes. Moreover, the data type of a value column in a row-modelled table is pre-determined, whereas in an EAV table, a value's data type depends on a attribute recorded in a particular row.
It can be difficult to choose the best approach for modelling data, but as a guideline you would consider EAV modelling over row modelling when:
- Individual attributes are different in data types recorded (Yes/No, numerical values, strings). It makes storing attribute values in one table difficult when row modelling is used.
- Numerous categories of data need to be represented, and their number can fluctuate. At the same time categories have very small number of instances, even if the attributes are not sparse. Use of conventional modelling in such case would mean hundreds of tables with very few rows.
- In certain types of environments, where categories/classes need to be created on the fly and some classes are often eliminated in subsequent cycles of prototyping.
- Where there are categories classed as hybrid, meaning that some class attributes are non-spares and other attributes are highly sparse. In this situation the non-sparse attributes are stored in conventional table, while the sparse attributes are stored in EAV or row-modelled format. The classes are commonly met in business database application, where descriptions of products depends on product category, but all products will share attributes such as packaging unit and cost per item. Note that if there are only one or two hybrid classes, the EAV design may not be worthwhile.
Representing Entities, Attributes and Values
Representing Entities
An entity can be any item, so far we have seen examples where an entity was an event of sale, a merchant and a product. Entities in EAV are managed via an Objects table that captures common facts about each item, such as name, description, and so on. The Objects table must have unique identifier for each entity, and this is usually automatically generated. The identifier is then used across the database as a foreign key.
Using EAV modelling does not stop us from using conventional tables to capture additional details for individual objects. It is common to employ traditional relational database modelling and EAV modelling approaches within the same database schema.
Representing Attributes
Attributes are stored in a dedicated attributes table. The primary key of this table is used as a reference across the database. The attributes table is usually supplemented with multiple metadata tables that describe an attribute in further detail. This metadata information is often used to automate generation of user interfaces for browsing and editing data. The metadata tables would be likely to contain some of the following types of information:
- Validation: Validation metadata includes data type of an attribute, default value, limits a number of possible values and whether the value can be null.
- Presentation: Defines how attribute is presented to a user, whether it is text area, drop down or set of radio buttons / check boxes.
- Grouping: Attributes are mostly presented in groups to a user. Grouping metadata defines an order in which attribute is displayed, number of attributes presented and what type of font or color is used.
- >Range of normal values: In some situations ranges of normal values are also stored. They may vary based of sex, age, etc.
Representing Values
The simplest solution to represent data in an EAV model is to store it as a string. However, this approach is relatively inefficient since it requires data type conversions when doing anything with values. Additionally, indexes on values stored as strings do not allow optimized range searches for numeric and date types; this is a common problem when working with key-value pairs data of mixed data types.
To improve on this situation, the EAV model uses separate tables for each data type. Attribute metadata identifies the correct data type and subsequently the EAV table in which data is stored. This approach is much more efficient as it allows metadata caching for a given set of attributes in a form prior to accessing the data.
One major drawback of this solution is seen when the data type of an attribute needs changing. Thisrequires reassigning data from one table to another, which is inconvenient, and can be done using stored procedures.
In general, values that are empty or do not apply to this entitty are not stored in the EAV model. In some circumstances however there is a need to record reasons for missing values. In such cases the solution is to add a missing value code column to a table, which is non-null only when the value column is null. This code is then used to look up a list of textual explanations.
Example
The simplest implementation of EAV may have just three tables: entity, attribute and value. An example of this setup is shown here:
However, in such implementation we lose metadata information and all the values are stored as varchar, regardless of their data type. As a variation on this approach, we can alternatively implement a strongly-typed approach where a value of given data type is recorded in a table specific to that data type. An example schema is shown below and includes the metadata storage that we covered earlier.
Magento Uses The EAV Model
In open source and php community one of the most well-known implementations of the EAV model is in Magento, the e-commerce platform. Let us begin by taking a look at the Magento database schema. Although this may appear complicated at first, we will look at it in stages.
As mentioned before an entity can be any item or event. In Magento there are multiple entities, for instance: customer, order, invoice and product. For the purpose of this article I will use product entity to explain EAV implementation. The main table for products is catalog_product_entity. However, you may be surprised that it only holds a few pieces of information, such as entity type, SKU and when product was created.
In order to build an entire product record we need to find its attributes and then find a value for each of the attributes. In catalog_product_entity you will find the entity_type_id column. This is used across the entire database as the entity type identifier. Based on the entity type we can find which attributes are assigned to a product by looking in eav_attribute. This table records all attributes for all entities within Magento. It also has metadata information for each record, such as data type, front end details, etc. For products, entity type id is set to 4 (types are listed in the eav_entity_type table). Toselect all the attributes that are assigned to products, we simply do:
SELECT * FROM eav_attribute WHERE entity_type_id = 4; |
The name of an attribute is recorded as attribute_code; from the metadata information, an important column is backend_type. This indicates what data type an attribute is and where the values for the attribute are stored. Magento allows following data types:
- static
- datetime
- decimal
- int
- text
- varchar
As mentioned before values can be stored in multiple tables based on their types. To examine a particular attribute, we could use a query like this:
SELECT * FROM eav_attribute WHERE entity_type_id = 4 AND attribute_code = 'name'; |
After running above query we can see that attribute 'name' has a data type of varchar. The product attributes values are stored across a number of tables: catalog_product_entity_datetime, catalog_product_entity_decimal, catalog_product_entity_int, catalog_product_entity_text, catalog_product_entity_varchar. These table names illustrate the way different data types are stored in the EAV model.
To get a list of all products and their names we can use a query like this:
SELECT cpe.entity_id, value AS name FROM catalog_product_entity cpe INNER JOIN eav_attribute ea ON cpe.entity_type_id = ea.entity_type_id INNER JOIN catalog_product_entity_varchar cpev ON ea.attribute_id = cpev.attribute_id AND cpe.entity_id = cpev.entity_id WHERE ea.attribute_code = 'name' |
The concept is simple once you know where to start and how to find the next table in hierarchy. All the other entities follow the same principle: you find entity type id for an object you are interested in, then you get all attributes from eav_attribute based on it and finally select values for each attribute from different tables based on attribute type.
Advantages and Disadvantages of the EAV Model
The main advantage of using EAV is its flexibility. Table holding attributes describing an entity is not limited to a specific number of columns, meaning that it doesn't require a schema redesign every time new attribute needs to be introduced. The number of attributes can grow vertically (new table record for every new parameter) as the database evolves, without the need for structure changes.
The fact that EAV handles only non-empty attributes means that there's no need to reserve additional storage space for attributes whose values would be null. This makes the EAV model quite space efficient.
Physical data format is very clean and is similar to XML, allowing data to be easily mapped to XML format; it only requires replacing the attribute name with start-attribute and end-attribute tags.
EAV model is excellent for rapidly evolving applications because it protects us against consequences of constant change. We can simply record new data of any structure without the need to modify the database schema.
When considering EAV it is important to identify whether the data is sparse and numerous since the complexity of EAV design exceeds its advantages when used for inappropriate data set. It would be more appropriate to opt for conventional tables for schemas that are relatively static and/or simple.
A major downside of EAV is its lower efficiency when retrieving data in bulk in comparison to conventional structure. In EAV model the entity data is more fragmented and so selecting an entire entity record requires multiple table joins. What is more, when working with large volumes of EAV data, it can be necessary to transiently or permanently convert between columnar and row/EAV -modelled representation of the same set of data. The operation can be error-prone and CPU intensive.
Another limitation of EAV is the fact that we need additional logic in place to complete tasks that would be done automatically by conventional schemas. However, existing EAV tools can be implemented to reduce the overheads of this.
Finally, understanding the EAV model does require time. It has a definite learning curve, so junior developers may struggle working with this model, before they can truly understand the concept.
Conclusion
The Entity Attribute Value should be considered if the following conditions are met:
- Data are sparse and heterogeneous, with wide range of attributes that can apply to an entity and new attributes being often introduced.
- The number of classes is very large, with classes having numerous instances, even if the attributes are not sparse.
- There are numerous hybrid classes, possessing both sparse and non-sparse attributes. Typically, not all classes in the data will meet the requirements for EAV modelling.
In production, schemas tend to be mixed, including conventional relational, EAV or hybrid approaches as appropriate. The EAV modelling, however, requires introduction of metadata to capture the logical model data for EAV. We saw that in Magento, this is used to great effect where the various different products will have very different attributes set; a nice illustration of where this model can be applied very effectively. This post has hopefully shown you what the EAV model is and how, and more importantly when, to use it. If you have any more examples of it in use, or any questions, then add a comment!

22 comments















22 Responses
Stay in touch with the conversation, subscribe to the RSS feed for comments on this post.
Really interesting subject.
Had implemented this kind of approach on a past project, one of the highest issues that I've been through was performance "EAV" approach will prove as a less efficient approach in terms of performance, but way far more flexible when implementing abstraction models.
Thank you for the article.
EAV was one of the biggest "mistakes" I ever made with a version of a CMS we wrote. That is not to say it did not work - it did and worked fairly well, except working with the data was very hard. As time went on more data made the queries slow (ok you can solve this in various ways - caching etc.. but you get to this issue way faster with EAV), and complex queries took hours to forumulate rather than minutes.
I fully admit it probably has uses for some classes of problem - those you describe PLUS that your users themselves need to be be control of the classification of data but are not technical people. Such as Magento - they have a product they want anyone to be able to configure themselves.
But if you are close to the data and the system, row-modeling is still the best way in my view :)
But interesting subject and a good article.
Excellent intro. I've often felt, especially when wrestling with Magento, that the EAV approach is a hack for when you can't guarantee the availability of a schemaless database system.
Ecommerce is one of the prime candidates for a no sql makeover.
Nice article, builds upon my basic EAV understanding very nicely, thank you.
I had not considered the auto-generation of form elements, say, nor of filtering data based on the types - good food for thought there.
Taking the idea a bit further into the Model are there any recognised design patterns that spring to mind which would ease the crud process?
I'm wondering how EAV plays against ORM-like solutions.
Should'a used Mongo DB, because it's web-scale, lol.
Humor aside, besides being a PITA, EAV is costly eough in development time, that a document based (schema-free) database (e.g. Couch DB) should be seriously considered as an alternative during a project's planning phase.
EAV also totally screws the relational model. Of course with document stores there is no a clean way around having to go EAV. However even with an RDBMS in many cases you can just as well serialize those values and store them as a single text field. For fulltext searching you are better of using something Lucene based anyway.
Your use cases for EAV don't hold true for Magento. You might have many products in a Magento catalog, but once you define the products, the data would neither be sparse nor frequently changing.
What is true for an e-commerce platform is that each product type has a dissimilar set of attributes. This is another potential use case for EAV, to support an extensible, horizontal product. But it comes at a high cost in complexity of data design and code design.
For data design, just look at the typical questions about Magento on Q&A sites like StackOverflow. "I want to write an SQL query for a report, but I can't understand how to write a simple query against the confusing Magento schema."
For code design, keep in mind that you have to write application code to reinvent wheels that would be solved simply in SQL with data types, NOT NULL, or foreign key constraints.
EAV is an example of the "Inner-Platform Effect" antipattern, where we reimplement a solution on top of the solution that already does what we want. We do this in the name of adding flexibility, but we do so by sacrificing other needful features of the original solution.
@PaulG
Basically Magento builds and ORM solution above EAV models if you can say so. This is a complicated solution, but it worth it.
You work with a Product (which is EAV structured in database) as with an regular object with it's properties and all EAV routines are hidden.
"The fact that EAV handles only non-empty attributes means that there’s no need to reserve additional storage space for attributes whose values would be null. This makes the EAV model quite space efficient."
Citation Needed. Nulls are very cheap for both InnoDB (5.0+) and MyISAM (1 bit/field).
InnoDB has at least 13 bytes of overhead per row for a transaction id + rollback pointer - and so EAV even when using smallints for denoting the attribute id takes more space.
As Lukas said - it's often better just to serialize the data and stick it in a text field. Some people are starting to call this E-Blob.
Thanks guys for all the comments.
Bill this article is not about promoting EAV. Magento uses variation of EAV and everyone who wants to work with Magento needs to understand how it works.
EAV is a solution that has its benefits and drawbacks. And developers need to take decision what's good for them: flexibility or performance.
Yeah attribute value table pattern is a useful one. I have worked with it in a few companies already and it espacially helps if you need to allow users to add fields to your objects and you dont even know what types are they going to need.
I am not sure how well would it work if you base entire system on schemaless model though. So far did not have opportunity to try it out.
cheers
@Morgan: Well I think the bigger issue than NULL or NOT NULL is that EAV entirely depends on variable length fields and lots of self JOIN's on them. Thats definitely a sure fire way to prevent good performance. Furthermore you prevent use of constraints etc. Meaning if at all you are using the RDBMS for its transactional capabilities. Furthermore as I stated above there is no point in "faking" a relational schema here, just serialize and admit that you are raping your RDBMS.
A 3rd example of an EAV schema beside the two shown in the picture: the way the eZ Publish CMS does it. Use a single table for attributes, with an int column, a float one and a text one. Assuming you store dates as timestamps, you cover all of the basic attribute types. Of course the table is bigger, but you will in real life have most of the rows containing text (and more than 4 bytes) anyway, so the increase in size is not so important.
@gggeek that's interesting idea. I never worked with eZ Publish CMS, I must have a look at it.
Great article! I am developing a programming platform for databases with EAV model that accommodates metadata of tables and can store programming logic. It offers ability to add any number of application tables and functions without altering database schema. With EAV the emphasis seems to be mostly on storing tables of data whereas my platform can also store programming logic.
Continuing the Discussion
Yet another article promoting the antipattern of EAV. Oh, the pain. http://tinyurl.com/29kkrya
The EAV Data Model by Robert Raszczynski http://techportal.ibuildings.com/2010/10/21/the-eav-data-model/
RT @billkarwin: Yet another article promoting the antipattern of EAV. Oh, the pain. http://tinyurl.com/29kkrya
RT @billkarwin: Yet another article promoting the antipattern of EAV. Oh, the pain. http://tinyurl.com/29kkrya
RT @billkarwin: Yet another article promoting the antipattern of EAV. Oh, the pain. http://tinyurl.com/29kkrya