The EAV/CR Model of Data Representation

The Bird's Eye View

I should warn you that the name "EAV/CR", on occasion, can be somewhat of a misnomer. EAV/CR originally stood for "entity-attribute-value with classes and relationships" and in some of the databases that we have implemented, this definition still holds. However, depending on the database that you are building, some, or even most, of the classes of data are best represented as old-fashioned relational tables. EAV/CR is really defined by its metadata infrastructure and the use of an Object Dictionary table rather than the fact that the representation of data representing individual classes of objects may or may not be in EAV form.

In production systems, using EAV/CR is something like driving a car using the stick shift. It gives you more control than using automatic transmission, but is also somewhat trickier than using old-fashioned design approaches. Therefore, you use it only when you absolutely need to.

In the account below, we use the word "class" as the equivalent of "database table" and "attribute" as the equivalent of "database column". While this usage is somewhat lax, this is what you end up doing in the vast majority of cases when you are using a relational database to store object-modeled data.

EAV/CR is an approach that seeks to facilitate the rapid development of interfaces to data (and/or their dynamic creation) through the recording of "rich" metadata that not only describes every element in the database from a "database" perspective, but also from a "presentation" and "user interaction" perspective.

The word "rapid" comes with a caveat- in earlier editions of their classic "Camel" book on Perl, Larry Wall and Randall Schwartz describe laziness as an attribute of truly great programmers - they first build a framework (the hard part, which is slow going), and then use the framework to automate chores (the easy/rapid part, which allows laziness). EAV/CR is the basis for such a framework.

By "database" perspective, we include info such as data type, constraints, relationships between classes, and so forth

Presentation/user interaction metadata encompasses details such as:

It would be convenient if you could build such interfaces automatically, by specifying presentation metadata in great detail and then letting the system generate an interface for you. This is what EAV/CR is about: in addition, the metadata serves as detailed and rigorous documentation for the system and its semantics. It also makes data interchange possible with programs that expect conventional (one-attribute-per-column) table structure.

The interface that is generated may be completely dynamic, or may have some static elements - the former is usually OK for browsing: the SenseLab Web site is mostly dynamic. The latter is usually required when you are doing data entry with some elaborate data validation - generating all the validation code every time the class is accessed may be too much of an overhead, and a Web page with static elements improves response. The TrialDB Web site mostly uses pages with static elements.

In brief, in a three-tier database application, the EAV/CR schema allows the middle (business logic) tier to consult the metadata to perform many chores automatically that would otherwise have to be coded on a per-table or per-column basis.

History

Definition

EAV = Entity-Attribute-Value.   EAV/CR = EAV with Classes and Relationships

Conceptually, a table with three columns:

The table has one row for each Entity-Attribute-Value triple.

In reality, we prefer to segregate values based on data type, so as to support indexing and let the database perform type validation checks where possible. So there are separate EAV tables for strings, real and integer numbers, dates, long text and Binary large objects (BLOBS).

EAV is known under several alternative names: object-property-value by the O-O folks, frame-slot-value by the AI folks.  When implemented in databases, an EAV table is called a "generic" table, or an "open schema" table.  Resource Description Format (RDF), the basis for "Semantic Web" content, uses the same approach: RDF content consists of  "Subject-Predicate-Object" triples.

Benefits

Physical vs. Logical Schema

EAV/CR Overview

EAV/CR overlays an object-oriented framework on top of an EAV physical structure.

The EAV/CR Schema: Metadata Tables

The EAV/CR Schema: Data Tables

The most critical data table is the Objects table, also called the Object Dictionary. It contains summary information on every object in the database.  The object Class field is a foreign key into the Classes table in the Metadata schema diagram below. The Keywords table allows objects to be searched by key-phrase: its contents are typically composed by combining the text of the Name and Description fields and indexing them. (In database engines such as Microsoft SQL Server,  the Keywords table is implemented through a full-text index on these fields, which allows Google-style search.

The EAV_xxx tables below implement a strongly typed EAV approach - a value of a given data type goes into a specific table, allowing the possibility of indexing by value and more compact storage without having to coerce everything into the string data type.

Note that storing data in EAV tables makes sense only if either of two conditions are met:
1. The data for a particular class are sparse: that is, most attributes are NULL.
2. You have only a handful (e.g., a few dozen at most) objects belonging to a particular class, and so you do not gain significant efficiency by create a separate table for this class.

If you have hundreds or thousands of non-sparse objects for a given class, you should create a distinct table for this class, preferably with the same name as the Class itself. (Naturally, such tables are not illustrated in the schema below, since they are specific to your application.) The summary information for each object, however, still lies in the Objects table. This way, the Keywords table lets users locate Objects of interest irrespective of what Class they belong to.  When you use a Mixed Design approach, where some data is stored in regular tables, and other data in EAV form, the metadata in the Classes table above should have a flag that stated whether Object data for a given class is recorded in conventional columnar form or in EAV form. This way, your application code knows what to do when the user wishes to inspect the details of a particular object.

Managing Relationships

Example: Data on Nigrostriate Neurons

Neurons: Nigrostriatal
Anatomical Origin:Pars Compacta of Substantia Nigra.
Projecting To: Corpus Striatum
Neurochemical/s Released: Dopamine
Receptor/s involved: D2
Electro-physiological Function:
Inhibitory
Neurons Projected To (Efferents): Striato-Pallidal neurons, Striato-Striatal neurons
Neurons Providing Input (Afferents): Pars Reticulata of Substantia Nigra, Striato-Nigral fibers.

Representing the Nigrostriate Example in EAV/CR

A. Metadata for the Neuronal_Info Association Class

Attributes:

Attribute Name

Datatype

Primary_Neuron Class, Neuron
Soma_location Class, Anatomical_Location
Axon_Terminus_Location Class, Anatomical_Location
Neurotransmitter_released Class, Neurotransmitter, multi-instance
Receptor_Type Class, Receptor, multi-instance
Electrophysiological_Effect Integer (member of a Choice Set)
Receptor_Type Class, Anatomical_Location
Efferent_Neuron Class, Neuron, multi-instance
Afferent_Neuron Class, Neuron, multi-instance

B. EAV Data for the Nigrostriate Neuron

Entity ID Attribute Value
100 <Primary_Neuron> <Nigrostriate cell>
100 <Soma_location> <ParsCompacta, S.Nigra>
100 <Axon_Terminus_Location> <Corpus Striatum>
100 <Neurotransmitter_released> <Dopamine>
100 <Receptor_Type> <D2>
100 <Efferent_Neuron> <Striato-pallidal>
100 <Efferent_Neuron> <Striato-striatal>
100 <Afferent_Neuron> <Striato-nigral>
100 <Afferent_Neuron> <Pars Reticulata, S.Nigra>
... ... ...
100 <Electrophys_effect> <2 = Inhibition>

Relationship Details as Inverted-File Indexes

Drawbacks of EAV/CR

But by all means use the Classes, Attributes and Choice Sets/choice-set-values tables : they serve to document your schema and your domains.