Using a few tables to hold all Domain values: Good Idea or Bad?

A "domain" is a set of items that define the permissible values ("discrete values") that a database column may take. For example, Gender may take the values Male, Female or Unknown. In many applications, each item is defined by a pair of values: an internal code, typically a number, and an accompanying descriptive phrase. For example, many items in questionnaires are scored on a 1-5 Likert Scale, such as:
 1= Disagree strongly, 2 = Disagree Somewhat, 3 = Neutral,. 4= Agree somewhat, 5=Agree strongly.

In user interfaces, database columns based on domains are typically presented as combo boxes, list boxes or radio button lists, depending on the application. Combo boxes take the least space and are convenient for keyboard-oriented users; radio-buttons are fastest to enter with the mouse, but take up more screen real estate.

In a database where a large number of columns are based on domains, one typically chooses to store the domain values in a table so as to simplify the presentation and validation logic. There are two ways to do this. In one, the database column becomes a foreign key into a table of domain values, so we have one table for each domain. In another, we seek to combine the separate domain tables into fewer tables in some way.

Perspective 1: Combining Domain Values into Fewer Tables is Bad

Louis Davidson's website advocates against using the second approach. The example Davidson uses is a table GenericDomain consisting of the columns (GenericDomainID int primary key, RelatedToTable text, RelatedToColumn text, Code int, Description text). Davidson then goes on to show why this design is unworkable. 

While I agree strongly with the other nine issues Davidson emphasizes (and I believe that he performs a very valuable service to the developer community through his site) I would disagree with this one.

First, Davidson's GenericDomain table is not in third-normal form . (To recap, in third-normal form, all columns of a table depend on the key, the whole key and nothing but the key.) A domain is a Set of values, not an individual Code-Description pair. The correct design would be to use three tables:

Davidson's query: "Give me all the domain values for the Customers table" is now answered very easily using the following query, which uses a two-table join:
 select M.RelatedToColumn, DV.Code, DV.Description
    from Domain_Values DV, Domain_Mapping M
    where M.Domain_ID= DV.Domain_ID and M.RelatedToTable='Customers'
    order by M.RelatedToColumn, DV.Code

Second, Davidson raises the issue that the GenericDomain design makes it messy to validate data using foreign key constraints. This argument is not valid for today's mainstream RDBMS technology. Implementing a foreign-key constraint involves creating a Stored Function and using it in a constraint. This is something that both Oracle and MS SQL Server are capable of. Of course, you would not care to create a new stored function for each new domain. The code below shows that you do not need to do so: a single general-purpose function is enough.

The stored function illustrated below, CheckDomainCol, (I have used SQL Server syntax: the code is pasted from SQL Server Management Studio) is general, parameterized, and needs to be created only once: it can be used multiple times, once for each column whose values you need to constrain to a specific domain.

The first part of the code, and the code that proves that the constraint works, is just there to save you the trouble of typing: the critical part of the code - the stored function, and its use in a constraint - is highlighted in bold.

CREATE TABLE Generic_Domains(
 Domain_ID int IDENTITY(1,1) NOT NULL primary key,
 NAME varchar(50) NOT NULL, DESCRIPTION varchar(255) NULL

CREATE TABLE Domain_Values(
 Domain_ID int NOT NULL,
 Code int not null,
 DESCRIPTION varchar(200) NULL,
 DATETIME_LAST_MODIFIED datetime NULL default getdate(),

Create table Domain_Mapping (
   Domain_ID int not null,
   RelatedToTable varchar(255) not null,
   RelatedToColumn varchar(255) not null,
   constraint Prim_Dom_Map Primary Key Clustered (Domain_ID, RelatedToTable, RelatedToColumn)


insert into Generic_Domains (name, description) values('test', 'a test domain')
/* the above record gets the primary key value 1 *//span>
 insert into Domain_Values(Domain_ID,Code,Description) values (1,1,'Yes');
 insert into Domain_Values(Domain_ID,Code,Description) values (1,0,'No');
 insert into Domain_Values(Domain_ID,Code,Description) values (1,99,'Unknown');

create table CheckTbl (col1 varchar(255), col2 int)
insert into Domain_Mapping values(1, 'CheckTbl', 'col2')

CREATE FUNCTION CheckDomainCol(@Val int,@Tablename varchar(255), @ColumnName varchar(255) ) RETURNS bit AS
 DECLARE @retval int, @domain_ID int
Select @Domain_ID = Domain_ID from Domain_Mappings where RelatedToTable= @Tablename and RelatedToColumn = @ColumnName
SELECT @retval = COUNT(*) FROM Domain_Values DV where Code = @Val and Domain_ID= @Domain_ID
RETURN @retval

ALTER TABLE CheckTbl ADD CONSTRAINT chkCol2 CHECK (dbo.CheckDomainCol (col2, 'CheckTbl', 'col2') = 1);


insert into CheckTbl values(1, 45) /* this will fail */ 

Note that the Domain_ID itself is not a parameter to the function CheckDomainCol: this value is determined by looking up the Domain_Mappings table which has been populated with the appropriate data. Therefore all that the function needs to constrain a given enumerated column is the column name without quotes, the table name to which it belongs, and the column name in quotes. Provided that the Domain_Mappings table has been populated correctly, it is hard to make a mistake in defining the constraint.

In a developer interface, one would have to make the selection of the Domain_ID in the Domain_Mappings table user-friendly by letting the developer browse the other two metadata tables.

I fully agree with Davidson that the GenericDomain design approach does not make sense if you need multiple columns, not just a Code and Description, associated with each domain value.  In this case, you should create a separate table for that particular domain.

I conclude that the arguments against Generic Domain tables are not valid any more. (Older versions of MySQL, pre 5.0, do not support stored functions, but if you are a MySQL developer, I don't see any reason why you should not be using v 5.x or later.)

Perspective 2: Consolidating domain tables is good

The above text should constitute reasonable proof that Domain-Value tables aren't bad. But to justify using them, there must be some tangible benefits, not just the absence of negatives. I state these below: the advantages are significant, and become more so as the schema gets increasingly complex. (The schemas used in Enterprise Resource Planning (ERP) databases and institutional electronic medical record systems, for example, have several hundred domains.)

All three issues above emphasize the importance of looking at the big picture. The database schema is not only the Database Administrator's concern. The database exists to meet a business need, and DBA has to partner with applications developers and power users/content experts, and ensure that appropriate schema design facilitates productive software development and metadata-content maintenance.