Technology Used in TrialDB
We currently have both Oracle and Microsoft SQL Server schemas for the TrialDB. The Yale TrialDB system uses Oracle. Tom Hoen, formerly of Johns Hopkins, first created the Microsoft SQL Server Schema, which is synchronized with the Oracle version. (The TrialDB Web code is database-agnostic and does not use special features of either Oracle or SQL Serve, such as stored procedures).
Our Web server relies on Microsoft Internet Information Server and ASP.NET version 2.0. (We expect to upgrade to ASP.NET 3.x as Windows Vista and some of the newer Microsoft technologies become stable.) The current version of the server code has been written in Visual Basic .NET. The legacy Active Server Pages (ASP) framework is used only for clinical data forms: we intend to convert it to ASP.NET in future. This framework uses browser-based code and data structures for fast response with computed formulas and skip-fields.
Despite the improvements in user-interface widgets in .NET 2.0, developing robust web applications can still be a significant chore. The .NET WebEAV library is designed to facilitate development throughput through high-level routines that address common Web-database-programming and interface-development chores, and reduce the programmer's workload.
The study designer interface use Microsoft Access 2003 to set up and edit metadata (the data library- CRF definitions, Study Protocol, etc.). Microsoft Office 2000 Developer version, which includes a set of COM controls that are used extensively within the client, is needed to modify the Access code. We expect to move up to Access 2007 once we ensure compatibility with the existing code. (We typically wait for the first service pack of a new version of MS-Office before upgrading.) People entering data (data entry staff or patients) only need access to the internet via Microsoft Internet Explorer.
Principles Underlying TrialDB’s Architecture
The "EAV" Design
Although TrialDB runs on a relational database engine (Oracle or MS SQL Server), it structures its data using an "entity-attribute-value" (EAV) design. In this design, data element names (such as "hemoglobin" or "white blood count") are not hardwired into the database as table column headers as in a traditional relational database. Rather, they are stored as data. In addition, metadata (data about data) describing each data element are stored in a data library, where the data item definitions can be readily created, viewed, and edited by the user.
The EAV design makes it easy to accommodate new protocols (with new data items) without the additional programming that would be required in a traditional database design. One needs only to add a description of each new data element to the data library.
A Discussion of EAV Design
In a database for storing institution-wide clinical information that is gathered from several departments and numerous protocols within each department, thousands of different clinical facts can potentially be stored for different patients on different protocols. Most mainstream relational database engines have an upper limit (typically 255) on the number of columns per table. As a result, with an orthodox database design (one column per fact) numerous tables would be needed, and more would constantly need to be added as the number of recorded parameters increased. At the same time, only a relatively small number of findings are actually applicable (and recorded) for a given patient on a given protocol.
This situation is similar to the computer-science problem of sparse array representation. The computer science solution is called "row modeling." This design is also called the Entity-Attribute-Value (EAV) design, because each row holds information on the entity (patient ID, visit, date, etc.), the attribute (the name or ID of the parameter being recorded), and the value of the parameter.
Because of its structural simplicity, the EAV design is rightfully popular in clinical databases. EAV was first introduced into clinical databases by Bill Stead and Ed Hammond in TMR (The Medical Record), built at Duke in the 1970s. Production databases using EAV components include the HELP system (Huff et al., 1991, 1994) and the Columbia-Presbyterian Medical Center's clinical repository (Friedman et al., 1990).
TrialDB extends the basic EAV model in several respects.
- Instead of a single table that stores all data converted into text form, TrialDB uses several EAV tables that each store a single data type: strings, integers, floating-point numbers, dates, and so on. This allows the creation of indexes based on data element values. This in turn allows fast searching based on values of particular attributes, as well as easier validation of input data.
- The metadata component of TrialDB (the data element library) is very extensive, and is relied upon for almost every aspect of TrialDB's operation. Setting up a study essentially involves the creation of metadata by a study designer. For example, the designer must define the parameters (questions) that are recorded and their logical grouping into data entry forms. This information is then used by TrialDB to automatically generate data entry and browsing forms for the Web and for Microsoft Access clients.
- TrialDB stores certain data, such as patient demographics, in conventional table form rather than in EAV form to facilitate the storage and retrieval of data elements that are common to all studies.
- EAV is something only database developers are concerned with. To both the end user and the analyst, the data appears as though it has been organized into regular tables. This sleight-of-hand is also accomplished through the use of the data library, which additionally records how data is to be presented to the user.
|