Features, Capabilities, and Requirements
On this page:
Web Entry System Requirements
Features and Capabilities:

Web Entry System Requirements
- Windows XP
- 512MB RAM (Large Web forms will upload much faster with more RAM.)
- Microsoft Internet Explorer version 7 or higher, with security updates and Internet security settings configured for TrialDB (check the Log In Web page for details).
- Acrobat Reader 7.0 or later.
- The TrialDB Pedigree feature requires the Adobe SVG (Scalable Vector Graphics) viewer.
Features and Capabilities
TrialDB has a number of features to allow the robust support of clinical trials, including the following.
Data Element Library
A central component of TrialDB’s design is its data element library, which stores information (metadata) defining each data item and how those data items are grouped together into data entry forms (screens). This approach is fundamental to TrialDB’s ability to operate in many different clinical domains without the need for additional programming.
Automatic Creation of Data Entry Forms
The data entry forms (the electronic version of the CRFs) used to collect data for each patient are created automatically based on information in the data library. As a result, these forms can be readily modified and/or reused in different studies. TrialDB can generate Web forms that allow data entry from anywhere on an institution’s local network and from anywhere on the Internet, using 128-bit SSL security. All that is required to use these forms is a networked PC on which Internet Explorer is installed. The forms are quite robust, including features such as:
- Generation of pull-down menus (or combo boxes) for questions whose permissible values are based on a set of discrete values.
- Automatic computation and recomputation of the values of certain fields through formulas based on other fields, as well as 'skip logic', i.e., conditional disabling of fields based on values entered in previous fields.
- Validation based on data type, range checks, field length, regular expressions, and arbitrary rules based on the values of other fields in the form. Complex validation is also performed: you can specify conditions which must be true before the form can be saved, and an error message to be displayed if a particular condition is not true. The condition expressions can be arbitrarily complex, accessing two or more fields in a form as well as data previously entered in other form instances (in the same or other forms), or patient-demographics/study parameters.
- Simulation of “sub-forms” for data with one-to-many relationships. An example is adverse events whose occurrences are highly variable across subjects.
- “Conditional pick lists,” whose choices change based on values chosen in other fields.
- Access to controlled vocabularies such as ICD-10[21] and DSM-IV[22] through keyword search (see later). The developer only needs to specify which vocabulary is to be accessed.
- Uploading of Binary Large Object (BLOB) data such as images (see later), and display of the same by a program that is registered on the user’s machine as the default for a particular type of data, such as a JPEG image.
- In many studies, different investigators wish to use different subsets of questions in the same CRF. In this case, instead of creating new CRFs, the designer can specify, for a given study, which questions within the “default” (super-set) CRF are to be used. The code that is generated dynamically suppresses display of the omitted fields, to create the illusion that a custom CRF has been built for this study.
- Ability to associate individual out-of-range or suspicious values of clinical parameters in a CRF with a “problem message”, and visually flag, and report on, such values. This initiates a dialog between the study administrator and the data entry person as to the reason for this value, e.g., whether it is a data-entry or source-document error.
- Creation of a hard-copy equivalent for optional paper-based data entry.
- Context-sensitive help: script for each question to be read out during telephone interviews, and specific instructions to the data entry person. The script and instructions for each form also become part of the study’s manual of operations.
- Logging of all edits to an audit trail. Two kinds of reports are currently provided on the audit trail: details of deleted forms, or details of changed data. Both of these can be filtered by study, patient, event and form.
- The Web data entry forms are generated as Active Server Page (ASP) files to allow hand-modification – e.g., inclusion of institutional logos. However, to minimize or eliminate the need to hand-alter these forms, the study designer can customize form generation on a per-CRF basis; these preferences are stored with each CRF’s metadata, to facilitate reuse.
For example:
- The number of data entry fields per row can be specified, and subheadings inserted at particular points. Bookmarks can also be inserted so as to allow navigation to particular points in a lengthy form through a toolbar of buttons.
- The designer can customize the color scheme of a form for the page background, caption text, table and section headings.
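
As an illustration only, the metadata-driven validation described in the list above (data type, range checks, and cross-field conditions with an error message) might be sketched in Python as follows. The Element and Rule classes, their field names, and the blood-pressure example are assumptions made for this sketch. They are not TrialDB's actual schema, and the forms TrialDB generates are ASP pages, not Python.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Element:
    """Hypothetical metadata record for one data item (not TrialDB's schema)."""
    name: str
    dtype: type
    min_value: Optional[float] = None
    max_value: Optional[float] = None


@dataclass
class Rule:
    """A cross-field condition that must be true before the form can be saved."""
    condition: Callable[[dict], bool]
    message: str


def validate(values, elements, rules):
    """Return a list of error messages; an empty list means the form may be saved."""
    errors = []
    for el in elements:
        value = values.get(el.name)
        if value is None:
            errors.append(f"{el.name}: value required")
            continue
        if not isinstance(value, el.dtype):
            errors.append(f"{el.name}: expected {el.dtype.__name__}")
            continue
        if el.min_value is not None and value < el.min_value:
            errors.append(f"{el.name}: below minimum {el.min_value}")
        if el.max_value is not None and value > el.max_value:
            errors.append(f"{el.name}: above maximum {el.max_value}")
    # Cross-field rules run only once every field has passed its own checks,
    # so each condition can assume well-typed values are present.
    if not errors:
        for rule in rules:
            if not rule.condition(values):
                errors.append(rule.message)
    return errors


# Assumed example metadata for a blood-pressure form.
ELEMENTS = [
    Element("systolic", int, 50, 300),
    Element("diastolic", int, 20, 200),
]
RULES = [
    Rule(lambda v: v["systolic"] > v["diastolic"],
         "Systolic pressure must exceed diastolic pressure."),
]
```

Calling validate({"systolic": 70, "diastolic": 80}, ELEMENTS, RULES) returns the rule's error message, while an in-range, consistent pair returns an empty list. Because the checks live in data rather than in code, the same function serves any form.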

Client-Server Operation
TrialDB operates in client-server mode, allowing access over a computer network. The server is an Oracle database. There are two types of clients: 1) a PC running the Microsoft Access database (for editing the data library, which is off-limits to end-users), or 2) a machine running a Web browser (for viewing/editing data, and running reports and extracts). As TrialDB has matured, more functionality has been moved to the Web front end.
Form-Specific and Web-Enabled Help
We have created a mechanism to provide Web viewing of form-specific help from an HTML page created by the study designer for clarification on data entry issues (such as scoring, definition of terms etc.). Much of this help information already exists in electronic form (on a word processor) and data managers can readily adapt it for HTML viewing. We also have the ability to provide other study-specific help documents through the TrialDB web site.
Patient-Enterable Forms
Web-literate patients or caregivers are increasingly common users of secondary or tertiary healthcare facilities. Such individuals, who may regularly make on-line purchases or bank transactions, are fully capable of entering information into electronic CRFs (e-CRFs), such as those intended for self-reporting or caregiver reporting.
For a given study, the study designer may designate certain CRFs as patient/caregiver enterable. Patients/caregivers receive a (time-limited) user-name and password, and a URL. After logging in, they are shown only a simple view of CRFs to which they have access. They can add/edit data in individual CRFs, and save them. Some CRFs are quite lengthy, and it is unlikely that all of them can be filled in completely in a single session. TrialDB provides visual feedback as to which forms have been completed (after being designated by the patient/caregiver as such prior to saving). Patient-enterable e-CRFs have a number of benefits over paper forms. One can avoid re-entry of data by data management personnel, which risks introducing transcription errors. For lengthy and/or highly branching questionnaires, e-CRFs can also provide superior ergonomics, through features such as provision of defaults and "skip logic."
Scheduling and Patient Data Monitoring
TrialDB’s "patient calendar" feature assists scheduling and workload planning in prospective studies.
The calendar, in addition to creating a "to-do" list for study coordinators, also generates electronic form letters, complete with customizable institution or study-specific logos, or emails to patients if desired. A prospective study is typically divided into study periods by critical time points, or study events, whose timing is determined by the study protocol. Events are typically given meaningful labels indicating chronology and/or serialization, as well as the event's purpose, e.g., "Baseline," "Chemotherapy Cycle 1," "3-month follow-up," etc. At any given time, various patients will typically be within different study periods, e.g., one patient may be halfway through the study while another has just enrolled.
At each event, the investigator performs a predetermined set of evaluations, e.g., based on lab tests or questionnaires. In TrialDB, the study designer can set up, for each event, the list of CRFs that apply to that event along with the expected time range for the event. The following functionality then becomes available.
- Once a patient enrolls, a "calendar of events" for that patient is computed, and is used for resource and appointment scheduling.
- TrialDB generates, on demand, a "to-do" list for study coordinators, which lists all the patients to be followed up in a given date interval. For each patient, it includes a checklist of all the CRFs to be administered (which in turn imply interviews / blood draws / special investigations).
- The study designer may also designate certain study events simply for workflow purposes. These are not associated with any CRF, but trigger operations such as the printing of form letters.
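
The calendar and "to-do" list computations described above can be sketched roughly as follows. This is a minimal illustration, not TrialDB's implementation: the PROTOCOL table, its day offsets and windows, and the function names are all assumptions made for the example.

```python
from datetime import date, timedelta

# Hypothetical protocol: (event label, target day after enrollment, +/- window in days).
PROTOCOL = [
    ("Baseline", 0, 0),
    ("Chemotherapy Cycle 1", 7, 2),
    ("3-month follow-up", 90, 7),
]


def patient_calendar(enrolled, protocol=PROTOCOL):
    """Return (label, earliest, target, latest) dates for each study event."""
    calendar = []
    for label, offset, window in protocol:
        target = enrolled + timedelta(days=offset)
        slack = timedelta(days=window)
        calendar.append((label, target - slack, target, target + slack))
    return calendar


def to_do(calendars, start, end):
    """List (patient, event) pairs whose expected window overlaps [start, end]."""
    due = []
    for patient, calendar in calendars.items():
        for label, earliest, _target, latest in calendar:
            if earliest <= end and latest >= start:
                due.append((patient, label))
    return due
```

For a patient enrolled on 1 January 2024, the sketch places "Chemotherapy Cycle 1" between 6 and 10 January, so a to-do list requested for 5 to 6 January would include that patient and event.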
Exporting Data for Statistical Analysis and Reports
For statistical analysis, TrialDB "exports" its data in a format that can be input directly to statistical packages such as SAS or SPSS. This process creates the desired statistical data dictionary and makes it available to users through the Web browser. The data can also be exported to Excel or database applications such as Microsoft Access or Oracle, for other types of analysis and reporting. The TrialDB application can also create a stand-alone study-specific Data Mart for researchers as a Microsoft Access database. This Data Mart can be used to create study-specific reports by investigators and researchers. The data can be periodically refreshed securely from the Web.
Support of Binary Data
TrialDB uses the entity-attribute-value (EAV) mechanism for managing binary data such as images, sound clips, EKGs, or large text files such as those generated by proteomics experiments. The metadata captures constraints such as the type of data for a particular parameter (e.g., JPG, GIF, AVI), and the maximum permissible size of file for that parameter, to prevent malicious users from uploading arbitrarily large files or files that, for example, contain embedded script that might execute when the file is viewed in a browser by another user.
For efficiency reasons, the EAV table for binary data stores only the path of the data file (including its name and extension): the name is machine generated so as to be unique. The files themselves are stored in a virtual directory known to the Web server, which is not directly accessible to users. Relational databases are notoriously inefficient at managing binary data: a large multi-megabyte file is divided into 4-8 KB "pages", which are not necessarily contiguous on disk and have to be reassembled every time a user asks for such data. A well-optimized file system can be an order of magnitude or more faster for serving voluminous binary data. While it is true that relational databases provide a level of security that the file system does not, in our opinion this does not justify the downside of BLOB storage unless you have developed custom data types on which you can perform specific operations right within the database.
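
A minimal sketch of the approach just described: check an upload against the parameter's metadata, then record only a machine-generated, unique path. The allowed extensions, the size limit, the /blobstore root, and the function name are assumptions for illustration. An extension check alone does not prove a file's contents are safe, so a real system would need further checks.

```python
import uuid
from pathlib import PurePosixPath

# Hypothetical metadata for one binary parameter: permitted types and size cap.
ALLOWED_EXTENSIONS = {".jpg", ".gif", ".avi"}
MAX_BYTES = 5 * 1024 * 1024


def stored_path(original_name, size_bytes, root="/blobstore"):
    """Validate an upload and return the unique path to record in the EAV table."""
    ext = PurePosixPath(original_name).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"file type {ext or '(none)'} is not permitted")
    if size_bytes > MAX_BYTES:
        raise ValueError("file exceeds the maximum permissible size")
    # The stored name is machine generated, so it cannot collide with
    # another upload or carry over anything from the user's file name.
    return str(PurePosixPath(root) / f"{uuid.uuid4().hex}{ext}")
```

Only the returned path string would be stored in the database row. The file itself would be written under the root directory, which the Web server reads on the user's behalf.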
Reports to Monitor the Data Collection Process
TrialDB can generate a number of reports to help monitor and manage the data collection process. These reports include listing, for each patient in a study, the forms for which data have been entered, and the status of each form, e.g., "complete," "incomplete," "not available," etc. (Depending on the nature of a study, there are a variety of ways to determine the status of a form.)
Importing Electronic Data
TrialDB has tools to facilitate the automatic importing of data that are already in electronic form (such as laboratory test results) directly into its database without the need for retyping. To use this capability, a TrialDB study administrator first sets up an “import mapping” between input data elements and the attributes in one or more study-specific CRFs. These mappings are set up using a point-and-click interface, and saved for later reuse. The mappings can also contain transformations of items, e.g., translating choice sets from one set of numerical codes to another. The mapping interface also includes facilities to transform data from a de-normalized structure to the normalized structure required by TrialDB. For example, legacy flat files often contain non-normalized data where multiple, semantically identical fields are distinguished by numeric suffixes that mark consecutive instances, e.g., "sodium_1," "sodium_2," "sodium_3," etc.
Software in TrialDB automatically processes an automated data feed from the Yale New Haven Hospital clinical laboratory system. This is accomplished through an HL-7 parsing module that also uses the Logical Observation Identifiers, Names & Codes (LOINC) controlled vocabulary.
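
To make the de-normalized-to-normalized transformation concrete, here is a small sketch that turns a flat record with suffixed columns such as "sodium_1" and "sodium_2" into one row per instance. The output column names (attribute, instance, value) and the patient_id key are assumptions for this example and do not reflect TrialDB's actual tables or mapping interface.

```python
import re

# Matches column names that end in an underscore and an instance number.
SUFFIXED = re.compile(r"^(?P<name>.+)_(?P<instance>\d+)$")


def normalize(record, key_field="patient_id"):
    """Turn a flat record with suffixed columns into one row per instance."""
    rows = []
    for column, value in record.items():
        match = SUFFIXED.match(column)
        # Skip the key column, unsuffixed columns, and empty cells.
        if column == key_field or match is None or value is None:
            continue
        rows.append({
            key_field: record[key_field],
            "attribute": match["name"],
            "instance": int(match["instance"]),
            "value": value,
        })
    return rows
```

A record such as {"patient_id": "P001", "sodium_1": 140, "sodium_2": 138} yields two rows, both with attribute "sodium", distinguished by instance numbers 1 and 2.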
Security and Privacy
TrialDB implements a variety of security features at several different levels, including the following.
- The Oracle database provides a number of security features including user ID and password-protected login. The TrialDB web site requires users to log on with a user ID and password, and uses 128-bit SSL security to encrypt all Web-based communications. The individual web pages for TrialDB have been secured through the use of a security token that prevents someone from bookmarking a page and jumping directly to it later.
- TrialDB itself enforces “role-based” user privileges. The person in charge of each study can specify, for example, that a particular person is allowed 1) “read only” access to that study’s data, 2) “read/edit” access, 3) the ability to create/modify the protocol’s data element definitions and screens, and/or 4) the ability to assign privileges.
- Privileges can be “site-restricted” so that investigators and data managers at a site can see only information about patients at their site for a given trial.
- Privileges can be given to users at the level of individual CRFs.
- Patients may be segregated into “patient pools,” each of which can be accessed only by certain groups of users. Using this feature, for example, no information at all, including names and demographics, about psychiatric patients can be seen by any other users of TrialDB.
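
The study-level and site-restricted privileges listed above could be modeled along the following lines. This is a hedged sketch: the Grant class, the privilege names ("read", "edit", "design", "assign"), and the allowed function are invented for illustration and are not TrialDB's actual security model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Grant:
    """Hypothetical privilege record for one user in one study."""
    user: str
    study: str
    privileges: frozenset        # any of "read", "edit", "design", "assign"
    site: Optional[str] = None   # None means the grant is not site-restricted


def allowed(grants, user, study, privilege, patient_site):
    """True if some grant gives this user the privilege for a patient at that site."""
    for grant in grants:
        if grant.user != user or grant.study != study:
            continue
        if privilege not in grant.privileges:
            continue
        if grant.site is None or grant.site == patient_site:
            return True
    return False


# Assumed example data: a site-restricted data manager and an unrestricted investigator.
GRANTS = [
    Grant("dm_hartford", "STUDY-A", frozenset({"read", "edit"}), site="Hartford"),
    Grant("pi_central", "STUDY-A", frozenset({"read", "edit", "design", "assign"})),
]
```

With these grants, the site-restricted data manager can edit data for Hartford patients only, while the investigator's unrestricted grant applies to every site in the study.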
Use of Standardized Clinical Vocabularies
TrialDB allows multiple standardized clinical vocabularies (such as ICD-10, DSM-IV and the COSTART vocabulary for adverse drug reactions) to be incorporated and used in the data entry process.
You don't need to hand-code access to a particular vocabulary: you simply specify, in the data library, which vocabulary table (and which column in the table) a particular data element is based on, and which column in the table should be used for search. Generally, the value that is stored is an ID that is not meaningful to non-experts; the English-language description is more useful here. If you wish, you can store both the ID field and the description field in two elements.
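
As a rough illustration of keyword search over a vocabulary table, the sketch below searches a designated description column and returns the matching rows, from which either the ID or the description (or both) can be stored. The in-memory table, its column names, and the keyword_search function are assumptions. The four rows are a tiny sample of ICD-10 categories, not a usable vocabulary.

```python
# Hypothetical vocabulary table with an ID column and a description column.
ICD10_SAMPLE = [
    {"code": "A00", "description": "Cholera"},
    {"code": "E11", "description": "Type 2 diabetes mellitus"},
    {"code": "I10", "description": "Essential (primary) hypertension"},
    {"code": "J45", "description": "Asthma"},
]


def keyword_search(table, keywords, search_column="description"):
    """Return rows whose search column contains every keyword (case-insensitive)."""
    terms = keywords.lower().split()
    if not terms:
        return []  # an empty query matches nothing rather than everything
    return [
        row for row in table
        if all(term in row[search_column].lower() for term in terms)
    ]
```

Searching for "diabetes type" returns the E11 row regardless of word order, because each keyword is matched independently.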
Electronic "Sticky Notes" Facilitate Multiple-Site Data Collection
Data managers at a central site and at peripheral sites can communicate with each other regarding the data being collected by attaching electronic "sticky notes" to any data element, and by responding to sticky notes posted by others. TrialDB refers to these as Problem Messages, since their purpose is almost always to indicate specific problems with individual data items.
Electronic Audit Trail
TrialDB can maintain an audit trail of all modifications to a study’s data, including the user who made the change, the date and time, and each data item that was changed together with its previous value and new value.
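
One simple way to picture such an audit trail is as a list of entries, one per changed data item, produced by comparing a form's values before and after an edit. The entry layout and the audit_entries function below are assumptions made for this sketch, not TrialDB's audit tables.

```python
from datetime import datetime, timezone


def audit_entries(user, old, new, when=None):
    """Return one audit entry per data item whose value differs between old and new."""
    when = when or datetime.now(timezone.utc)
    entries = []
    for item in sorted(set(old) | set(new)):
        before, after = old.get(item), new.get(item)
        if before != after:
            entries.append({
                "user": user,
                "timestamp": when,
                "item": item,
                "old": before,
                "new": after,
            })
    return entries
```

Unchanged items produce no entry, so the trail records only actual modifications together with their previous and new values.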
Drawing of Active Pedigree Diagrams
We have enhanced TrialDB's data structure to store pedigree-specific and non-study-specific data (Fernando et al, 2001).
This feature is often needed by genetic epidemiology studies. A pedigree diagram is a picture of the relationships between members of an extended family, or pedigree, that shows not only how persons are related to each other (by marriage or kinship), but also displays, in summary form, the disease traits of interest for each person. Up to four traits (in four quadrants of an icon representing a person) can be shown simultaneously. In general, you may have several hundred parameters of interest per person, so you will be allowed to specify which four parameters should be displayed. The choice of parameters can be changed at any time.
Within the data library, information is now stored to allow the pedigree diagram to be generated dynamically from the database's contents and displayed on the web. The technology we use is Scalable Vector Graphics (SVG), an XML-based technology that can achieve dramatic bandwidth savings compared to sending GIFs or JPEGs over the Web. The icons representing each subject in the pedigree are selectable hyperlinks that will display nutshell clinical data collected on the subject when clicked. This capability was developed in our work with the NCI Cancer Genetics Network, but is available for use by the CRCs for genetic studies. You can also navigate between an individual's nutshell data and the detailed clinical data, by making that individual the "current" or "chosen" one.
Quality Control
In addition to the built-in (code-based) checks in individual forms for intra-field and cross-field validation, data quality control (QC) is done through random audits.
In situations where data is transcribed into TrialDB from other sources (typically paper forms) it is important to ensure the quality of data. While the checks you place through data type and range checking, skip logic and intra-form validation prevent obviously incorrect entries, one must also guard against feasible but erroneous entries. Random audits of the electronic data against the source documents (e.g., the clinical charts) help ensure quality. They are based on the methods of W. Edwards Deming, the New York University-based quality-control guru whose methods were successfully used by Japanese manufacturing to overtake the US in the 60s and 70s.
The procedure is for the data management/investigator team to assign each category of case report form (CRF) a level of criticality with respect to the adverse consequences if wrong data were entered. For each category, a proportion of the CRFs (which depends on this level) is sampled by a random-sample-generating program and compared with the source documents for discrepancies. (For critical CRFs whose contents determine, for example, whether or not to escalate the dose of medication, the fraction of sampled CRFs may be 100% at the commencement of the study, i.e., -EVERY- CRF instance within a category may be audited. This applies to the cancer chemotherapy trials for which TrialDB is routinely used at Yale, where incorrect escalation may have fatal consequences for a patient.)
- The number of discrepancies is reported by site and within site by data-entry-person to facilitate corrective action.
- Based on the detected discrepancy rate, the proportion of audited CRFs for each category may be gradually reduced over time, though obviously it never drops to zero for even the least critical category.
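
The sampling step of the audit procedure above can be sketched as follows. The criticality labels and percentages are invented for the example (the text gives only the 100% figure for critical CRFs at study start), and the function name is hypothetical. Rounding up keeps the sample from dropping to zero for any non-empty category.

```python
import random

# Hypothetical audit percentages per criticality level at the start of a study.
AUDIT_PERCENT = {"critical": 100, "moderate": 25, "low": 5}


def select_for_audit(crf_ids, criticality, rng=None):
    """Randomly choose the CRF instances in one category to check against source documents."""
    rng = rng or random.Random()
    ids = list(crf_ids)
    percent = AUDIT_PERCENT[criticality]
    # Integer ceiling of percent * n / 100, so any non-empty category
    # with a non-zero percentage yields at least one audited CRF.
    k = (percent * len(ids) + 99) // 100
    return sorted(rng.sample(ids, k))
```

As the detected discrepancy rate falls, the percentages in the table could be lowered for later sampling rounds, which is the gradual reduction the procedure describes.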

Patient Safety: Adverse Event Reporting
TrialDB was originally developed for use in cancer studies, where chemotherapy adverse effects are defined by the NCI's Common Toxicity Criteria (CTC) as a controlled vocabulary. In CTC, every adverse effect is given a grade from 0 (none) to 5 (death due to toxicity). It is an anchored scale, in that the criteria for assigning a particular grade for a particular adverse event (AE) are clearly specified. (Thus, grade 2 anorexia is defined as "oral intake significantly decreased".) Adverse events of grade 3 and above are reportable to the FDA.
We have implemented a number of enhanced features to help when a user accesses TrialDB's CRF for CTC AE recording. For example, when the user has chosen a particular AE (e.g., anorexia) and clicks a button that asks for toxicity grade, the anchored choices for that AE are fetched from the database and displayed in a pop-up window. (There are at least 1,500 different AEs in CTC, so one cannot use the "pull-down" metaphor that is usual for choice sets.)
TrialDB also produces several adverse event reports for safety data monitoring. One report, for example, lists counts of all adverse events that were recorded at least once (organized by event category), along with the maximum grade recorded for each patient. Another report lists details of all reportable AEs that occurred in the study within a date/time interval.
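
The two kinds of report mentioned above can be illustrated with a short sketch over a list of recorded events. The tuple layout (patient, AE term, grade) and the function names are assumptions for the example. The default threshold of 3 follows the statement that AEs of grade 3 and above are reportable.

```python
from collections import defaultdict


def max_grade_report(events):
    """Return {ae_term: {patient_id: maximum grade recorded for that patient}}."""
    report = defaultdict(dict)
    for patient, term, grade in events:
        previous = report[term].get(patient, 0)
        report[term][patient] = max(previous, grade)
    return {term: dict(patients) for term, patients in report.items()}


def reportable(events, threshold=3):
    """Return the events whose grade is at or above the reporting threshold."""
    return [event for event in events if event[2] >= threshold]
```

A count of affected patients per AE term then follows directly from the size of each inner dictionary in the first report.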
The reporting mechanism can be readily modified to use an alternative controlled vocabulary of AEs, and adapted to standard operating procedures (SOPs) specific to individual studies. These may dictate which AEs get reported to a site's Institutional Review Board and/or to other monitoring bodies. We have implemented a mechanism to e-mail the "reportable AEs" and messages to individuals on a list such as IRB personnel and the study PI (as described below).
Notification Messaging System
TrialDB can create a patient-specific customized message and send it to members of a study team. This is particularly useful in the case of adverse events, where the data manager can email all members of the team a message asking them to look at the adverse event form for a specific patient and date.
Data Warehouse for Reporting and Extracts
We have created a data warehouse to store a copy of all data in TrialDB (updated nightly) in a format that facilitates ad-hoc query and the generation of reports. The creation of such a data warehouse copy of a clinical database is a common solution used to allow production of diverse reports from a system like TrialDB, which is optimized for the flexibility of its data library and for patient-specific access. In contrast, the data warehouse is optimized for cross-patient analysis and reporting. A particular emphasis has been on reports that are accessible to any CRC staff member using a Web browser (with appropriate confidentiality and security safeguards, as described previously).
Metadata Management System (MMS)
We have recently completed the design of a pilot metadata management system (MMS) that we are in the process of testing and implementing. This requires the creation of use cases and rules about TrialDB metadata changes. Metadata in TrialDB includes all the information created about a study, such as the definitions of the case report forms and their questions, the inclusion/exclusion criteria, and the study events (or time periods) defined. When changes are made to this information, the MMS will automatically ensure the correct actions are taken to keep the data and metadata synchronized. The MMS will also store all changes in metadata in metadata audit tables. This system also provides the mechanism for the creation and electronic storage of explicit CRF version information.
TrialDB OnLine Help
The content of the TrialDB Help file, which includes the Designer Guide and Developer Guide, is available either as a downloadable compiled HTML (CHM) file or a set of pages that can be browsed over the Web. (Both were generated using Adobe RoboHelp, which is a significant productivity enhancer. The latest version of RoboHelp supports integration of the documentation with a source code control system, thereby supporting team development.)
Download .CHM File
