SEURAT: Scanned Entry of Structured Data
for a Pediatric Health Maintenance Record System

Richard N. Shiffman, MD, MCIS1,2, Cynthia A. Brandt, MD, MPH1,
Maryrose Hoffman, RN, MS3, Wanda Wiig3, Lori A. Fernandes, MD1
1Center for Medical Informatics, 2Department of Pediatrics, Yale School of Medicine
3Management Information Systems, Yale-New Haven Hospital
New Haven, Connecticut

Capturing clinical data in digital format has been a fundamental challenge in informatics. We applied electronic document scanning with optical character recognition, marksense recognition, and handwriting recognition to capture structured pediatric health maintenance data at the Yale-New Haven Hospital Primary Care Center. We devised paper-based forms to permit direct documentation by health care providers with minimal disruption of established practice. An essential dataset was defined and a model was created to accommodate information collected locally, as well as data derived from external sources. Software was developed to automatically convert information extracted from 16 different forms into relational format. Since January 1, 1997, the system has captured clinical data from all pediatric health maintenance encounters. The system has been well accepted by clinicians, and marksense recognition was 99.17% accurate in an analysis performed under clinical conditions. [Keywords: scanning, computer-based patient record, database design, pediatric health maintenance]

A fundamental challenge in the implementation of computer-based patient records has been acquisition of clinical information in a format that can be processed by computers [1]. Capturing data directly from the provider has proven particularly problematic. Many physicians have resisted use of computers because of a requirement for typing on a keyboard. Barnett and colleagues noted that "the design of interfaces for direct entry of clinical information by physicians remains one of the most difficult challenges in the development of a computer-based system" [2].

In our SEURAT Project (Scanning for Evaluation, Utilization Review, Analysis, and Training), we aimed to collect pediatric health maintenance data recorded by the provider at the point of care using electronic document scanning and recognition. Our objectives were to:

• Facilitate efficient, legible, and complete documentation of encounters;
• Document compliance with Medicaid’s Early and Periodic Screening, Diagnosis, and Treatment Program (EPSDT);
• Simplify quality assessment and reporting and identify areas requiring quality improvement;
• Enhance compliance with health maintenance guidelines by providing reminders of age-appropriate preventive care at the time and place of a consultation;
• Promote resident and medical student learning of well child care; and
• Minimally disrupt established practice.

CLINICAL DATA REQUIREMENTS

As many as 50 children are seen daily in the Yale-New Haven Hospital Primary Care Center Pediatric Clinic for health maintenance visits. Most of these patients come from the inner city; they represent a predominantly poor, multi-ethnic population, whose health care is most often supported by Medicaid. Approximately 70 clinicians—including residents, attending physicians, and nurse practitioners—provide care. Although financial and appointment scheduling functions are supported by computers, there has been no outpatient, computer-based clinical record in our Primary Care Center.

Initiated in October 1995, Connecticut Access—the statewide Medicaid managed care program—brought extensive requirements for clinical data collection and reporting. It mandated tracking of a large number of indicators of system utilization, clinical quality, and health status and specified the generation of regular, detailed reports.

We identified an essential dataset of health maintenance activities that are performed in caring for children from birth through adolescence in the Pediatric Primary Care Center. These data pertain to growth, nutrition, development, sleep, elimination, illness, safety, anticipatory guidance, physical examination findings, screening tests, immunizations, and demographics. We anticipated a need for considerable flexibility for our data capture system, which would permit a rapid response to changing data collection and reporting requirements.

DESIGN OF THE SCANNING SYSTEM

Electronic document scanning can capture patient data in a digital format while retaining the familiarity of paper-based recording for the clinicians. Scanning of documents with storage and display of digital images of the paper record has been implemented in some centers. This approach can improve access to clinical information and maintain the expressivity of the paper-based record. However, the information in the record cannot be efficiently searched, aggregated, or used to trigger decision support, and illegibility is often a problem in the scanned images.

Document scanning can also be coupled with several recognition technologies to acquire information in a structured format for the electronic record. Optical mark recognition (OMR or marksense recognition) allows capture of data recorded as dark marks in specifically designated areas. OMR is highly accurate and is familiar to users because of its extensive use in standardized tests. Likewise, barcode recognition (BCR) allows precise capture of data that have been encoded in the widths of bars and spaces. Unfortunately, barcoded data must be machine printed, and the patterns are unintelligible to users. Optical character recognition (OCR) can capture machine-printed alphanumeric characters with excellent fidelity. Intelligent character recognition (ICR or handwriting recognition) provides the least accurate but the most flexible mode of recording data.

In selecting a scanning solution for our data acquisition problems, a high priority was placed on an ability to maintain local control over form design and generation. We also considered cost, recognition capabilities, compatibility with enterprise databases, and hardware and operating system requirements. We selected Teleform software (Cardiff Software, San Marcos, Calif.)—a product that includes a design module for form generation and recognition engines for data presented in machine print, marksense, barcodes, and handwriting. A verification module automatically displays and highlights unrecognized or invalid entries for correction processing. The software also includes a fully capable scripting language that enables complex data integrity checks.

Forms Design

Until January 1996, encounter documentation for health maintenance visits in the Primary Care Center was handwritten on lined, 2-part, carbon-interleaved forms by nurses, residents and attending physicians. In response to managed care requirements and in anticipation of the installation of an ambulatory information system, this system has been replaced by structured encounter forms.

Sixteen forms were designed to permit scanning of structured information. Each form occupies one page and prompts for information entry that is relevant to a particular age range. The new forms have been well accepted and have resulted in improved documentation [3].

We designed the forms and the data capture process to accommodate flexible updates, deletions and additions to the form content. We limited the number of field types on the forms. All assessment data fields are designed so that a given field can contain at most a single value. Negative and uncertain statements are accepted [4]. In general, the provider is offered 3 values, e.g., "normal", "abnormal", and "query" or "yes", "no", and "query". A default value indicates that an assessment of a particular item was not performed. As distinguished from assessments, interventions are simply performed or not performed.
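The field conventions described above can be illustrated with a minimal sketch. It is not the Teleform configuration used in the project: the names `Assessment` and `read_assessment` are hypothetical, and returning `None` for a field with more than one mark (so that it can be sent for verification) is an assumption about how the single-value rule might be enforced.

```python
from enum import Enum
from typing import List, Optional


class Assessment(Enum):
    """Values a single assessment field may hold (names are illustrative)."""
    NOT_ASSESSED = "not assessed"  # default: no bubble marked
    NORMAL = "normal"
    ABNORMAL = "abnormal"
    QUERY = "query"                # uncertain statement


def read_assessment(marked: List[str]) -> Optional[Assessment]:
    """Map the bubbles marked for one field to at most one value.

    Returns None when several bubbles are marked, which this sketch
    treats as an invalid entry needing manual verification.
    """
    if not marked:
        return Assessment.NOT_ASSESSED
    if len(marked) > 1:
        return None
    return Assessment(marked[0])
```

A yes/no/query field could be modeled the same way with a second enumeration; interventions would need only a boolean, since they are simply performed or not performed.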

Data Modeling

We identified several major entities to track in our health maintenance database. Patients and Providers interact at Encounters. During an Encounter, multiple assessments and interventions may occur, which are designated as Events. Events include such activities as evaluation of nutrition and development, measurements of height, weight, and blood pressure, counseling, laboratory screenings, and immunizations.

As distinguished from inpatient data—where an episode of care is largely self-contained—record systems that serve the ambulatory environment must integrate information that may be collected in multiple formats from a great variety of sources. Because the ambulatory record covers long periods of time, children see multiple providers. Key data—such as the exact date, the site, or the provider of an immunization—may be unavailable.

To accommodate activities that may have been performed at our Clinic site or elsewhere, we defined four mutually exclusive health maintenance Event categories that inherit an event identifier and event value and may specialize additional attributes. Each of these categories is a subtype of the Event supertype.

Encounter Events occur only in our clinic with clearly defined providers, at definite times, and in the context of specific visit types. They include such activities as nutritional assessment (e.g., type of formula, adequacy of nutrient balance), developmental assessments (e.g., child can or cannot walk well), anticipatory guidance (e.g., counseling regarding appropriate toilet training, car seat safety) and physical examination findings.

Screening Events may occur at our Clinic or elsewhere and include such activities as tuberculin testing, lead screening, and anemia checks. These events have attributes that describe whether they were ordered, performed, or both, and they have results, measurement units, and normal ranges.

Measurement Events may occur at our Clinic or elsewhere and include height, weight, head circumference, and blood pressure measurements. These events specialize measurement units.

Immunization Events may occur at our Clinic or elsewhere and include multiple antigen combinations, sites of administration, manufacturer's lot numbers, etc. However, the provider, setting, and time of the immunization may be unknown.

In our design, we linked Measurement, Screening, and Immunization Events to both the Encounter entity and to the Patient entity through foreign keys. This allows us to store clinical information that was observed during an Encounter at our site, as well as data collected elsewhere. Encounter events are linked only to the Encounter entity.
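One way to picture the supertype and its four subtypes is the sketch below. It is an illustration under stated assumptions, not the SEURAT schema: all class and attribute names are hypothetical, and an optional `encounter_id` is used here to represent events that were recorded outside a Clinic encounter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """Supertype: every event carries an identifier and a value."""
    item_id: str   # event identifier from the data dictionary
    value: str     # event value


@dataclass
class EncounterEvent(Event):
    """Occurs only at the clinic, so it links to the encounter alone."""
    encounter_id: int


@dataclass
class MeasurementEvent(Event):
    """Height, weight, etc.; specializes measurement units."""
    patient_id: int
    units: str
    encounter_id: Optional[int] = None  # None when measured elsewhere


@dataclass
class ScreeningEvent(Event):
    """Screening tests; `value` holds the result in this sketch."""
    patient_id: int
    ordered: bool = False
    performed: bool = False
    units: Optional[str] = None
    normal_range: Optional[str] = None
    encounter_id: Optional[int] = None


@dataclass
class ImmunizationEvent(Event):
    """Provider, setting, and time may all be unknown."""
    patient_id: int
    site: Optional[str] = None
    lot_number: Optional[str] = None
    date_given: Optional[str] = None
    provider_id: Optional[int] = None
    encounter_id: Optional[int] = None


# An immunization reported from another site: linked to the patient only.
outside_dose = ImmunizationEvent(item_id="VX85", value="given", patient_id=1)
```

Linking the three portable subtypes to the patient as well as to the encounter is what lets a record exist with no encounter at all, while an `EncounterEvent` cannot be created without one.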

In addition, providers complete a comprehensive perinatal history form at the first Clinic visit of a newborn. This information is stored in a Perinatal History table that is linked to the Patient table.

In assembling a data dictionary, we found that many pediatric health maintenance terms—particularly for developmental attainments and anticipatory guidance—are not included in current standardized vocabularies such as SNOMED. We therefore assigned unique numeric identifiers to our data and included synonym fields in the data dictionary to accommodate standardized vocabularies where they exist.

Converting to Relational Format

The verified output from the scanner software is exported as a flat file. Reflecting the content of the corresponding age-specific form, each file structure is unique. For example, the 4 Month Visit form contains data elements related to formula type and ability to roll from prone to supine, while the 6 thru 10 Year Old Visit form includes data elements that deal with school performance and bicycle safety.

We devised software to parse the flat file records, perform additional validation checks, and enter the scanned data into appropriate relational tables in the SEURAT database. We assigned the item identifier from our data dictionary as the field name for the exported fields. The item identifier is a composite of an event type and a unique numeric identifier from the data dictionary. For example, VX85 represents "DTaP vaccine", which is routed to the Vaccine Event table, and EX372 represents "counseling about smoke detectors", which resides in the Encounter Event table. Each data item in the exported flat record is checked sequentially and routed to the appropriate SEURAT table based on the item identifier. Because routing depends only on the identifier and not on the layout of a particular form, fields can be added, deleted, or modified on any of the forms without requiring modification of the conversion software.
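The identifier-driven routing can be sketched as follows. This is a minimal illustration, not the conversion software itself: only the VX and EX prefixes come from the text, the SX and MX prefixes and all table and function names are hypothetical, and collecting unrecognized fields under "Unrouted" is an assumption.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Event-type prefix -> destination table.
TABLE_FOR_PREFIX = {
    "VX": "VaccineEvent",      # from the text (e.g., VX85)
    "EX": "EncounterEvent",    # from the text (e.g., EX372)
    "SX": "ScreeningEvent",    # hypothetical prefix
    "MX": "MeasurementEvent",  # hypothetical prefix
}

# An item identifier is an event-type prefix followed by a number.
ITEM_ID = re.compile(r"^([A-Z]+)(\d+)$")


def route_record(record: Dict[str, str]) -> Dict[str, List[Tuple]]:
    """Route each exported field to a table based on its field name."""
    routed = defaultdict(list)
    for field_name, value in record.items():
        match = ITEM_ID.match(field_name)
        if match is None or match.group(1) not in TABLE_FOR_PREFIX:
            # Not a known item identifier: set aside for review.
            routed["Unrouted"].append((field_name, value))
            continue
        table = TABLE_FOR_PREFIX[match.group(1)]
        routed[table].append((int(match.group(2)), value))
    return dict(routed)
```

Because the function inspects only field names, a flat file exported from any of the form types can pass through it unchanged, and a new field needs only a data dictionary entry.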

Alternatively, health maintenance data that are collected outside the setting of a Yale Clinic encounter can be added directly to the database from one of several on-screen forms. Particularly useful is a multi-pick immunization form that facilitates entry of multiple immunizations that occur on a single administration date.

SYSTEM IMPLEMENTATION

For each health maintenance visit, a member of the ambulatory nursing staff accesses the patient’s registration information on the enterprise mainframe IDMS database via a clinic workstation running terminal emulation software (EXTRA! for Windows, Attachmate, Bellevue, Wash.) (See Figure 1). After completing a nursing evaluation, s/he enters the patient’s measurements (e.g., height, weight, blood pressure) and approves an age-appropriate form (suggested by the computer) for printing. A patient-specific form is generated that merges the patient demographics information with today’s measurements along with a form template that includes age-appropriate history and physical items, anticipatory guidance, screening and immunization recommendations. To speed processing, images of all 16 forms reside on an electronically re-programmable MIPS chip (Dataline America, San Diego, Calif.) in the workstation printer.

The nurse adds handwritten notes and recommendations to the form at this time and attaches it to the patient’s chart. The treating physician completes the marksense areas of the form and writes narrative notes in the whitespace. The clinician’s handwritten identifier number in a constrained print field identifies the provider. The form is collected for batch scanning.

The scanning software performs data validation to assure the integrity of critical information. Working at a verification station with a large-screen monitor, the verifier can view a scanned image of the paper form and edit any data items that have been flagged by the verification software. Approximately 50 forms are processed each weekday.

The output of the scanning process is a separate flat file for each form type, which contains verified encounter data. Next, the conversion software parses and inserts these data into appropriate tables in a Microsoft Access database (Microsoft, Redmond, Wash.), which resides on a networked server.

The SEURAT database provides reports for each patient that show visits by date and providers, a child’s complete immunization, growth, and screening histories, as well as detailed summaries of individual encounters. In addition, reports of aggregate activities can be generated to show immunizations for a cohort of patients, provider productivity, and monthly vaccine usage.
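An aggregate report such as monthly vaccine usage amounts to a grouped count over immunization records. The sketch below uses an in-memory SQLite database purely for illustration (the project used Microsoft Access); the table name, column names, and sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE immunization_event ("
    " patient_id INTEGER, item_id TEXT, date_given TEXT)"
)
# Made-up sample rows; dates are ISO strings so they sort and slice cleanly.
conn.executemany(
    "INSERT INTO immunization_event VALUES (?, ?, ?)",
    [
        (1, "VX85", "1997-01-06"),
        (2, "VX85", "1997-01-20"),
        (1, "VX85", "1997-03-10"),
        (3, "VX85", None),  # administration date unknown
    ],
)


def monthly_vaccine_usage(connection):
    """Count doses per calendar month and vaccine, skipping undated doses."""
    return connection.execute(
        "SELECT substr(date_given, 1, 7) AS month, item_id, COUNT(*)"
        " FROM immunization_event"
        " WHERE date_given IS NOT NULL"
        " GROUP BY month, item_id"
        " ORDER BY month, item_id"
    ).fetchall()


usage = monthly_vaccine_usage(conn)
```

Doses with an unknown date are excluded here; a real report would have to decide whether to list them separately.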

Evaluation

The structured forms replaced the legacy system in February of 1996. Evaluation of the transition to structured forms showed an almost three-fold improvement in the documentation of data items relating to development and anticipatory guidance compared with the legacy system [3]. In addition, 90% of the users expressed a preference for the structured forms. Users agreed most strongly that use of structured forms served to remind them to provide anticipatory guidance and to perform assessments and interventions they might otherwise have forgotten.

Since January 1, 1997, the structured documentation from all health maintenance encounters at our Clinic has been scanned; the data have been verified and entered into the SEURAT database.

Scanning accuracy is adversely affected by false positives—those marksense areas incorrectly recognized as filled—and false negatives—those filled marksense areas not recognized by the scanner system. We defined the marksense scanning accuracy as the proportion of the sum of true positives and true negatives to the total number of marksense areas. In an analysis—performed under clinical conditions—of 5,185 marksense areas, the scanning accuracy rate was 99.17%.
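The accuracy definition above can be written out as a short function. The counts in the example are hypothetical: the analysis reports only the total of 5,185 areas and the 99.17% rate, not the split among true and false positives and negatives, so the four values were chosen simply to sum to 5,185 with 43 errors.

```python
def marksense_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Proportion of marksense areas that were recognized correctly.

    tp: filled areas recognized as filled
    tn: empty areas recognized as empty
    fp: empty areas incorrectly recognized as filled
    fn: filled areas not recognized
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no marksense areas were counted")
    return (tp + tn) / total


# Hypothetical split of 5,185 areas with 43 recognition errors in total.
rate = marksense_accuracy(tp=1200, tn=3942, fp=20, fn=23)
print(f"{rate:.2%}")  # prints 99.17%
```

Because most areas on a form are left empty, true negatives dominate the total; a high overall rate therefore says little by itself about how often filled bubbles are missed.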

DISCUSSION

Document scanning has been reported in a limited number of clinical applications. Moran et al. used a scannable encounter form to capture visit types, diagnoses, procedures, and screening data for an adult population attending an urban community health center [5]. Guerette et al. used scanned forms to update centralized databases with records obtained from remote sites [6]. Dayhoff and colleagues have used scanning as one of several methods for data capture for clinical workstations in the Veterans Administration system [7]. Downs et al. have devised a system that generates a patient-specific, scannable worksheet based on a prioritized list of preventive services for children [8].

The SEURAT system is unique in that it captures a comprehensive view of the pediatric health maintenance encounter (including clinical observations, assessments, and interventions) in structured format from scanned forms. An average of 80 data items are collected from each form. Optical mark recognition, optical character recognition, and intelligent character recognition are all applied to each of our forms to extract clinical data, demographics, and provider identifiers, respectively. Minimal initial training is required for the large number of providers rotating through our Clinic because of the intuitive method of documentation. Our conversion software automatically integrates information scanned from multiple form types.

Limitations of our scanning approach include:

• As in any structured system, we are limited in our ability to capture the richness of an encounter with a finite number of data items. Some important information, such as a patient problem list, does not lend itself to scannable data capture.
• The centralized scanning system isolates the clinician from direct interaction with the database at the point of care. This limits the system’s ability to provide immediate, patient-specific reminders and feedback.
• The system requires ongoing feedback to clinicians regarding use of appropriate forms, avoiding stray marks on the forms, and correctly filling marksense bubbles.

On the other hand, we have demonstrated that it is possible to capture a wealth of clinical information using familiar paper-and-pen documentation with minimal disruption of clinical workflow. The structured format of the age-specific forms provides reminders for our clinicians. All advantages of the electronic medical record become available, including information access, legibility, organization, security, and aggregate reporting. Additionally, we have been able to maintain local control over form design and generation. Just as the Pointillist painter Georges Seurat deconstructed reality into individual dots of light and color, our SEURAT system effectively captures clinical data expressed as patterns of marks. Current scanning technology can serve as a viable mechanism for the capture of clinical information recorded by clinicians at the point of care.

ACKNOWLEDGEMENTS

The authors wish to express their gratitude to Judy Doyle, Art Dufresne, Marci Durso, Lori Fernandes, Elizabeth Frugale, Laura Johnson-Oates, Pam Mendillo, Nancy Norko, Sandy Elkin Randi, and Josephine Wheaton for helping to bring this project to fruition. This work was supported in part by grants 1-R55-LM05552-01A1 and T15-LM07056 from the National Library of Medicine. Dr. Shiffman is a Robert Wood Johnson Foundation Generalist Physician Faculty Scholar.

REFERENCES

1. Committee on Improving the Patient Record. The computer-based patient record: an essential technology for health-care. Washington, D.C.: National Academy Press, 1991.
2. Barnett GO, Jenders RA, Chueh HC. The computer-based clinical record--where do we stand? Ann Intern Med 1993; 119:1046-8.
3. Shiffman RN, Brandt CA, Freeman BG. Transition to a computer-based ambulatory record using scannable structured encounter forms. 1997 (submitted).
4. Rector AL, Nowlan WA, Kay S, et al. Foundations for an electronic medical record. Methods of Information in Medicine 1991; 30:179-86.
5. Moran WP, Wofford JL, Hamrick V, et al. Implementing computerized tracking at a community health center: challenges and solutions. In: Ozbolt JG, ed. Proceedings of the 18th Annual Symposium on Computer Applications in Medical Care. Washington, D.C.: Hanley and Belfus, 1994: 139-43.
6. Guerette P, Robinson B, Moran WP, et al. Teleform(TM) scannable data-entry: an efficient method to update a community-based medical record? In: Gardner RM, ed. Proceedings of the 19th Annual Symposium on Computer Applications in Medical Care. New Orleans, LA: Hanley and Belfus, 1995: 86-90.
7. Dayhoff R, Kirin G, Pollock S, Miller C, Todd S. Medical data capture and display: the importance of clinicians' workstation design. In: Ozbolt JG, ed. Proceedings of the 18th Annual Symposium on Computer Applications in Medical Care. Washington, D.C.: Hanley and Belfus, 1994: 541-5.
8. Downs SM, Arbanas JM, Cohen LR. Computer supported preventive services for children: the C.H.I.P. project. In: Gardner RM, ed. Proceedings of the 19th Annual Symposium on Computer Applications in Medical Care. New Orleans, LA: Hanley and Belfus, 1995: 962.