Prakash Nadkarni, M.D.

Adjunct Faculty, Yale Center for Medical Informatics


Email: prakash (dot) nadkarni at

Research Interests

My hobby (which also happens to be my job) is working with biomedical databases of all kinds.

I have an abiding interest in Metadata-driven Software Systems, which exemplify creative laziness - that is, doing things right the first time so that you don't have to tweak your software over and over again. Entity-Attribute-Value (EAV) databases are a family of databases that rely critically on metadata. They are used in domains where the number of potential descriptors (attributes) describing an object is a couple of orders of magnitude greater than the actual number of descriptors for a given object. For example, when dealing with patient data across all clinical specialties, the number of history elements, symptoms, clinical examination findings, lab tests and so on runs into several tens of thousands, and this number is constantly growing. Yet, for a given patient, no more than a few dozen types of positive or significant negative findings are actually relevant. That is, the data is highly sparse, and a set of conventional relational tables, with one finding per column, would result in much wasted space, because most columns would be null. In the EAV approach, one stores only non-null findings in a table containing three types of information: the Entity (the patient, the date/time the finding was recorded), the Attribute (i.e., the name of the finding) and the Value of the finding.
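The EAV layout described above can be sketched in a few lines of Python using the standard-library sqlite3 module. This is only a minimal illustration, not the schema of any particular system: the table names (attribute, finding), the column names, the helper functions and the sample findings are all assumptions made for the example, and every value is stored as text for simplicity.

```python
import sqlite3


def create_eav_store():
    """Create an in-memory database with a metadata table and an EAV table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Metadata: one row per kind of finding (hypothetical layout).
        CREATE TABLE attribute (
            attribute_id INTEGER PRIMARY KEY,
            name         TEXT NOT NULL UNIQUE
        );
        -- EAV data: Entity = (patient_id, recorded_at), Attribute, Value.
        CREATE TABLE finding (
            patient_id   INTEGER NOT NULL,
            recorded_at  TEXT    NOT NULL,
            attribute_id INTEGER NOT NULL REFERENCES attribute(attribute_id),
            value        TEXT    NOT NULL
        );
        """
    )
    return conn


def record_finding(conn, patient_id, recorded_at, attribute_name, value):
    """Store one non-null finding; unknown attributes are registered on the fly."""
    conn.execute(
        "INSERT OR IGNORE INTO attribute (name) VALUES (?)", (attribute_name,)
    )
    (attribute_id,) = conn.execute(
        "SELECT attribute_id FROM attribute WHERE name = ?", (attribute_name,)
    ).fetchone()
    conn.execute(
        "INSERT INTO finding VALUES (?, ?, ?, ?)",
        (patient_id, recorded_at, attribute_id, str(value)),
    )


def findings_for(conn, patient_id):
    """Return (recorded_at, attribute name, value) rows for one patient."""
    rows = conn.execute(
        """
        SELECT f.recorded_at, a.name, f.value
        FROM finding AS f
        JOIN attribute AS a ON a.attribute_id = f.attribute_id
        WHERE f.patient_id = ?
        ORDER BY f.recorded_at, a.name
        """,
        (patient_id,),
    )
    return rows.fetchall()


conn = create_eav_store()
record_finding(conn, 1, "2011-03-01T09:30", "serum_potassium_mmol_per_l", 4.1)
record_finding(conn, 1, "2011-03-01T09:30", "chest_pain", "absent")
record_finding(conn, 2, "2011-03-02T14:00", "serum_potassium_mmol_per_l", 5.6)
print(findings_for(conn, 1))
```

Only the findings that were actually recorded occupy rows, so adding a new kind of finding means adding a row to the attribute table rather than a column to a wide patient table.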

(Shameless Plug: My 2011 book on metadata-driven systems in Biomedicine, published by Springer, is available on Amazon, with excerpts also available on Google Books.)

TrialDB, an EAV database for management of Clinical Studies Data that is copyrighted by myself and my colleagues Cindy Brandt and Luis Marenco (though it is open-source freeware), is described on the TrialDB Home Page. This page has an FAQ, and links to the ftp site, online documentation (also ftp-able) and the demo site where you can try it out.

I also dabble in information retrieval (a fancy phrase for text processing). This is an offshoot of my database interests: a large component of biomedical databases consists of narrative text (which captures nuances that coded text cannot); examples are discharge summaries and operative notes. I'm looking at ways to optimize the searching process by indexing the content based on recognition of concepts in controlled biomedical vocabularies (I mainly play with the National Library of Medicine's Unified Medical Language System), and at ways of integrating text search with conventional database search. I've begun to explore Natural Language Processing and (as an offshoot of NLP) machine-learning techniques.

In the past, I've worked in the area of genome informatics as well as parallel computation in molecular biology and genetics.


Presentations

The following links point to the contents of presentations that should be of general interest to medical informaticians.

  1. Clinical Data Warehousing. Presented at AMIA Fall Symposium, Orlando, FL, Nov 8, 1998.
  2. ACT/DB: An Infrastructure for Clinical Trials Data Management. Columbia University, Jan 21, 1999.
  3. The EAV/CR Physical Data Model for Heterogeneous Scientific Databases. Human Brain Project Annual Meeting, NIH, Jun 5, 1999.
  4. Understanding and Implementing the EAV Database in the General Clinical Research Center. National GCRC Meeting, Baltimore, MD, April 13, 2002. The URL above is a converted PowerPoint presentation. A detailed explanatory paper can be found below.
  5. An Introduction to EAV systems. National GCRC Meeting, Baltimore, MD, April 13, 2002.
  6. Informatics Support of Data management for multi-centric clinical studies: Integrating clinical and genetics/genomic data. American College of Medical Informatics, Fort Lauderdale, Florida, March 2003.
  7. Database Representation of Phenotype Data: Issues and Challenges. Human Genome Variation Society, American Society for Human Genetics Meeting, Los Angeles, CA, November 4, 2003.
  8. Metadata-driven systems in Biomedicine. Henry Ford Health System, Detroit, 2008.
  9. Implementing Clinical Decision Support (With Hemant Shah, MD). Henry Ford Health System, Detroit, 2009.
  10. Using Electronic medical records for clinical research: Challenges. University of Iowa, Sept 2012.
  11. Using Electronic medical records for Research II: Practical Issues and Implementation Hurdles. University of Iowa, Sept 2013.

Selected Publications:

Here is a list of selected recent publications. (Some of the papers are downloadable as MS-Word files plus figures, compressed into zip files. See the hyperlinks at the bottom of the publications list.)

  1. Nadkarni PM. Management of Evolving Map Data: Data Structures and Algorithms Based on the Framework Map. Genomics (1995) 30:565-573. Abstract
  2. Nadkarni PM, Cheung K-H. SQLGEN: An environment for rapid client-server database development. Computers and Biomedical Research (1995) 28:479-499. Abstract
  3. Nadkarni PM, Montgomery KM, Leblanc-Stracewski J, Krauter K. CONTIG EXPLORER: Interactive Exploratory Contig Assembly. Genomics (1996) 31:301-310. Abstract
  4. Nadkarni PM, Cheung K-H, Castiglione C, Miller PL, Kidd KK. DNA Workbench: a database for support of regional chromosomal mapping. Journal of Computational Biology, (1996) 3 (2), 319-329. Abstract
  5. Nadkarni PM. Mapdiff: an algorithm to report the differences between two genomic maps. Comput Applic Biosci (1997) 13(3) 217-225. Abstract
  6. Cheung, K-H, Nadkarni PM, Silverstein S, Miller PL, Kidd KK. PhenoDB: A database for the storage and analysis of pedigree and population genetic data. Computers and Biomedical Research (1996) 29:327-337. Abstract
  7. Nadkarni PM. Concept Locator: A Client-Server Application for Retrieval of UMLS Metathesaurus Concepts Through Complex Boolean Query. Computers and Biomedical Research (1997) 30:323-336. Abstract
  8. Nadkarni PM. QAV: Querying Entity-Attribute Value Metadata in a Biomedical Database. Computer Methods and Programs in Biomedicine (1997) 53 93-103. Abstract
  9. Nadkarni PM. Mapmerge: merging genomic maps. Bioinformatics (1998) 14(4), 310-316. Abstract
  10. Nadkarni, P, Brandt, C, Frawley, S M, Sayward, F, Einbinder, R, Zelterman, D, Schacter, L, Miller, P L. Managing attribute-value clinical trials data using the ACT/DB client-server database system. Journal of the American Medical Informatics Association (1998) 5(2) 139-151. Abstract
  11. Cheung, KH, Nadkarni PM, Shin DG. A metadata approach to query interoperation between molecular biology databases. Bioinformatics (1998) 14(6) 486-497. Abstract
  12. Nadkarni PM. CHRONOMERGE: An Application for the Merging and Display of Multiple Time-Stamped Data Streams. Computers and Biomedical Research (1998) 31 451-464. Abstract
  13. Nadkarni, P, Brandt, C. Data Extraction and Ad Hoc Query of an Entity-Attribute-Value Database. Journal of the American Medical Informatics Association (1998) 5(6) 511-527. Abstract
  14. Nadkarni PM, Marenco L, Chen R, Skoufos E, Shepherd G, Miller P. Organization of heterogeneous scientific data using the EAV/CR representation. Journal of the American Medical Informatics Association 1999 Nov-Dec;6(6):478-93. Abstract
  15. Stein HD, Nadkarni P, Erdos J, Miller PL. Exploring the degree of concordance of coded and textual data in answering clinical queries from a clinical data repository. J Am Med Inform Assoc 2000 Jan-Feb;7(1):42-54. Abstract
  16. Nadkarni PM, Brandt C, Marenco L. WebEAV: Automatic Metadata-Driven Generation of Web Interfaces to Entity-Attribute-Value Databases. J Am Med Inform Assoc 2000;7(4):343-56. Abstract
  17. Roland S. Chen, Prakash Nadkarni, Luis Marenco, Forrest Levin, Joseph Erdos and Perry L. Miller. Exploring Performance Issues for a Clinical Database Organized Using an Entity-Attribute-Value Representation. J Am Med Inform Assoc 2000; 7(5):475-487. Abstract
  18. Chen RS and Brandt CA. UMLS Concept Indexing for Production Databases: A Feasibility Study. Journal of American Medical Informatics Association, 2001 8: 80-91 Abstract
  19. Mutalik PG, Deshpande AM, Nadkarni PM. Use of General-purpose Negation Detection to Augment Concept Indexing of Medical Documents: A Quantitative Study using the UMLS. Journal of American Medical Informatics Association, 2001 8: 598-609 Abstract
  20. Nadkarni PM: An Introduction to Information Retrieval: Applications in Genomics. The Pharmacogenomics Journal, 2002 2(2) 96-102. Abstract
  21. Deshpande AM, Brandt CA, Nadkarni PM. Metadata-driven Ad hoc Query of Patient Data: Meeting the Needs of Clinical Studies. Journal of the American Medical Informatics Association. 2002; 9(4) 369-382. Abstract
  22. Fisk JM, Mutalik PG, Levin FW, Erdos J, Taylor C, Nadkarni PM. Integrating Query of Relational and Textual Data in Clinical Databases: a Case Study. Journal of the American Medical Informatics Association. 2003: 10(1) 21-38. Abstract
  23. Nadkarni PM, Sun K, Wiepert M. Designing and Implementing Special-Purpose Databases: Lessons from the Pharmacogenetic Network. Pharmacogenomics 2002 3(5): 687-96. Abstract
  24. Nadkarni PM. The challenges of recording phenotype in a generalizable, computable form (Perspective). The Pharmacogenomics Journal 2003 3(1) 8-10.
  25. Deshpande AM, Brandt CA, Nadkarni PM. Temporal Query of Attribute-Value Patient Data: Utilizing the Constraints of Clinical Studies. International Journal of Medical Informatics 2003; 70 (1): 59-77. Abstract
  26. Marenco L, Tosches N, Crasto C, Shepherd G, Miller PL, Nadkarni PM. Achieving Evolvable Web-Database Bioscience Applications Using the EAV/CR Framework: Recent Advances. Journal of the American Medical Informatics Association. 2003: 10(5). Abstract
  27. Brandt CA, Gadagkar R, Rodriguez C, Nadkarni PM. Managing Complex Change in Clinical Study Metadata. Journal of the American Medical Informatics Association. 2004 11(3) 380-91. Abstract
  28. Holford M, Li N, Nadkarni P, Zhao H. VitaPad: visualization tools for analysis of pathway data. Bioinformatics: 2004, 21 (8) 1596-1602. Abstract
  29. Marenco L, Wang TY, Shepherd G, Miller PL, Nadkarni PM. QIS: A Framework for Biomedical Database Federation. Journal of the American Medical Informatics Association. 2004 11(6) 523-34. Abstract
  30. Nadkarni PM and Brandt CA. The Common Data Elements for Cancer Research: Remarks on Functions and Structure. Methods of Information in Medicine, 2006;45(6):594-601. Abstract
  31. Nadkarni PM, Wiepert M. Translating pharmacogenomics discoveries into clinical practice: the role of curated databases. Pharmacogenomics. 2005 Jul;6(5):451-4. Pubmed (free article)
  32. Dinu V, Nadkarni P. Guidelines for the effective use of entity-attribute-value modeling for biomedical databases. Int J Med Inform. 2007 Nov-Dec;76(11-12):769-79. Pubmed (free article)
  33. Nadkarni PM, Miller RA. Service-oriented architecture in medical software: promises and perils. J Am Med Inform Assoc. 2007 Mar-Apr;14(2):244-6. Pubmed (free article)
  34. Marenco L, Wang R, Nadkarni P. Automated database mediation using ontological metadata mappings. J Am Med Inform Assoc. 2009 Sep-Oct;16(5):723-37. Pubmed (free article)
  35. Nadkarni PM, Marenco LN. Implementing description-logic rules for SNOMED-CT attributes through a table-driven approach. J Am Med Inform Assoc. 2010 Mar-Apr;17(2):182-4. Pubmed (free article)
  36. Nadkarni PM, Darer JD. Migrating existing clinical content from ICD-9 to SNOMED. J Am Med Inform Assoc. 2010 Sep-Oct;17(5):602-7. Pubmed (free article)
  37. Nadkarni PM. Drug safety surveillance using de-identified EMR and claims data: issues and challenges. J Am Med Inform Assoc. 2010 Nov-Dec;17(6):671-4. Pubmed (free article)
  38. Nadkarni PM, Darer JD. Determining correspondences between high-frequency MedDRA concepts and SNOMED: a case study. BMC Med Inform Decis Mak. 2010. Pubmed (free article)
  39. Richesson RL, Nadkarni P. Data standards for clinical research data collection forms: current status and challenges. J Am Med Inform Assoc. 2011 May 1;18(3):341-6. Pubmed (free article)
  40. Chapman WW, Nadkarni PM, Hirschman L, D'Avolio LW, Savova GK, Uzuner O. Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions. J Am Med Inform Assoc. 2011 Sep-Oct;18(5):540-3. Pubmed (free article)
  41. Nadkarni PM, Ohno-Machado L, Chapman WW. Natural language processing: an introduction. J Am Med Inform Assoc. 2011 Sep-Oct;18(5):544-51. Pubmed (free article)
  42. Nadkarni PM, Kemp R, Parikh CR. Leveraging a clinical research information system to assist biospecimen data and workflow management: a hybrid approach. J Clin Bioinforma. 2011 Aug 25;1:22. Pubmed (free article)
  43. Morse RE, Nadkarni P, Schoenfeld DA, Finkelstein DM. Web-browser encryption of personal health information. BMC Med Inform Decis Mak. 2011 Nov 10;11:70. Pubmed (free article)
  44. Shah H, Allard RD, Enberg R, Krishnan G, Williams P, Nadkarni PM. Requirements for guidelines systems: implementation challenges and lessons from existing software-engineering efforts. BMC Med Inform Decis Mak. 2012 Mar 9;12:16. Pubmed (free article)
  45. Nadkarni PM, Parikh CR. An eUtils toolset and its use for creating a pipeline to link genomics and proteomics analyses to domain-specific biomedical literature. Pubmed (free article)

Book Chapters

Shepherd, G.M., M.D. Healy, M.S. Singer, B.E. Peterson, J.S. Mirsky, L. Wright, J.E. Smith, P.M. Nadkarni, & P.L. Miller. Senselab: a project in multidisciplinary, multilevel sensory integration. pp. 21-56 of Neuroinformatics: An Overview of the Human Brain Project, ed. S.H. Koslow & M.F. Huerta. Lawrence Erlbaum Associates, Inc. Mahwah, NJ: 1997.

Prakash Nadkarni, Jason Mirsky, Emmanouil Skoufos, Matthew Healy, Michael Hines, Perry Miller and Gordon Shepherd. Modeling Heterogeneous Data on the Nervous System (Book Chapter) in Bioinformatics Databases, ed. Stanley I. Letovsky. Kluwer Academic Publishers, Dordrecht, Netherlands. pp. 38-51.


Prakash Nadkarni: Metadata-driven Software Systems in Biomedicine. Springer, 2011.

Prakash Nadkarni: Parallel Programming with Linda: An Advanced Introduction.

Linda, conceived by David Gelernter and initially implemented by Nick Carriero, both of Yale University, is a "mini-language" consisting of just four constructs that is embedded in a conventional language (e.g., C or FORTRAN) to give it parallel capabilities. While conceptually very simple, its reliance on a pre-processor (that must be hand-sculpted for the language in which it is to be embedded) has limited its widespread use, by contrast with MPI, which, though somewhat more difficult to use, depends purely on a subroutine library. This book was written back in 1992, and also gives an introduction to parallel programming. It can be downloaded by clicking here.
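To give a feel for how small the Linda model is, here is a rough Python sketch of a shared "tuple space" offering the four operations usually named out, in, rd and eval in the Linda literature. It is not taken from the book and is not how a real Linda pre-processor works: the TupleSpace class, the use of None as a wildcard, and the simplified eval (which runs a function in a thread and deposits the tuple it returns) are all assumptions made for illustration. The method is called in_ because in is a reserved word in Python.

```python
import threading


class TupleSpace:
    """Toy in-process tuple space (illustrative only, not real Linda)."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, *tup):
        """Deposit a tuple and wake any threads waiting for a match."""
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def _find(self, pattern):
        # None in the pattern acts as a wildcard (a simplifying assumption).
        for tup in self._tuples:
            if len(tup) == len(pattern) and all(
                p is None or p == v for p, v in zip(pattern, tup)
            ):
                return tup
        return None

    def rd(self, *pattern):
        """Block until a matching tuple exists, then return it without removing it."""
        with self._cond:
            tup = self._find(pattern)
            while tup is None:
                self._cond.wait()
                tup = self._find(pattern)
            return tup

    def in_(self, *pattern):
        """Block until a matching tuple exists, then remove and return it."""
        with self._cond:
            tup = self._find(pattern)
            while tup is None:
                self._cond.wait()
                tup = self._find(pattern)
            self._tuples.remove(tup)
            return tup

    def eval(self, func, *args):
        """Run func(*args) in a new thread and deposit the tuple it returns."""
        worker = threading.Thread(target=lambda: self.out(*func(*args)))
        worker.start()
        return worker


def square(n):
    return ("square", n, n * n)


space = TupleSpace()
workers = [space.eval(square, n) for n in range(4)]
# Collect the four results in whatever order the workers finish.
results = sorted(space.in_("square", None, None)[2] for _ in range(4))
for worker in workers:
    worker.join()
print(results)
```

The coordinating thread never addresses a particular worker; it simply withdraws tuples that match a pattern, which is the property that makes the model conceptually simple.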

Downloadable Publications

The downloadable zip files linked to below typically contain more than one publication. Refer to the numbers in the list above. (Please note: figures are generally bundled with the MS-Word file, or supplied separately, but some figures may be missing.) Many of the publications originally appeared in the Journal of the American Medical Informatics Association, where you can find excellent content related to Medical Informatics. JAMIA publications that are more than three years old are also freely downloadable (as PDFs) from NCBI's PubMed Central Site.

Click on the following links to download full text of publications of interest:
publications 1, 3 and 4
publications 7 and 8
publication 9
publications 10, 12 and 13
publication 14
publications 16 and 17
publication 18
publication 19
publication 20
publication 21
publication 22
publication 23
publication 24
publication 25
publication 26