The Human Studyome Project

Human studies are a critical path to translating biomedical discoveries into clinical interventions, and to understanding how existing clinical interventions can be optimized to improve health outcomes. However, because of their complexity, heterogeneity, and sheer volume, patterns and relationships among human studies are difficult to detect and understand. Clinicians and researchers are hard-pressed to accurately find, interpret, synthesize, and apply study results to clinical practice or to the design of new studies. The result is an inefficient transfer of human studies knowledge, and a waste of precious resources.

To bring the power of computing to large-scale data mining, synthesis, re-analysis, and reuse of human studies, the “human studyome” – the totality of human studies worldwide – should be standardized and made computable. Reusing and analyzing human studies data from different sources is currently prohibitively difficult, largely because there is no reference human studies ontology to serve as the common semantics for querying across databases of design data (e.g., IRB databases, clinical trial management systems) and results data (e.g., analytic systems). Existing information models and data standards such as BRIDG and CDISC SDTM are primarily operational or administrative in focus, and are insufficient for supporting the full range of scientific analysis needs.

We developed the Ontology of Clinical Research (OCRe) to model the design and analytic features of human studies for scientific query and analysis. Our broad long-term goal is to capture all human studies design and results information in OCRe-standardized form to enable computational methods for large-scale query, synthesis, analysis, and visualization of diverse human studies. Our Human Studies Database Project (HSDB) aimed to use the OCRe ontology to federate data sharing from individual CTSA institutions over the caGrid query architecture, initially of human studies design data, then of individual participant-level data. A further project focused on representing and acquiring eligibility criteria in the structured, computable form of the Eligibility Rule Grammar and Ontology (ERGO). Another project, ExaCT, used natural language processing methods to extract descriptions of a trial's interventions, population, outcome measures, funding sources, and other critical characteristics from study reports and protocol documents. Work on visualization of clinical trials demonstrated the value of computable RCT information for large-scale visualization and sense-making of heterogeneous studies.

All our prior work is now finding realization in Vivli, a global clinical trial data sharing platform implementing purpose-driven data sharing to enhance scientific discovery and public trust. Designed to reduce barriers to clinical trials data sharing, Vivli is establishing an independent data repository, cloud-based analytics platform, and in-depth search engine through which data from clinical trials conducted by researchers in academic, industry, foundation, and non-profit entities can be hosted, shared, and accessed. (See also the NEJM Perspective article on Vivli.)

Principal Investigator Ida Sim, MD, PhD

Ontologies and Data Sharing

Data Acquisition

Trial Visualization


Ontology of Clinical Research


Eligibility Rule Grammar and Ontology


Human Studies Database 

RCT Schema

the data model for trial bank software

Task Analysis of Systematic Reviewing

design approach for trial banks

Vivli (current work)

A global clinical trial data sharing platform

ExaCT (prior work)

NLP of key design characteristics from free text articles

Trial Bank Publishing (prior work)

  • Bank-a-Trial submit a trial


use tag clouds to search


prototype to compare a set of MTCT of HIV trials

Network analysis of clinical trials on depression

research presented at the AMIA Fall Symposium 2010
