The Human Studyome Project

Human studies are a critical path to translating biomedical discoveries into clinical interventions, and to understanding how existing clinical interventions can be optimized to improve health outcomes. However, because of their complexity, heterogeneity, and sheer volume, patterns and relationships among human studies are difficult to detect and understand. Clinicians and researchers are hard-pressed to accurately find, interpret, synthesize, and apply study results to clinical practice or to the design of new studies. The result is an inefficient transfer of human studies knowledge, and a waste of precious resources.

To bring the power of computing to large-scale data mining, synthesis, re-analysis, and reuse of human studies, the “human studyome” – the totality of human studies worldwide – should be standardized and made computable. Reusing and analyzing human studies data from different sources is currently prohibitively difficult, largely because there is no reference human studies ontology to serve as the common semantics for querying across databases of design data (e.g., IRB databases, clinical trial management systems) and results data (e.g., analytic systems). Existing information models and data standards like BRIDG and CDISC SDTM are primarily operational or administrative in focus, and are insufficient for supporting the full range of scientific analysis needs.

We are therefore developing the Ontology of Clinical Research (OCRe) to model the design and analytic features of human studies for scientific query and analysis. Our broad long-term goal is to capture all human studies design and results information in OCRe-standardized form to enable computational methods for large-scale query, synthesis, analysis, and visualization of diverse human studies. Our Human Studies Database Project (HSDB) aims to use the OCRe ontology to federate data sharing among individual CTSA institutions over the caGrid query architecture, initially for human studies design data and then for individual participant-level data. A specific project focuses on the representation and acquisition of eligibility criteria in the structured, computable form of the Eligibility Rule Grammar and Ontology (ERGO). Another project, ExaCT, uses natural language processing methods to extract descriptions of a trial's interventions, population, outcome measures, funding sources, and other critical characteristics from study reports and protocol documents. Ongoing work on visualization of clinical trials aims to demonstrate the value of computable RCT information for large-scale visualization and sense-making of heterogeneous studies.

Principal Investigator: Ida Sim, MD, PhD

Ontologies and Data Sharing

Data Acquisition

Trial Visualization


Ontology of Clinical Research


Eligibility Rule Grammar and Ontology


Human Studies Database 

RCT Schema: the data model for trial bank software

Task Analysis of Systematic Reviewing: design approach for trial banks


NLP of key design characteristics from free text articles


[upcoming work]

Trial Bank Publishing (prior work)

  • Bank-a-Trial: submit a trial


use tag clouds to search


prototype to compare a set of trials on mother-to-child transmission (MTCT) of HIV

Network analysis of clinical trials on depression: research presented at the AMIA Fall Symposium 2010

© 2002-2017 The Regents of the University of California. Last modified 12-Jun-17