Update: OHDSI COVID-19 study-a-thon.

I thought a summary in the read-below section might be helpful, as the main page is a lot to digest.

The OHDSI COVID-19 group re-convenes Monday at 6:00 (EST, I think) for updates.

For those who want to do modelling: you cannot get the data yourself. Instead, you must write analysis scripts that data holders run on their own computers, returning only the results. My guess is that this might be most doable through here, where custom R scripts can be implemented that data holders might be able to run. Maybe some RStan experts can try to work this through.

Main Message: The Observational Health Data Sciences and Informatics (or OHDSI, pronounced “Odyssey”) group undertook a large co-operative COVID-19 study-a-thon with more than 300 researchers, March 27-30. Preliminary results of 113 finalized cohort analysis packages are becoming available. Given OHDSI's capabilities through large-scale co-operation, multi-country data sets (including South Korea), and considerably improved and calibrated methods for causal inference in observational data, these are likely the most informative and credible sources of evidence available today. Additionally, these will be continuously updated as more data become available. Link: https://www.ohdsi.org/covid-19-updates/

Summary Considerations: 1. Analyses of multi-country data sources that are converted to a common format and continuously cross-validated and updated. 2. An international collection of clinical researchers who have the experience and resources to act co-operatively in a critical but supportive atmosphere. 3. A suite of advanced data analysis and causal inference software, with expertise supported by ongoing methods research, including unique and fine-tuned calibration methods that use negative controls to provide credible assessments of performance, such as confidence interval coverage. With this, study claims become more accurately evaluable and actionable.

Background: OHDSI is a multi-stakeholder, interdisciplinary collaborative that is striving to bring out the value of observational health data through large-scale analytics. The OHDSI Research Network spans over 600 million patients. OHDSI uses a Common Data Model for observational healthcare data: multiple data sources are converted to a common format so that the same analysis can be run on all of them. Code is run on the data by the data holder and only the results are shared. This avoids the need for most research ethics board (REB) approvals. Research results can then be drawn from many disparate data sources and compared and contrasted to understand the effect of potential data quality issues and other biases. Assessing and analyzing multiple data sources concurrently gives higher statistical power as well as opportunities to spot flaws and rectify them.
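To make the "code goes to the data, only results come back" idea concrete, here is a minimal sketch. It is not OHDSI's actual tooling: the record layout, site names, and the functions `count_events` and `run_network_study` are all hypothetical, and the "sites" are just in-memory lists standing in for databases already converted to a common format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AggregateResult:
    """The only thing a site sends back: counts, no patient-level rows."""
    site: str
    n_patients: int
    n_events: int


def count_events(site: str, records: List[dict]) -> AggregateResult:
    """Hypothetical analysis script; it runs where the data lives."""
    n_events = sum(1 for r in records if r["outcome"])
    return AggregateResult(site=site, n_patients=len(records), n_events=n_events)


def run_network_study(
    analysis: Callable[[str, List[dict]], AggregateResult],
    sites: Dict[str, List[dict]],
) -> List[AggregateResult]:
    """Apply the same script at every site and collect the aggregates."""
    return [analysis(name, records) for name, records in sites.items()]


# Made-up data in a shared (common) format; in reality each list would
# stay on the data holder's own computer.
sites = {
    "site_a": [{"outcome": True}, {"outcome": False},
               {"outcome": False}, {"outcome": False}],
    "site_b": [{"outcome": True}, {"outcome": True}, {"outcome": False}],
}

results = run_network_study(count_events, sites)
for r in results:
    print(r.site, r.n_patients, r.n_events, round(r.n_events / r.n_patients, 3))
```

Because every site uses the same record format, one script works everywhere, and the site-level results can be lined up side by side to look for inconsistencies.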

Answering causal questions leads to better predictive and prescriptive modeling. It is also the key to correctly identifying unknown effects and the latent factors that influence outcomes, and to producing hypotheses, validation, and proof. Large-scale observational data offers a new window for verifying our existing causal understandings and for inferring new causal relationships at a fast pace.

Existing health care data such as insurance claims and electronic health records are used to determine what the effects, both good and bad, of proposed medical treatments might be. One challenge that must be overcome is that people who get a treatment may differ from those who do not, and if we do not adjust for that appropriately we may draw incorrect conclusions.
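A small worked example of why that adjustment matters. The counts are invented purely for illustration: sicker patients are more likely to be treated, so the crude comparison makes a treatment that helps within each severity group look harmful overall. Stratifying on severity is just one simple way to adjust; it is not meant to represent the methods OHDSI actually uses.

```python
# Invented counts: (number of patients, number of deaths) by severity and treatment.
data = {
    "severe": {"treated": (80, 40), "untreated": (20, 12)},
    "mild":   {"treated": (20, 2),  "untreated": (80, 16)},
}


def risk(n_patients: int, n_deaths: int) -> float:
    return n_deaths / n_patients


def crude_risk_difference(data: dict) -> float:
    """Pool everyone, ignoring severity (the naive comparison)."""
    totals = {}
    for arm in ("treated", "untreated"):
        n = sum(stratum[arm][0] for stratum in data.values())
        d = sum(stratum[arm][1] for stratum in data.values())
        totals[arm] = risk(n, d)
    return totals["treated"] - totals["untreated"]


def stratified_risk_difference(data: dict) -> float:
    """Compare within severity strata, then average weighted by stratum size."""
    total_n = sum(s["treated"][0] + s["untreated"][0] for s in data.values())
    adjusted = 0.0
    for stratum in data.values():
        weight = (stratum["treated"][0] + stratum["untreated"][0]) / total_n
        diff = risk(*stratum["treated"]) - risk(*stratum["untreated"])
        adjusted += weight * diff
    return adjusted


crude = crude_risk_difference(data)
adjusted = stratified_risk_difference(data)
print(f"crude risk difference:    {crude:+.2f}")
print(f"adjusted risk difference: {adjusted:+.2f}")
```

With these numbers the crude difference is positive (treatment looks harmful) while the severity-adjusted difference is negative (treatment looks beneficial), which is the kind of incorrect conclusion the paragraph above warns about.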

OHDSI has created a benchmark to measure the performance of various methods for dealing with this and related issues. It uses control questions, i.e., questions where we know the answer, and evaluates whether the different methods produce the expected results. Running this benchmark on multiple large health care databases covering millions of lives, one can observe that most methods currently used in academic publications based on single data sources are simply not reliable. For example, more often than not, the known true answer lies outside the confidence interval, despite the fact that such confidence intervals are designed to include the true answer 95% of the time.
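The coverage check described above can be sketched in a few lines. This assumes negative controls (questions where the true effect is known to be null, so the true log relative risk is 0) and normal-approximation 95% intervals; the estimates and standard errors below are made up for illustration and are not benchmark results.

```python
def ci95(log_estimate: float, se: float) -> tuple:
    """Normal-approximation 95% confidence interval on the log scale."""
    half_width = 1.96 * se
    return log_estimate - half_width, log_estimate + half_width


def coverage(log_estimates, ses, true_log_effect: float = 0.0) -> float:
    """Fraction of intervals that contain the known true effect."""
    hits = 0
    for est, se in zip(log_estimates, ses):
        lower, upper = ci95(est, se)
        if lower <= true_log_effect <= upper:
            hits += 1
    return hits / len(log_estimates)


# Made-up negative control results: log relative risks and standard errors.
neg_log_rr = [0.05, 0.40, -0.10, 0.55, 0.30, 0.62, 0.02, 0.45]
neg_se = [0.10, 0.12, 0.15, 0.10, 0.11, 0.20, 0.08, 0.14]

observed_coverage = coverage(neg_log_rr, neg_se)
print(f"nominal coverage: 0.95, observed coverage: {observed_coverage:.3f}")
```

If a method were unbiased, roughly 95% of these intervals would contain 0. In this invented example most of them miss it, which mirrors the pattern the benchmark reports for many commonly used methods.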

OHDSI results have confirmed the concerns about using health care data to determine the effect of treatments. The performance characteristics measured by the benchmark can then be taken into account when interpreting the results; OHDSI calls this empirical calibration. When using this calibration, one can show that some methods tend to work rather well across the different scenarios.
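Here is a rough sketch of the calibration idea for p-values. It is a simplification I am assuming for illustration, not the implementation in OHDSI's software: the systematic error is modelled as a normal distribution whose mean and spread are estimated from negative control estimates by a crude method of moments, and a new estimate is then judged against that empirical null instead of the usual one centred at zero. All numbers are made up.

```python
import math
from statistics import mean, variance


def fit_systematic_error(log_estimates, ses):
    """Crude method-of-moments fit of a normal systematic error distribution.

    Assumes observed spread = systematic variance + average sampling variance.
    """
    mu = mean(log_estimates)
    avg_sampling_var = mean([se ** 2 for se in ses])
    systematic_var = max(variance(log_estimates) - avg_sampling_var, 0.0)
    return mu, math.sqrt(systematic_var)


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def traditional_p(log_estimate: float, se: float) -> float:
    """Usual p-value: assumes the null distribution is centred at zero."""
    return two_sided_p(log_estimate / se)


def calibrated_p(log_estimate: float, se: float, mu: float, sigma: float) -> float:
    """P-value against the empirical null learned from negative controls."""
    return two_sided_p((log_estimate - mu) / math.sqrt(sigma ** 2 + se ** 2))


# Made-up negative control results (true effect is null for all of them).
neg_log_rr = [0.05, 0.40, -0.10, 0.55, 0.30, 0.62, 0.02, 0.45]
neg_se = [0.10, 0.12, 0.15, 0.10, 0.11, 0.20, 0.08, 0.14]
mu, sigma = fit_systematic_error(neg_log_rr, neg_se)

# A hypothetical estimate for the question we actually care about.
p_trad = traditional_p(0.35, 0.10)
p_cal = calibrated_p(0.35, 0.10, mu, sigma)
print(f"systematic error: mean={mu:.3f}, sd={sigma:.3f}")
print(f"traditional p={p_trad:.4f}, calibrated p={p_cal:.4f}")
```

In this toy example the negative controls are biased upward, so an estimate that looks highly significant under the traditional null is unremarkable once the method's demonstrated error is taken into account.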

Reference:

How Confident Are We About Observational Findings in Health Care: A Benchmark Study

by Martijn J. Schuemie, M. Soledad Cepeda, Marc A. Suchard, Jianxiao Yang, Yuxi Tian, Alejandro Schuler, Patrick B. Ryan, David Madigan, and George Hripcsak. Harvard Data Science Review (HDSR). https://hdsr.mitpress.mit.edu/pub/fxz7kr65