Collaborative Automation Reliably Remediating Erroneous Conclusion Threats (CARRECT)

OBJECTIVE: The objective of the CARRECT software is to make cutting edge statistical methods for reducing bias in epidemiological studies easy to use and useful for both novice and expert users. INTRODUCTION: Analyses produced by epidemiologists and public health practitioners are susceptible to bia...

Bibliographic Details
Main authors: Lansey, Jonathan C., Picciano, Paul, Yohai, Ian, Grant, Fred, Gern, Robert
Format: Online Article Text
Language: English
Published: University of Illinois at Chicago Library 2013
Subjects: ISDS 2012 Conference Abstracts
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3692841/
_version_ 1782274668458147840
author Lansey, Jonathan C.
Picciano, Paul
Yohai, Ian
Grant, Fred
Gern, Robert
author_facet Lansey, Jonathan C.
Picciano, Paul
Yohai, Ian
Grant, Fred
Gern, Robert
author_sort Lansey, Jonathan C.
collection PubMed
description OBJECTIVE: The objective of the CARRECT software is to make cutting-edge statistical methods for reducing bias in epidemiological studies easy to use and useful for both novice and expert users. INTRODUCTION: Analyses produced by epidemiologists and public health practitioners are susceptible to bias from a number of sources, including missing data, confounding variables, and statistical model selection. Understanding and applying the multitude of tests, corrections, and selection rules often requires a great deal of expertise, and these tasks can be time-consuming and burdensome. To address this challenge, Aptima began development of CARRECT, the Collaborative Automation Reliably Remediating Erroneous Conclusion Threats system. When complete, CARRECT will provide an expert system that can be embedded in an analyst’s workflow. CARRECT will support statistical bias reduction and improved analyses and decision making by engaging the user in a collaborative process in which the technology is transparent to the analyst. METHODS: Older approaches to imputing missing data, including mean imputation and single imputation regression methods, have steadily given way to a class of methods known as “multiple imputation” (hereafter “MI”; Rubin 1987). Rather than making the restrictive assumption that the data are missing completely at random (MCAR), MI typically assumes the data are missing at random (MAR). There are two key innovations behind MI. First, the observed values can be useful in predicting the missing cells, and thus specifying a joint distribution of the data is the first step in implementing the models. Second, single imputation methods tend to fail because they ignore both the inherent uncertainty in the missing values and the estimation uncertainty associated with generating the parameters in the imputation procedure itself. 
By contrast, drawing the missing values multiple times, thereby generating m complete datasets along with the estimated parameters of the model, properly accounts for both types of uncertainty (Rubin 1987; King et al. 2001). As a result, MI leads to valid standard errors and confidence intervals along with unbiased point estimates. To compute the joint distribution, CARRECT uses a bootstrapping-based algorithm that gives essentially the same answers as the standard Bayesian Markov Chain Monte Carlo (MCMC) or Expectation Maximization (EM) approaches, but is usually considerably faster and can handle many more variables. RESULTS: One of the proposed methods was tested on an epidemiological dataset from the Integrated Health Interview Series (IHIS) and produced verifiably unbiased results despite high missingness rates. In addition, mockups (Figure 1) were created of an intuitive data wizard that guides the user through the analysis process by analyzing key features of a given dataset. The mockups also show prompts for the user to provide additional substantive knowledge to improve the handling of imperfect datasets, as well as the selection of the most appropriate algorithms and models. CONCLUSIONS: Our approach and program were designed to make bias mitigation accessible to far more than the statistical elite. We hope that it will have a wide impact on reducing bias in epidemiological studies and provide more accurate information to policymakers.
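The multiple imputation idea described in the METHODS text can be illustrated with a small sketch. It is not the CARRECT algorithm, whose details the abstract does not give. It assumes a toy setting with one fully observed covariate x and one partially missing outcome y, uses a bootstrap resample plus a simple linear regression as a stand-in imputation model, and pools the m estimates with Rubin's combining rules. The function names (fit_line, make_mar_data, multiply_impute_mean) and the synthetic data are hypothetical and exist only for this illustration.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, (rss / (len(xs) - 2)) ** 0.5


def make_mar_data(n, seed=0):
    """Synthetic data (an assumption): y is missing more often when x > 0 (MAR)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    full = [2.0 + 1.5 * x + rng.gauss(0, 1) for x in xs]
    ys = []
    for x, y in zip(xs, full):
        p_missing = 0.7 if x > 0 else 0.1
        ys.append(None if rng.random() < p_missing else y)
    return xs, full, ys


def multiply_impute_mean(xs, ys, m=20, seed=0):
    """Estimate mean(y) and its standard error from m imputed datasets."""
    rng = random.Random(seed)
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    estimates, variances = [], []
    for _ in range(m):
        # Bootstrap the observed rows so the imputation model's parameters
        # differ across imputations (estimation uncertainty).
        sample = [rng.choice(observed) for _ in observed]
        a, b, sd = fit_line([p[0] for p in sample], [p[1] for p in sample])
        # Draw each missing value with noise (inherent uncertainty).
        completed = [
            y if y is not None else a + b * x + rng.gauss(0, sd)
            for x, y in zip(xs, ys)
        ]
        estimates.append(statistics.fmean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    # Rubin's rules: total variance = within + (1 + 1/m) * between.
    q_bar = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)
    return q_bar, (within + (1 + 1 / m) * between) ** 0.5


if __name__ == "__main__":
    xs, full, ys = make_mar_data(2000, seed=1)
    est, se = multiply_impute_mean(xs, ys, m=20, seed=2)
    complete_case = statistics.fmean([y for y in ys if y is not None])
    print("full-data mean:", statistics.fmean(full))
    print("complete-case mean:", complete_case)
    print("MI estimate and SE:", est, se)
```

In this toy setup the complete-case mean is pulled downward because high-x rows are missing more often, while the pooled MI estimate should sit much closer to the full-data mean. A real analysis would impute from a joint model over all variables rather than a single regression.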
format Online
Article
Text
id pubmed-3692841
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher University of Illinois at Chicago Library
record_format MEDLINE/PubMed
spelling pubmed-3692841 2013-06-26 Collaborative Automation Reliably Remediating Erroneous Conclusion Threats (CARRECT) Lansey, Jonathan C. Picciano, Paul Yohai, Ian Grant, Fred Gern, Robert Online J Public Health Inform ISDS 2012 Conference Abstracts University of Illinois at Chicago Library 2013-04-04 /pmc/articles/PMC3692841/ Text en ©2013 the author(s) http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/ojphi/about/submissions#copyrightNotice This is an Open Access article. Authors own copyright of their articles appearing in the Online Journal of Public Health Informatics. Readers may copy articles without permission of the copyright owner(s), as long as the author and OJPHI are acknowledged in the copy and the copy is used for educational, not-for-profit purposes.
spellingShingle ISDS 2012 Conference Abstracts
Lansey, Jonathan C.
Picciano, Paul
Yohai, Ian
Grant, Fred
Gern, Robert
Collaborative Automation Reliably Remediating Erroneous Conclusion Threats (CARRECT)
title Collaborative Automation Reliably Remediating Erroneous Conclusion Threats (CARRECT)
topic ISDS 2012 Conference Abstracts
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3692841/