
Measuring the Effect of Inter-Study Variability on Estimating Prediction Error

Bibliographic Details
Main Authors: Ma, Shuyi; Sung, Jaeyun; Magis, Andrew T.; Wang, Yuliang; Geman, Donald; Price, Nathan D.
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4201588/
https://www.ncbi.nlm.nih.gov/pubmed/25330348
http://dx.doi.org/10.1371/journal.pone.0110840
_version_ 1782340200699002880
author Ma, Shuyi
Sung, Jaeyun
Magis, Andrew T.
Wang, Yuliang
Geman, Donald
Price, Nathan D.
author_facet Ma, Shuyi
Sung, Jaeyun
Magis, Andrew T.
Wang, Yuliang
Geman, Donald
Price, Nathan D.
author_sort Ma, Shuyi
collection PubMed
description BACKGROUND: The biomarker discovery field is replete with molecular signatures that have not translated into the clinic despite ostensibly promising performance in predicting disease phenotypes. One widely cited reason is lack of classification consistency, largely due to failure to maintain performance from study to study. This failure is widely attributed to variability in data collected for the same phenotype among disparate studies, due to technical factors unrelated to phenotypes (e.g., laboratory settings resulting in “batch-effects”) and non-phenotype-associated biological variation in the underlying populations. These sources of variability persist in new data collection technologies. METHODS: Here we quantify the impact of these combined “study-effects” on a disease signature’s predictive performance by comparing two types of validation methods: ordinary randomized cross-validation (RCV), which extracts random subsets of samples for testing, and inter-study validation (ISV), which excludes an entire study for testing. Whereas RCV hardwires an assumption of training and testing on identically distributed data, this key property is lost in ISV, yielding systematic decreases in performance estimates relative to RCV. Measuring the RCV-ISV difference as a function of the number of studies quantifies the influence of study-effects on performance. RESULTS: As a case study, we gathered publicly available gene expression data from 1,470 microarray samples of 6 lung phenotypes from 26 independent experimental studies and 769 RNA-seq samples of 2 lung phenotypes from 4 independent studies. We find that the RCV-ISV performance discrepancy is greater in phenotypes with few studies, and that the ISV performance converges toward RCV performance as data from additional studies are incorporated into classification. CONCLUSIONS: We show that by examining how fast ISV performance approaches RCV as the number of studies is increased, one can estimate when “sufficient” diversity has been achieved for learning a molecular signature likely to translate without significant loss of accuracy to new clinical settings.
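The METHODS section above contrasts randomized cross-validation (RCV), which draws random sample-level test folds, with inter-study validation (ISV), which holds out an entire study at a time. Below is a minimal sketch of that contrast using scikit-learn; it is not the authors' code, and the synthetic expression matrix, phenotype labels, study assignments, and logistic-regression classifier are all placeholder assumptions. ISV is approximated here with leave-one-study-out splitting (LeaveOneGroupOut), RCV with ordinary stratified k-fold splits.

```python
# Sketch: RCV vs. ISV performance estimates (assumptions noted above).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (LeaveOneGroupOut, StratifiedKFold,
                                     cross_val_score)

rng = np.random.default_rng(0)

# Synthetic stand-in for multi-study expression data:
# 120 samples x 50 genes, drawn from 4 hypothetical studies.
X = rng.normal(size=(120, 50))
y = rng.integers(0, 2, size=120)          # binary phenotype labels
studies = np.repeat(np.arange(4), 30)     # study-of-origin for each sample

clf = LogisticRegression(max_iter=1000)

# RCV: random sample-level folds, so training and test sets share the
# same mixture of studies (identically distributed by construction).
rcv_scores = cross_val_score(
    clf, X, y, cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))

# ISV: each fold holds out one entire study, so the test study is never
# represented in training.
isv_scores = cross_val_score(
    clf, X, y, groups=studies, cv=LeaveOneGroupOut())

print(f"RCV accuracy: {rcv_scores.mean():.3f}")
print(f"ISV accuracy: {isv_scores.mean():.3f}")
print(f"RCV-ISV gap:  {rcv_scores.mean() - isv_scores.mean():.3f}")
```

On real multi-study expression data, the printed RCV-ISV gap is the quantity the paper tracks as studies are added; with the random synthetic data used here, both estimates simply hover near chance.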
format Online
Article
Text
id pubmed-4201588
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4201588 2014-10-21 Measuring the Effect of Inter-Study Variability on Estimating Prediction Error. Ma, Shuyi; Sung, Jaeyun; Magis, Andrew T.; Wang, Yuliang; Geman, Donald; Price, Nathan D. PLoS One, Research Article (abstract as in the description field above). Public Library of Science 2014-10-17 /pmc/articles/PMC4201588/ /pubmed/25330348 http://dx.doi.org/10.1371/journal.pone.0110840 Text en © 2014 Ma et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Ma, Shuyi
Sung, Jaeyun
Magis, Andrew T.
Wang, Yuliang
Geman, Donald
Price, Nathan D.
Measuring the Effect of Inter-Study Variability on Estimating Prediction Error
title Measuring the Effect of Inter-Study Variability on Estimating Prediction Error
title_full Measuring the Effect of Inter-Study Variability on Estimating Prediction Error
title_fullStr Measuring the Effect of Inter-Study Variability on Estimating Prediction Error
title_full_unstemmed Measuring the Effect of Inter-Study Variability on Estimating Prediction Error
title_short Measuring the Effect of Inter-Study Variability on Estimating Prediction Error
title_sort measuring the effect of inter-study variability on estimating prediction error
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4201588/
https://www.ncbi.nlm.nih.gov/pubmed/25330348
http://dx.doi.org/10.1371/journal.pone.0110840
work_keys_str_mv AT mashuyi measuringtheeffectofinterstudyvariabilityonestimatingpredictionerror
AT sungjaeyun measuringtheeffectofinterstudyvariabilityonestimatingpredictionerror
AT magisandrewt measuringtheeffectofinterstudyvariabilityonestimatingpredictionerror
AT wangyuliang measuringtheeffectofinterstudyvariabilityonestimatingpredictionerror
AT gemandonald measuringtheeffectofinterstudyvariabilityonestimatingpredictionerror
AT pricenathand measuringtheeffectofinterstudyvariabilityonestimatingpredictionerror