Cross-study validation for the assessment of prediction algorithms

Motivation: Numerous competing algorithms for prediction in high-dimensional settings have been developed in the statistical and machine-learning literature. Learning algorithms and the prediction models they generate are typically evaluated on the basis of cross-validation error estimates in a few...

Bibliographic Details
Main authors:	Bernau, Christoph; Riester, Markus; Boulesteix, Anne-Laure; Parmigiani, Giovanni; Huttenhower, Curtis; Waldron, Levi; Trippa, Lorenzo
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2014
Subjects:	Ismb 2014 Proceedings Papers Committee
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4058929/ https://www.ncbi.nlm.nih.gov/pubmed/24931973 http://dx.doi.org/10.1093/bioinformatics/btu279

_version_	1782321188292263936
author	Bernau, Christoph; Riester, Markus; Boulesteix, Anne-Laure; Parmigiani, Giovanni; Huttenhower, Curtis; Waldron, Levi; Trippa, Lorenzo
author_sort	Bernau, Christoph
collection	PubMed
description	Motivation: Numerous competing algorithms for prediction in high-dimensional settings have been developed in the statistical and machine-learning literature. Learning algorithms and the prediction models they generate are typically evaluated on the basis of cross-validation error estimates in a few exemplary datasets. However, in most applications, the ultimate goal of prediction modeling is to provide accurate predictions for independent samples obtained in different settings. Cross-validation within exemplary datasets may not adequately reflect performance in the broader application context. Methods: We develop and implement a systematic approach to ‘cross-study validation’, to replace or supplement conventional cross-validation when evaluating high-dimensional prediction models in independent datasets. We illustrate it via simulations and in a collection of eight estrogen-receptor positive breast cancer microarray gene-expression datasets, where the objective is predicting distant metastasis-free survival (DMFS). We computed the C-index for all pairwise combinations of training and validation datasets. We evaluate several alternatives for summarizing the pairwise validation statistics, and compare these to conventional cross-validation. Results: Our data-driven simulations and our application to survival prediction with eight breast cancer microarray datasets, suggest that standard cross-validation produces inflated discrimination accuracy for all algorithms considered, when compared to cross-study validation. Furthermore, the ranking of learning algorithms differs, suggesting that algorithms performing best in cross-validation may be suboptimal when evaluated through independent validation. Availability: The survHD: Survival in High Dimensions package (http://www.bitbucket.org/lwaldron/survhd) will be made available through Bioconductor. 
Contact: levi.waldron@hunter.cuny.edu Supplementary information: Supplementary data are available at Bioinformatics online.
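The pairwise scheme described in the abstract (train on one study, compute the C-index on each other study) can be illustrated with a minimal sketch. This is not the survHD implementation: the function names, the toy one-feature learner, the three tiny studies, and the mean-of-off-diagonal summary are all assumptions made here for illustration. The record does not say which summaries the authors evaluate.

```python
from itertools import permutations
from statistics import mean


def c_index(times, events, scores):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i had an observed event
    before subject j's observed time. It is concordant when subject i
    also has the higher risk score; tied scores count as one half.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        for j in range(len(times)):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def fit_sign_learner(X, times, events):
    """Hypothetical toy learner: use feature 0 as the risk score and
    flip its sign if it is anti-concordant in the training study."""
    feature = [row[0] for row in X]
    sign = 1.0 if c_index(times, events, feature) >= 0.5 else -1.0
    return lambda X_new: [sign * row[0] for row in X_new]


def cross_study_matrix(studies, fit):
    """Train on each study and validate on every other study.

    Returns {(train_name, validation_name): C-index}. Same-study
    entries are omitted in this sketch.
    """
    results = {}
    for train, valid in permutations(studies, 2):
        X_tr, t_tr, e_tr = studies[train]
        X_va, t_va, e_va = studies[valid]
        predict = fit(X_tr, t_tr, e_tr)
        results[(train, valid)] = c_index(t_va, e_va, predict(X_va))
    return results


# Made-up data: name -> (features, follow-up times, event indicators).
# Study C deliberately has the opposite feature/risk direction.
studies = {
    "A": ([[3], [2], [1]], [1, 2, 3], [1, 1, 1]),
    "B": ([[5], [1], [3]], [2, 9, 4], [1, 0, 1]),
    "C": ([[1], [2], [3]], [1, 2, 3], [1, 1, 1]),
}

matrix = cross_study_matrix(studies, fit_sign_learner)
# One simple summary (an assumption): the mean over all train/validation pairs.
summary = mean(matrix.values())
```

In this toy setup the learner looks perfect within studies A and B but fails whenever study C is involved, which is the kind of between-study discrepancy that within-study cross-validation cannot reveal.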
format	Online Article Text
id	pubmed-4058929
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-4058929 2014-06-18 Cross-study validation for the assessment of prediction algorithms Bernau, Christoph; Riester, Markus; Boulesteix, Anne-Laure; Parmigiani, Giovanni; Huttenhower, Curtis; Waldron, Levi; Trippa, Lorenzo Bioinformatics Ismb 2014 Proceedings Papers Committee Oxford University Press 2014-06-15 2014-06-11 /pmc/articles/PMC4058929/ /pubmed/24931973 http://dx.doi.org/10.1093/bioinformatics/btu279 Text en © The Author 2014. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Cross-study validation for the assessment of prediction algorithms
topic	Ismb 2014 Proceedings Papers Committee
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4058929/ https://www.ncbi.nlm.nih.gov/pubmed/24931973 http://dx.doi.org/10.1093/bioinformatics/btu279