
Selecting a single model or combining multiple models for microarray-based classifier development? – A comparative analysis based on large and diverse datasets generated from the MAQC-II project

ABSTRACT: BACKGROUND: Genomic biomarkers play an increasing role in both preclinical and clinical application. Development of genomic biomarkers with microarrays is an area of intensive investigation. However, despite sustained and continuing effort, developing microarray-based predictive models (i....


Bibliographic Details
Main Authors:	Chen, Minjun; Shi, Leming; Kelly, Reagan; Perkins, Roger; Fang, Hong; Tong, Weida
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3236846/ https://www.ncbi.nlm.nih.gov/pubmed/22166133 http://dx.doi.org/10.1186/1471-2105-12-S10-S3

author	Chen, Minjun; Shi, Leming; Kelly, Reagan; Perkins, Roger; Fang, Hong; Tong, Weida
collection	PubMed
description	ABSTRACT: BACKGROUND: Genomic biomarkers play an increasing role in both preclinical and clinical application. Development of genomic biomarkers with microarrays is an area of intensive investigation. However, despite sustained and continuing effort, developing microarray-based predictive models (i.e., genomics biomarkers) capable of reliable prediction for an observed or measured outcome (i.e., endpoint) of unknown samples in preclinical and clinical practice remains a considerable challenge. No straightforward guidelines exist for selecting a single model that will perform best when presented with unknown samples. In the second phase of the MicroArray Quality Control (MAQC-II) project, 36 analysis teams produced a large number of models for 13 preclinical and clinical endpoints. Before external validation was performed, each team nominated one model per endpoint (referred to here as 'nominated models') from which MAQC-II experts selected 13 'candidate models' to represent the best model for each endpoint. Both the nominated and candidate models from MAQC-II provide benchmarks to assess other methodologies for developing microarray-based predictive models. METHODS: We developed a simple ensemble method by taking a number of the top performing models from cross-validation and developing an ensemble model for each of the MAQC-II endpoints. We compared the ensemble models with both nominated and candidate models from MAQC-II using blinded external validation. RESULTS: For 10 of the 13 MAQC-II endpoints originally analyzed by the MAQC-II data analysis team from the National Center for Toxicological Research (NCTR), the ensemble models achieved equal or better predictive performance than the NCTR nominated models. Additionally, the ensemble models had performance comparable to the MAQC-II candidate models. Most ensemble models also had better performance than the nominated models generated by five other MAQC-II data analysis teams that analyzed all 13 endpoints. CONCLUSIONS: Our findings suggest that an ensemble method can often attain a higher average predictive performance in an external validation set than a corresponding “optimized” model method. Using an ensemble method to determine a final model is a potentially important supplement to the good modeling practices recommended by the MAQC-II project for developing microarray-based genomic biomarkers.
format	Online Article Text
id	pubmed-3236846
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3236846 2011-12-14 BMC Bioinformatics Proceedings BioMed Central 2011-10-18 /pmc/articles/PMC3236846/ /pubmed/22166133 http://dx.doi.org/10.1186/1471-2105-12-S10-S3 Text en Copyright ©2011 Chen et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Selecting a single model or combining multiple models for microarray-based classifier development? – A comparative analysis based on large and diverse datasets generated from the MAQC-II project
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3236846/ https://www.ncbi.nlm.nih.gov/pubmed/22166133 http://dx.doi.org/10.1186/1471-2105-12-S10-S3
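
The METHODS summary in the description field (taking a number of the top performing models from cross-validation and combining them into an ensemble) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the selected models were combined, so the majority-vote rule, the `top_k` parameter, the toy threshold classifiers, and every name below are hypothetical. The cross-validation scores are made-up placeholders, not results from the study.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

# A "model" is any callable mapping one sample to a class label (hypothetical type).
Model = Callable[[Sequence[float]], int]


def select_top_models(cv_scores: Dict[str, float], top_k: int) -> List[str]:
    """Return the names of the top_k models, ranked by cross-validation score."""
    ranked = sorted(cv_scores, key=lambda name: cv_scores[name], reverse=True)
    return ranked[:top_k]


def ensemble_predict(models: Dict[str, Model], selected: List[str],
                     sample: Sequence[float]) -> int:
    """Combine the selected models by majority vote (an assumed combination rule)."""
    votes = Counter(models[name](sample) for name in selected)
    # most_common(1) returns the label with the most votes.
    return votes.most_common(1)[0][0]


# Toy classifiers standing in for trained microarray models (illustrative only).
models: Dict[str, Model] = {
    "m1": lambda x: int(x[0] > 0.5),
    "m2": lambda x: int(x[1] > 0.5),
    "m3": lambda x: int(x[0] + x[1] > 1.0),
    "m4": lambda x: 0,
}

# Placeholder cross-validation scores, invented for this example.
cv_scores = {"m1": 0.81, "m2": 0.78, "m3": 0.84, "m4": 0.50}

selected = select_top_models(cv_scores, top_k=3)
prediction = ensemble_predict(models, selected, [0.9, 0.2])
```

An odd `top_k` avoids tied votes for a two-class endpoint. A weighted vote or an average of predicted probabilities would be equally plausible readings of the abstract.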
