A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification

BACKGROUND: Cancer diagnosis and clinical outcome prediction are among the most important emerging applications of gene expression microarray technology with several molecular signatures on their way toward clinical deployment. Use of the most accurate classification algorithms available for microar...

Bibliographic Details
Main authors:	Statnikov, Alexander; Wang, Lily; Aliferis, Constantin F
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2492881/ https://www.ncbi.nlm.nih.gov/pubmed/18647401 http://dx.doi.org/10.1186/1471-2105-9-319

_version_	1782158206174232576
author	Statnikov, Alexander Wang, Lily Aliferis, Constantin F
author_sort	Statnikov, Alexander
collection	PubMed
description	BACKGROUND: Cancer diagnosis and clinical outcome prediction are among the most important emerging applications of gene expression microarray technology with several molecular signatures on their way toward clinical deployment. Use of the most accurate classification algorithms available for microarray gene expression data is a critical ingredient in order to develop the best possible molecular signatures for patient care. As suggested by a large body of literature to date, support vector machines can be considered "best of class" algorithms for classification of such data. Recent work, however, suggests that random forest classifiers may outperform support vector machines in this domain. RESULTS: In the present paper we identify methodological biases of prior work comparing random forests and support vector machines and conduct a new rigorous evaluation of the two algorithms that corrects these limitations. Our experiments use 22 diagnostic and prognostic datasets and show that support vector machines outperform random forests, often by a large margin. Our data also underlines the importance of sound research design in benchmarking and comparison of bioinformatics algorithms. CONCLUSION: We found that both on average and in the majority of microarray datasets, random forests are outperformed by support vector machines both in the settings when no gene selection is performed and when several popular gene selection methods are used.
format	Text
id	pubmed-2492881
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2492881 2008-08-01 A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification Statnikov, Alexander; Wang, Lily; Aliferis, Constantin F BMC Bioinformatics Research Article BioMed Central 2008-07-22 /pmc/articles/PMC2492881/ /pubmed/18647401 http://dx.doi.org/10.1186/1471-2105-9-319 Text en Copyright © 2008 Statnikov et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification
title_sort	comprehensive comparison of random forests and support vector machines for microarray-based cancer classification
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2492881/ https://www.ncbi.nlm.nih.gov/pubmed/18647401 http://dx.doi.org/10.1186/1471-2105-9-319
