Classification of microarrays; synergistic effects between normalization, gene selection and machine learning

BACKGROUND: Machine learning is a powerful approach for describing and predicting classes in microarray data. Although several comparative studies have investigated the relative performance of various machine learning methods, these often do not account for the fact that performance (e.g. error rate...

Bibliographic Details
Main authors: Önskog, Jenny; Freyhult, Eva; Landfors, Mattias; Rydén, Patrik; Hvidsten, Torgeir R
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3229535/
https://www.ncbi.nlm.nih.gov/pubmed/21982277
http://dx.doi.org/10.1186/1471-2105-12-390
_version_ 1782217958600212480
author Önskog, Jenny
Freyhult, Eva
Landfors, Mattias
Rydén, Patrik
Hvidsten, Torgeir R
author_facet Önskog, Jenny
Freyhult, Eva
Landfors, Mattias
Rydén, Patrik
Hvidsten, Torgeir R
author_sort Önskog, Jenny
collection PubMed
description BACKGROUND: Machine learning is a powerful approach for describing and predicting classes in microarray data. Although several comparative studies have investigated the relative performance of various machine learning methods, these often do not account for the fact that performance (e.g. error rate) is a result of a series of analysis steps of which the most important are data normalization, gene selection and machine learning. RESULTS: In this study, we used seven previously published cancer-related microarray data sets to compare the effects on classification performance of five normalization methods, three gene selection methods with 21 different numbers of selected genes and eight machine learning methods. Performance in terms of error rate was rigorously estimated by repeatedly employing a double cross-validation approach. Since performance varies greatly between data sets, we devised an analysis method that first compares methods within individual data sets and then visualizes the comparisons across data sets. We discovered both well-performing individual methods and synergies between different methods. CONCLUSION: Support Vector Machines with a radial basis kernel, linear kernel or polynomial kernel of degree 2 all performed consistently well across data sets. We show that there is a synergistic relationship between these methods and gene selection based on the T-test and the selection of a relatively high number of genes. Also, we find that these methods benefit significantly from using normalized data, although it is hard to draw general conclusions about the relative performance of different normalization procedures.
format Online
Article
Text
id pubmed-3229535
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-32295352011-12-03 Classification of microarrays; synergistic effects between normalization, gene selection and machine learning Önskog, Jenny Freyhult, Eva Landfors, Mattias Rydén, Patrik Hvidsten, Torgeir R BMC Bioinformatics Research Article BACKGROUND: Machine learning is a powerful approach for describing and predicting classes in microarray data. Although several comparative studies have investigated the relative performance of various machine learning methods, these often do not account for the fact that performance (e.g. error rate) is a result of a series of analysis steps of which the most important are data normalization, gene selection and machine learning. RESULTS: In this study, we used seven previously published cancer-related microarray data sets to compare the effects on classification performance of five normalization methods, three gene selection methods with 21 different numbers of selected genes and eight machine learning methods. Performance in terms of error rate was rigorously estimated by repeatedly employing a double cross-validation approach. Since performance varies greatly between data sets, we devised an analysis method that first compares methods within individual data sets and then visualizes the comparisons across data sets. We discovered both well-performing individual methods and synergies between different methods. CONCLUSION: Support Vector Machines with a radial basis kernel, linear kernel or polynomial kernel of degree 2 all performed consistently well across data sets. We show that there is a synergistic relationship between these methods and gene selection based on the T-test and the selection of a relatively high number of genes. Also, we find that these methods benefit significantly from using normalized data, although it is hard to draw general conclusions about the relative performance of different normalization procedures.
BioMed Central 2011-10-07 /pmc/articles/PMC3229535/ /pubmed/21982277 http://dx.doi.org/10.1186/1471-2105-12-390 Text en Copyright ©2011 Önskog et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Önskog, Jenny
Freyhult, Eva
Landfors, Mattias
Rydén, Patrik
Hvidsten, Torgeir R
Classification of microarrays; synergistic effects between normalization, gene selection and machine learning
title Classification of microarrays; synergistic effects between normalization, gene selection and machine learning
title_full Classification of microarrays; synergistic effects between normalization, gene selection and machine learning
title_fullStr Classification of microarrays; synergistic effects between normalization, gene selection and machine learning
title_full_unstemmed Classification of microarrays; synergistic effects between normalization, gene selection and machine learning
title_short Classification of microarrays; synergistic effects between normalization, gene selection and machine learning
title_sort classification of microarrays; synergistic effects between normalization, gene selection and machine learning
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3229535/
https://www.ncbi.nlm.nih.gov/pubmed/21982277
http://dx.doi.org/10.1186/1471-2105-12-390
work_keys_str_mv AT onskogjenny classificationofmicroarrayssynergisticeffectsbetweennormalizationgeneselectionandmachinelearning
AT freyhulteva classificationofmicroarrayssynergisticeffectsbetweennormalizationgeneselectionandmachinelearning
AT landforsmattias classificationofmicroarrayssynergisticeffectsbetweennormalizationgeneselectionandmachinelearning
AT rydenpatrik classificationofmicroarrayssynergisticeffectsbetweennormalizationgeneselectionandmachinelearning
AT hvidstentorgeirr classificationofmicroarrayssynergisticeffectsbetweennormalizationgeneselectionandmachinelearning