Application of genetic algorithms and constructive neural networks for the analysis of microarray cancer data
BACKGROUND: Extracting relevant information from microarray data is a very complex task due to the characteristics of the data sets, as they comprise a large number of features while few samples are generally available. In this sense, feature selection is a very important aspect of the analysis help...
Main authors: | Luque-Baena, Rafael Marcos; Urda, Daniel; Subirats, Jose Luis; Franco, Leonardo; Jerez, Jose M |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4108856/ https://www.ncbi.nlm.nih.gov/pubmed/25077572 http://dx.doi.org/10.1186/1742-4682-11-S1-S7 |
_version_ | 1782327797700624384 |
---|---|
author | Luque-Baena, Rafael Marcos Urda, Daniel Subirats, Jose Luis Franco, Leonardo Jerez, Jose M |
author_sort | Luque-Baena, Rafael Marcos |
collection | PubMed |
description | BACKGROUND: Extracting relevant information from microarray data is a very complex task because the data sets comprise a large number of features while few samples are generally available. Feature selection is therefore a very important aspect of the analysis, helping both to identify relevant genes and to maximize predictive information. METHODS: Due to its simplicity and speed, Stepwise Forward Selection (SFS) is a widely used feature selection technique. In this work, we carry out a comparative study of SFS and Genetic Algorithms (GA) as general frameworks for the analysis of microarray data, with the aim of identifying groups of genes with high predictive capability and biological relevance. Six standard and machine learning-based techniques (Linear Discriminant Analysis (LDA), Support Vector Machines (SVM), Naive Bayes (NB), C-MANTEC Constructive Neural Network, K-Nearest Neighbors (kNN) and Multilayer Perceptron (MLP)) are used within both frameworks on six freely available public datasets for the task of predicting cancer outcome. RESULTS: The GA framework gave better cancer outcome predictions, although, compared with SFS, it leads to a larger selection set and uses a large number of comparisons between genetic profiles, and is thus computationally more intensive. The GA framework also yielded a set of genes that can be considered more biologically relevant. Among the classifiers, standard feedforward neural networks (MLP), LDA and SVM gave similar results, the best overall, while C-MANTEC and kNN followed closely with lower accuracy. Further, C-MANTEC, MLP and LDA selected a smaller set of genes than SVM, NB and kNN, and C-MANTEC in particular proved the most robust classifier with respect to changes in the parameter settings. CONCLUSIONS: This study shows that if prediction accuracy is the objective, the GA-based approach leads to better results than the SFS approach, independently of the classifier used. Regarding classifiers, even though C-MANTEC did not achieve the best overall results, its performance was competitive and very robust with respect to the parameters of the algorithm, and thus it can be considered a candidate technique for future studies. |
format | Online Article Text |
id | pubmed-4108856 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4108856 2014-08-04 Application of genetic algorithms and constructive neural networks for the analysis of microarray cancer data Luque-Baena, Rafael Marcos Urda, Daniel Subirats, Jose Luis Franco, Leonardo Jerez, Jose M Theor Biol Med Model Research BioMed Central 2014-05-07 /pmc/articles/PMC4108856/ /pubmed/25077572 http://dx.doi.org/10.1186/1742-4682-11-S1-S7 Text en Copyright © 2014 Luque-Baena et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Luque-Baena, Rafael Marcos Urda, Daniel Subirats, Jose Luis Franco, Leonardo Jerez, Jose M Application of genetic algorithms and constructive neural networks for the analysis of microarray cancer data |
title | Application of genetic algorithms and constructive neural networks for the analysis of microarray cancer data |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4108856/ https://www.ncbi.nlm.nih.gov/pubmed/25077572 http://dx.doi.org/10.1186/1742-4682-11-S1-S7 |
work_keys_str_mv | AT luquebaenarafaelmarcos applicationofgeneticalgorithmsandconstructiveneuralnetworksfortheanalysisofmicroarraycancerdata AT urdadaniel applicationofgeneticalgorithmsandconstructiveneuralnetworksfortheanalysisofmicroarraycancerdata AT subiratsjoseluis applicationofgeneticalgorithmsandconstructiveneuralnetworksfortheanalysisofmicroarraycancerdata AT francoleonardo applicationofgeneticalgorithmsandconstructiveneuralnetworksfortheanalysisofmicroarraycancerdata AT jerezjosem applicationofgeneticalgorithmsandconstructiveneuralnetworksfortheanalysisofmicroarraycancerdata |
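
The abstract above names Stepwise Forward Selection (SFS) as the simpler of the two feature selection frameworks compared. As a rough illustration only, the greedy idea can be sketched as follows: start from an empty gene set and repeatedly add the single gene that most improves a classifier's score, stopping when no candidate helps. Everything in this sketch is an assumption made for illustration and is not taken from the article: the function names, the toy data, and the nearest-centroid scorer with leave-one-out accuracy, which stands in for the classifiers the study actually used (LDA, SVM, NB, C-MANTEC, kNN, MLP).

```python
from statistics import mean


def nearest_centroid_loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier
    restricted to the given feature (gene) columns. Stand-in scorer."""
    correct = 0
    for i in range(len(X)):
        # Build one centroid per class from every sample except sample i.
        centroids = {}
        for label in sorted(set(y)):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            if rows:
                centroids[label] = [mean(r[f] for r in rows) for f in features]
        # Predict the class whose centroid is closest (squared Euclidean).
        point = [X[i][f] for f in features]
        pred = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += pred == y[i]
    return correct / len(X)


def stepwise_forward_selection(X, y, max_features,
                               score=nearest_centroid_loo_accuracy):
    """Greedy SFS: add the best-scoring feature each round and stop
    when no remaining feature strictly improves the score."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scored = [(score(X, y, selected + [f]), f) for f in remaining]
        top_score, top_feature = max(scored, key=lambda t: t[0])
        if top_score <= best_score:
            break  # no candidate improves on the current subset
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Hypothetical toy expression matrix: 6 samples x 3 genes.
# Only the gene in column 1 separates the two outcome classes.
X = [
    [5.0, 0.1, 3.0],
    [1.0, 0.2, 3.1],
    [4.0, 0.0, 2.9],
    [2.0, 0.9, 3.0],
    [5.1, 1.0, 3.1],
    [0.9, 1.1, 2.9],
]
y = [0, 0, 0, 1, 1, 1]

selected, best = stepwise_forward_selection(X, y, max_features=2)
print(selected, best)
```

Each round costs one scorer evaluation per remaining gene, which is why SFS is fast but can miss gene combinations that are only informative together. That limitation is one common motivation for population-based searches such as the GA framework the abstract compares it against.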