Classification and biomarker identification using gene network modules and support vector machines

BACKGROUND: Classification using microarray datasets is usually based on a small number of samples for which tens of thousands of gene expression measurements have been obtained. The selection of the genes most significant to the classification problem is a challenging issue in high dimension data a...

Bibliographic Details
Main Authors: Yousef, Malik; Ketany, Mohamed; Manevitz, Larry; Showe, Louise C; Showe, Michael K
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2774324/
https://www.ncbi.nlm.nih.gov/pubmed/19832995
http://dx.doi.org/10.1186/1471-2105-10-337
author Yousef, Malik
Ketany, Mohamed
Manevitz, Larry
Showe, Louise C
Showe, Michael K
collection PubMed
description BACKGROUND: Classification using microarray datasets is usually based on a small number of samples for which tens of thousands of gene expression measurements have been obtained. Selecting the genes most significant to the classification problem is a challenging issue in high-dimensional data analysis and interpretation. A previous study with SVM-RCE (Recursive Cluster Elimination) suggested that classification based on groups of correlated genes sometimes performs better than classification using single genes. Large databases of gene interaction networks provide an important resource for the analysis of genetic phenomena and for classification studies using interacting genes. We now demonstrate that an algorithm that integrates network information with SVM-based recursive feature elimination performs well and improves the biological interpretability of the results. We refer to the method as SVM with Recursive Network Elimination (SVM-RNE). RESULTS: Initially, one thousand genes selected by t-test from a training set are filtered so that only genes that map to a gene network database remain. The Gene Expression Network Analysis Tool (GXNA) is applied to the remaining genes to form n clusters of genes that are highly connected in the network. A linear SVM is used to classify the samples using these clusters, and a weight is assigned to each cluster based on its importance to the classification. The least informative clusters are removed, and the remainder are retained for the next classification step. This process is repeated until an optimal classification is obtained. CONCLUSION: More than 90% accuracy can be obtained in classification of selected microarray datasets by integrating the interaction network information with the gene expression information from the microarrays. The Matlab version of SVM-RNE can be downloaded from
format Text
id pubmed-2774324
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2774324 2009-11-07 BMC Bioinformatics Methodology Article BioMed Central 2009-10-15 /pmc/articles/PMC2774324/ /pubmed/19832995 http://dx.doi.org/10.1186/1471-2105-10-337 Text en Copyright © 2009 Yousef et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Classification and biomarker identification using gene network modules and support vector machines
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2774324/
https://www.ncbi.nlm.nih.gov/pubmed/19832995
http://dx.doi.org/10.1186/1471-2105-10-337
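
The elimination loop summarized in the RESULTS part of the abstract (score each network cluster with a linear SVM, drop the least informative clusters, repeat) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not the authors' Matlab code: the cluster names and toy expression values are invented, the clusters are simply given as input (the t-test filter, the network mapping, and GXNA are not run), the "SVM" is a small subgradient-descent stand-in without a bias term, each cluster's weight is taken to be plain training accuracy, and the loop stops at a fixed cluster count rather than at an optimal classification.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def train_linear_svm(X: Matrix, y: Sequence[int], epochs: int = 50,
                     lr: float = 0.1, lam: float = 0.01) -> List[float]:
    """Subgradient descent on the L2-regularised hinge loss (no bias term).

    A stand-in for a real SVM solver; labels must be +1 or -1.
    """
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - lr * lam) * wj for wj in w]  # regularisation shrinkage
            if margin < 1.0:  # margin violated: take a hinge-loss step
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
    return w


def accuracy(w: Sequence[float], X: Matrix, y: Sequence[int]) -> float:
    """Fraction of samples whose predicted sign matches the label."""
    preds = [1 if sum(wj * xj for wj, xj in zip(w, xi)) >= 0 else -1 for xi in X]
    return sum(p == yi for p, yi in zip(preds, y)) / len(y)


def select(X: Matrix, genes: Sequence[int]) -> List[List[float]]:
    """Keep only the given gene columns of the expression matrix."""
    return [[row[g] for g in genes] for row in X]


def score_cluster(X: Matrix, y: Sequence[int], genes: Sequence[int]) -> float:
    """Assumed cluster weight: training accuracy using only this cluster's genes."""
    sub = select(X, genes)
    return accuracy(train_linear_svm(sub, y), sub, y)


def recursive_network_elimination(
    X: Matrix, y: Sequence[int], clusters: Dict[str, List[int]],
    keep: int = 1, drop_fraction: float = 0.5,
) -> Tuple[Dict[str, List[int]], List[Tuple[int, float]]]:
    """Repeatedly drop the lowest-scoring clusters until `keep` remain.

    Returns the surviving clusters and a history of
    (number of clusters, accuracy on the union of their genes).
    """
    surviving = dict(clusters)
    history: List[Tuple[int, float]] = []
    while True:
        genes = sorted({g for gs in surviving.values() for g in gs})
        sub = select(X, genes)
        history.append((len(surviving), accuracy(train_linear_svm(sub, y), sub, y)))
        if len(surviving) <= keep:
            return surviving, history
        ranked = sorted(surviving, key=lambda name: score_cluster(X, y, surviving[name]))
        n_drop = min(max(1, int(len(surviving) * drop_fraction)), len(surviving) - keep)
        for name in ranked[:n_drop]:  # least informative clusters first
            del surviving[name]


# Toy data (invented): 4 samples x 6 genes; genes 0-1 separate the classes,
# genes 2-5 take identical values in both classes and carry no signal.
X = [
    [2.0, 2.1, 0.5, -0.3, 1.0, 0.2],
    [1.8, 2.2, -0.4, 0.6, -0.7, 0.9],
    [-2.0, -1.9, 0.5, -0.3, 1.0, 0.2],
    [-2.2, -1.8, -0.4, 0.6, -0.7, 0.9],
]
y = [1, 1, -1, -1]
clusters = {"module_A": [0, 1], "module_B": [2, 3], "module_C": [4, 5]}

surviving, history = recursive_network_elimination(X, y, clusters, keep=1)
print(surviving)
print(history)
```

On this toy input the two uninformative modules are dropped one per round and only `module_A` survives. A closer imitation of the published method would score clusters by cross-validated accuracy on held-out samples and would use a proper SVM implementation; both are left out here to keep the sketch short and dependency-free.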