Mining Gene Expression Data of Multiple Sclerosis

Bibliographic Details
Main Authors:	Guo, Pi; Zhang, Qin; Zhu, Zhenli; Huang, Zhengliang; Li, Ke
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2014
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4059716/ https://www.ncbi.nlm.nih.gov/pubmed/24932510 http://dx.doi.org/10.1371/journal.pone.0100052

author	Guo, Pi; Zhang, Qin; Zhu, Zhenli; Huang, Zhengliang; Li, Ke
collection	PubMed
description	OBJECTIVES: Microarrays produce large amounts of gene expression data with various biological implications. The challenge is to detect a panel of discriminative genes associated with disease. This study proposed a robust classification model for gene selection using gene expression data, and performed an analysis to identify disease-related genes using multiple sclerosis as an example. MATERIALS AND METHODS: Gene expression profiles based on the transcriptome of peripheral blood mononuclear cells from a total of 44 samples, from 26 multiple sclerosis patients and 18 individuals with other neurological diseases (controls), were analyzed. Feature selection algorithms, including Support Vector Machine based on Recursive Feature Elimination, Receiver Operating Characteristic curve analysis, and the Boruta algorithm, were applied jointly to select candidate genes associated with multiple sclerosis. Several classification models assigned samples to the two groups based on the identified genes. The models' performance was evaluated using cross-validation, and an optimal classifier for gene selection was determined. RESULTS: An overlapping feature set of 8 genes that were differentially expressed between the two phenotype groups was identified. These genes were significantly associated with the apoptosis and cytokine-cytokine receptor interaction pathways. TNFSF10 was significantly associated with multiple sclerosis. A Support Vector Machine model built on the selected genes achieved an accuracy of ∼86%. This binary classification model also outperformed the other models in terms of Sensitivity, Specificity and F1 score. CONCLUSIONS: The combined analytical framework integrating feature ranking algorithms and a Support Vector Machine model could be used to select genes for other diseases.
format	Online Article Text
id	pubmed-4059716
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-4059716 2014-06-19 Mining Gene Expression Data of Multiple Sclerosis Guo, Pi; Zhang, Qin; Zhu, Zhenli; Huang, Zhengliang; Li, Ke PLoS One Research Article Public Library of Science 2014-06-16 /pmc/articles/PMC4059716/ /pubmed/24932510 http://dx.doi.org/10.1371/journal.pone.0100052 Text en © 2014 Guo et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	Mining Gene Expression Data of Multiple Sclerosis
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4059716/ https://www.ncbi.nlm.nih.gov/pubmed/24932510 http://dx.doi.org/10.1371/journal.pone.0100052
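The MATERIALS AND METHODS and RESULTS fields describe ranking candidate genes with several selectors, keeping the genes that every selector agrees on, and comparing classifiers by Sensitivity, Specificity and F1 score. Below is a minimal, hypothetical Python sketch of three of those steps (standard library only); it is not the authors' code. Assumptions: the ROC step is modeled as ranking each gene by how far its single-gene AUC lies from 0.5; SVM-RFE and Boruta are not reimplemented, so their output is simply assumed to be a list of gene names passed to `overlap`; the function names, gene names and expression values are invented for illustration.

```python
"""Hypothetical sketch (not the authors' code) of three steps named in the
abstract: per-gene ROC ranking, the overlap of several selectors' gene lists,
and the Sensitivity / Specificity / F1 metrics used to compare classifiers."""
from typing import Dict, List, Sequence, Set


def auc(values: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC of one gene's expression as a score for label 1 (e.g. MS),
    computed as the Mann-Whitney probability P(x_pos > x_neg), ties count 0.5."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_by_auc(expr: Dict[str, Sequence[float]], labels: Sequence[int],
                top_k: int) -> List[str]:
    """Assumed ROC-based ranking: score each gene by |AUC - 0.5| so that both
    up- and down-regulated genes rank high; ties are broken by gene name."""
    scores = {gene: abs(auc(vals, labels) - 0.5) for gene, vals in expr.items()}
    return sorted(scores, key=lambda g: (-scores[g], g))[:top_k]


def overlap(*rankings: Sequence[str]) -> Set[str]:
    """Genes chosen by every selector (an 'overlapping feature set')."""
    sets = [set(r) for r in rankings]
    return set.intersection(*sets) if sets else set()


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, Sensitivity, Specificity and F1 from a 2x2 confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


if __name__ == "__main__":
    # Invented toy data: label 1 = case, label 0 = control.
    labels = [1, 1, 1, 0, 0, 0]
    expr = {
        "gene_a": [5.0, 6.0, 7.0, 1.0, 2.0, 3.0],  # higher in cases
        "gene_b": [1.0, 2.0, 3.0, 5.0, 6.0, 7.0],  # lower in cases
        "gene_c": [1.0, 5.0, 3.0, 2.0, 6.0, 4.0],  # weakly informative
    }
    roc_genes = rank_by_auc(expr, labels, top_k=2)
    print(roc_genes)  # ['gene_a', 'gene_b']
    # Stand-ins for the SVM-RFE and Boruta selections (hypothetical lists).
    print(overlap(roc_genes, ["gene_a", "gene_c"], ["gene_a", "gene_b"]))  # {'gene_a'}
    print(binary_metrics(labels, [1, 1, 0, 0, 0, 1]))
```

Scoring by |AUC - 0.5| rather than raw AUC is a design choice for this sketch: a gene whose expression is consistently lower in patients is as discriminative as one that is consistently higher.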

Similar Items