
Mining Gene Expression Data of Multiple Sclerosis

OBJECTIVES: Microarray produces a large amount of gene expression data, containing various biological implications. The challenge is to detect a panel of discriminative genes associated with disease. This study proposed a robust classification model for gene selection using gene expression data, and...


Bibliographic Details
Main authors: Guo, Pi, Zhang, Qin, Zhu, Zhenli, Huang, Zhengliang, Li, Ke
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4059716/
https://www.ncbi.nlm.nih.gov/pubmed/24932510
http://dx.doi.org/10.1371/journal.pone.0100052
_version_ 1782321274668711936
author Guo, Pi
Zhang, Qin
Zhu, Zhenli
Huang, Zhengliang
Li, Ke
author_facet Guo, Pi
Zhang, Qin
Zhu, Zhenli
Huang, Zhengliang
Li, Ke
author_sort Guo, Pi
collection PubMed
description OBJECTIVES: Microarray produces a large amount of gene expression data, containing various biological implications. The challenge is to detect a panel of discriminative genes associated with disease. This study proposed a robust classification model for gene selection using gene expression data, and performed an analysis to identify disease-related genes using multiple sclerosis as an example. MATERIALS AND METHODS: Gene expression profiles based on the transcriptome of peripheral blood mononuclear cells from a total of 44 samples from 26 multiple sclerosis patients and 18 individuals with other neurological diseases (control) were analyzed. Feature selection algorithms including Support Vector Machine based on Recursive Feature Elimination, Receiver Operating Characteristic Curve, and Boruta algorithms were jointly performed to select candidate genes associating with multiple sclerosis. Multiple classification models categorized samples into two different groups based on the identified genes. Models’ performance was evaluated using cross-validation methods, and an optimal classifier for gene selection was determined. RESULTS: An overlapping feature set was identified consisting of 8 genes that were differentially expressed between the two phenotype groups. The genes were significantly associated with the pathways of apoptosis and cytokine-cytokine receptor interaction. TNFSF10 was significantly associated with multiple sclerosis. A Support Vector Machine model was established based on the featured genes and gave a practical accuracy of ∼86%. This binary classification model also outperformed the other models in terms of Sensitivity, Specificity and F1 score. CONCLUSIONS: The combined analytical framework integrating feature ranking algorithms and Support Vector Machine model could be used for selecting genes for other diseases.
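The abstract describes keeping only the genes that several feature rankers agree on, then scoring a binary classifier by Sensitivity, Specificity and F1. The following is a minimal sketch of that idea under stated assumptions, not the authors' pipeline: the gene names and expression values are hypothetical toy data, the per-gene ROC ranker is a plain AUC computation, and the second ranker (absolute difference in group means) is a simple stand-in for SVM-RFE and Boruta, which are not implemented here.

```python
from statistics import mean

def auc(values, labels):
    """ROC AUC for one gene: chance that a random case exceeds a random control (ties count 0.5)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def top_k(scores, k):
    """Return the k gene names with the highest score."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def rank_by_auc(expr, labels, k):
    # Direction-agnostic: a gene lower in cases is as discriminative as one higher in cases.
    return top_k({g: abs(auc(v, labels) - 0.5) for g, v in expr.items()}, k)

def rank_by_mean_gap(expr, labels, k):
    # Stand-in second ranker (an assumption, NOT SVM-RFE or Boruta).
    def gap(v):
        cases = [x for x, y in zip(v, labels) if y == 1]
        controls = [x for x, y in zip(v, labels) if y == 0]
        return abs(mean(cases) - mean(controls))
    return top_k({g: gap(v) for g, v in expr.items()}, k)

def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity and F1 from true and predicted 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f1

# Hypothetical toy data: 3 cases (label 1) and 3 controls (label 0).
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "GENE_A": [5.0, 6.0, 7.0, 1.0, 2.0, 3.0],   # higher in every case
    "GENE_B": [1.0, 2.0, 1.5, 6.0, 7.0, 8.0],   # lower in every case
    "GENE_C": [3.0, 1.0, 2.0, 2.5, 1.5, 3.5],   # noise
    "GENE_D": [2.0, 2.0, 30.0, 3.0, 3.0, 3.0],  # mean gap driven by one outlier
}

# Keep only the genes that both rankers place in their top 2.
overlap = rank_by_auc(expr, labels, k=2) & rank_by_mean_gap(expr, labels, k=2)
print(sorted(overlap))

# Score a hypothetical set of predictions against the true labels.
sens, spec, f1 = binary_metrics(labels, [1, 1, 0, 0, 0, 1])
print(round(sens, 3), round(spec, 3), round(f1, 3))
```

In this toy setup the outlier-driven gene ranks high on mean gap but not on AUC, so the intersection drops it. That is the motivation for combining rankers, although the study's actual selection criteria may differ.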
format Online
Article
Text
id pubmed-4059716
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-40597162014-06-19 Mining Gene Expression Data of Multiple Sclerosis Guo, Pi Zhang, Qin Zhu, Zhenli Huang, Zhengliang Li, Ke PLoS One Research Article OBJECTIVES: Microarray produces a large amount of gene expression data, containing various biological implications. The challenge is to detect a panel of discriminative genes associated with disease. This study proposed a robust classification model for gene selection using gene expression data, and performed an analysis to identify disease-related genes using multiple sclerosis as an example. MATERIALS AND METHODS: Gene expression profiles based on the transcriptome of peripheral blood mononuclear cells from a total of 44 samples from 26 multiple sclerosis patients and 18 individuals with other neurological diseases (control) were analyzed. Feature selection algorithms including Support Vector Machine based on Recursive Feature Elimination, Receiver Operating Characteristic Curve, and Boruta algorithms were jointly performed to select candidate genes associating with multiple sclerosis. Multiple classification models categorized samples into two different groups based on the identified genes. Models’ performance was evaluated using cross-validation methods, and an optimal classifier for gene selection was determined. RESULTS: An overlapping feature set was identified consisting of 8 genes that were differentially expressed between the two phenotype groups. The genes were significantly associated with the pathways of apoptosis and cytokine-cytokine receptor interaction. TNFSF10 was significantly associated with multiple sclerosis. A Support Vector Machine model was established based on the featured genes and gave a practical accuracy of ∼86%. This binary classification model also outperformed the other models in terms of Sensitivity, Specificity and F1 score. 
CONCLUSIONS: The combined analytical framework integrating feature ranking algorithms and Support Vector Machine model could be used for selecting genes for other diseases. Public Library of Science 2014-06-16 /pmc/articles/PMC4059716/ /pubmed/24932510 http://dx.doi.org/10.1371/journal.pone.0100052 Text en © 2014 Guo et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Guo, Pi
Zhang, Qin
Zhu, Zhenli
Huang, Zhengliang
Li, Ke
Mining Gene Expression Data of Multiple Sclerosis
title Mining Gene Expression Data of Multiple Sclerosis
title_full Mining Gene Expression Data of Multiple Sclerosis
title_fullStr Mining Gene Expression Data of Multiple Sclerosis
title_full_unstemmed Mining Gene Expression Data of Multiple Sclerosis
title_short Mining Gene Expression Data of Multiple Sclerosis
title_sort mining gene expression data of multiple sclerosis
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4059716/
https://www.ncbi.nlm.nih.gov/pubmed/24932510
http://dx.doi.org/10.1371/journal.pone.0100052
work_keys_str_mv AT guopi mininggeneexpressiondataofmultiplesclerosis
AT zhangqin mininggeneexpressiondataofmultiplesclerosis
AT zhuzhenli mininggeneexpressiondataofmultiplesclerosis
AT huangzhengliang mininggeneexpressiondataofmultiplesclerosis
AT like mininggeneexpressiondataofmultiplesclerosis