Mining Gene Expression Data of Multiple Sclerosis
Main authors: | Guo, Pi; Zhang, Qin; Zhu, Zhenli; Huang, Zhengliang; Li, Ke |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4059716/ https://www.ncbi.nlm.nih.gov/pubmed/24932510 http://dx.doi.org/10.1371/journal.pone.0100052 |
author | Guo, Pi; Zhang, Qin; Zhu, Zhenli; Huang, Zhengliang; Li, Ke |
collection | PubMed |
description | OBJECTIVES: Microarray experiments produce large amounts of gene expression data with diverse biological implications. The challenge is to detect a panel of discriminative genes associated with disease. This study proposed a robust classification model for gene selection from gene expression data and applied it to identify disease-related genes, using multiple sclerosis as an example. MATERIALS AND METHODS: Transcriptome-based gene expression profiles of peripheral blood mononuclear cells were analyzed for 44 samples: 26 multiple sclerosis patients and 18 individuals with other neurological diseases (controls). Feature selection algorithms, including Support Vector Machine based on Recursive Feature Elimination (SVM-RFE), Receiver Operating Characteristic (ROC) curve analysis, and the Boruta algorithm, were applied jointly to select candidate genes associated with multiple sclerosis. Multiple classification models were built to classify samples into the two groups based on the identified genes. Model performance was evaluated by cross-validation, and the optimal classifier for gene selection was determined. RESULTS: An overlapping feature set was identified, consisting of 8 genes that were differentially expressed between the two phenotype groups. These genes were significantly associated with the apoptosis and cytokine-cytokine receptor interaction pathways. TNFSF10 was significantly associated with multiple sclerosis. A Support Vector Machine model built on the selected genes achieved a practical accuracy of ∼86%. This binary classifier also outperformed the other models in sensitivity, specificity and F1 score. CONCLUSIONS: The combined analytical framework, integrating feature ranking algorithms with a Support Vector Machine model, could be used to select genes for other diseases. |
format | Online Article Text |
id | pubmed-4059716 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4059716 2014-06-19 Mining Gene Expression Data of Multiple Sclerosis Guo, Pi Zhang, Qin Zhu, Zhenli Huang, Zhengliang Li, Ke PLoS One Research Article Public Library of Science 2014-06-16 /pmc/articles/PMC4059716/ /pubmed/24932510 http://dx.doi.org/10.1371/journal.pone.0100052 Text en © 2014 Guo et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | Mining Gene Expression Data of Multiple Sclerosis |
topic | Research Article |
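The description combines several feature rankers (SVM-RFE, ROC curve analysis, Boruta) and keeps the genes on which they overlap. The sketch below illustrates only the ROC part and the overlap step, under stated assumptions: each gene's ROC AUC is computed with the Mann-Whitney formulation, genes are ranked by the distance of their AUC from 0.5 (a criterion assumed here, not taken from the paper), and the "overlapping feature set" is taken as the intersection of every ranker's top k. The gene names, expression values, the value of k, the second ranking (a stand-in for SVM-RFE or Boruta output, which is not implemented) and all function names are hypothetical.

```python
from typing import Sequence


def auc(values: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC for one gene, via the Mann-Whitney formulation: the probability
    that a randomly chosen case (label 1) has a higher expression value than a
    randomly chosen control (label 0), with ties counted as one half."""
    cases = [v for v, y in zip(values, labels) if y == 1]
    controls = [v for v, y in zip(values, labels) if y == 0]
    score = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                score += 1.0
            elif c == n:
                score += 0.5
    return score / (len(cases) * len(controls))


def rank_by_auc(expr: dict[str, Sequence[float]], labels: Sequence[int]) -> list[str]:
    """Order genes by how far their AUC is from 0.5 (chance level), so a
    strongly down-regulated gene ranks as high as a strongly up-regulated one.
    This ranking criterion is an assumption made for illustration."""
    return sorted(expr, key=lambda g: abs(auc(expr[g], labels) - 0.5), reverse=True)


def overlapping_features(rankings: Sequence[Sequence[str]], k: int) -> set[str]:
    """Genes that appear in the top k of every ranking (hypothetical helper)."""
    return set.intersection(*(set(r[:k]) for r in rankings))


# Hypothetical toy data: 3 cases, 3 controls, three made-up genes.
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "GENE_A": [5.1, 4.8, 5.5, 2.0, 2.2, 1.9],  # higher in every case
    "GENE_B": [1.0, 1.2, 0.9, 3.0, 3.1, 2.8],  # lower in every case
    "GENE_C": [2.0, 3.0, 1.0, 2.5, 1.5, 2.0],  # no separation
}
auc_ranking = rank_by_auc(expr, labels)
# Placeholder for a second ranker's output (e.g. SVM-RFE); invented for illustration.
other_ranking = ["GENE_B", "GENE_C", "GENE_A"]

print(auc_ranking)  # ['GENE_A', 'GENE_B', 'GENE_C']
print(overlapping_features([auc_ranking, other_ranking], k=2))  # {'GENE_B'}
```

Taking the intersection rather than the union trades recall for robustness: a gene survives only if every method ranks it highly, which is one plausible reading of an "overlapping feature set".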
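The record reports the classifiers' accuracy, sensitivity, specificity and F1 score. As a minimal sketch of how these standard quantities follow from a confusion matrix, the code below treats multiple sclerosis as the positive class (label 1) and controls as label 0; that coding, the function name `binary_metrics`, and the toy labels are assumptions for illustration and are not the study's data or predictions.

```python
from typing import Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Confusion-matrix metrics for a two-class problem.
    Assumed coding: 1 = case (multiple sclerosis), 0 = control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        # Sensitivity: fraction of cases correctly identified (recall).
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        # Specificity: fraction of controls correctly identified.
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        # F1: harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN).
        "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


# Hypothetical labels for 8 samples: one case is missed, no control is misclassified.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
```

Here TP = 3, FN = 1, TN = 4 and FP = 0, giving a sensitivity of 0.75, a specificity of 1.0, an F1 score of 6/7 and an accuracy of 0.875. Reporting sensitivity and specificity alongside accuracy matters when, as in this study, the two groups differ in size.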