Ensemble Consensus-Guided Unsupervised Feature Selection to Identify Huntington’s Disease-Associated Genes
Due to the complexity of the pathological mechanisms of neurodegenerative diseases, traditional differentially-expressed gene selection methods cannot detect disease-associated genes accurately. Recent studies have shown that consensus-guided unsupervised feature selection (CGUFS) performs well in f...
Main authors: | Guo, Xia; Jiang, Xue; Xu, Jing; Quan, Xiongwen; Wu, Min; Zhang, Han |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6071299/ https://www.ncbi.nlm.nih.gov/pubmed/30002337 http://dx.doi.org/10.3390/genes9070350 |
_version_ | 1783343851320115200 |
---|---|
author | Guo, Xia Jiang, Xue Xu, Jing Quan, Xiongwen Wu, Min Zhang, Han |
author_facet | Guo, Xia Jiang, Xue Xu, Jing Quan, Xiongwen Wu, Min Zhang, Han |
author_sort | Guo, Xia |
collection | PubMed |
description | Due to the complexity of the pathological mechanisms of neurodegenerative diseases, traditional differentially-expressed gene selection methods cannot detect disease-associated genes accurately. Recent studies have shown that consensus-guided unsupervised feature selection (CGUFS) performs well in feature selection for identifying disease-associated genes. Since the random initialization of the feature selection matrix in CGUFS results in instability of the final disease-associated gene set, for the purposes of this study we proposed an ensemble method based on CGUFS—namely, ensemble consensus-guided unsupervised feature selection (ECGUFS) in order to further improve the accuracy of disease-associated genes and the stability of feature gene sets. We also proposed a bagging integration strategy to integrate the results of CGUFS. Lastly, we conducted experiments with Huntington’s disease RNA sequencing (RNA-Seq) data and obtained the final feature gene set, where we detected 287 disease-associated genes. Enrichment analysis on these genes has shown that postsynaptic density and the postsynaptic membrane, synapse, and cell junction are all affected during the disease’s progression. However, ECGUFS greatly improved the accuracy of disease-associated gene prediction and the stability of the disease-associated gene set. We conducted a classification of samples with labels based on the linear support vector machine with 10-fold cross-validation. The average accuracy is 0.9, which suggests the effectiveness of the feature gene set. |
format | Online Article Text |
id | pubmed-6071299 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-60712992018-08-09 Ensemble Consensus-Guided Unsupervised Feature Selection to Identify Huntington’s Disease-Associated Genes Guo, Xia Jiang, Xue Xu, Jing Quan, Xiongwen Wu, Min Zhang, Han Genes (Basel) Article Due to the complexity of the pathological mechanisms of neurodegenerative diseases, traditional differentially-expressed gene selection methods cannot detect disease-associated genes accurately. Recent studies have shown that consensus-guided unsupervised feature selection (CGUFS) performs well in feature selection for identifying disease-associated genes. Since the random initialization of the feature selection matrix in CGUFS results in instability of the final disease-associated gene set, for the purposes of this study we proposed an ensemble method based on CGUFS—namely, ensemble consensus-guided unsupervised feature selection (ECGUFS) in order to further improve the accuracy of disease-associated genes and the stability of feature gene sets. We also proposed a bagging integration strategy to integrate the results of CGUFS. Lastly, we conducted experiments with Huntington’s disease RNA sequencing (RNA-Seq) data and obtained the final feature gene set, where we detected 287 disease-associated genes. Enrichment analysis on these genes has shown that postsynaptic density and the postsynaptic membrane, synapse, and cell junction are all affected during the disease’s progression. However, ECGUFS greatly improved the accuracy of disease-associated gene prediction and the stability of the disease-associated gene set. We conducted a classification of samples with labels based on the linear support vector machine with 10-fold cross-validation. The average accuracy is 0.9, which suggests the effectiveness of the feature gene set. MDPI 2018-07-12 /pmc/articles/PMC6071299/ /pubmed/30002337 http://dx.doi.org/10.3390/genes9070350 Text en © 2018 by the authors. Licensee MDPI, Basel, Switzerland. 
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Guo, Xia Jiang, Xue Xu, Jing Quan, Xiongwen Wu, Min Zhang, Han Ensemble Consensus-Guided Unsupervised Feature Selection to Identify Huntington’s Disease-Associated Genes |
title | Ensemble Consensus-Guided Unsupervised Feature Selection to Identify Huntington’s Disease-Associated Genes |
title_full | Ensemble Consensus-Guided Unsupervised Feature Selection to Identify Huntington’s Disease-Associated Genes |
title_fullStr | Ensemble Consensus-Guided Unsupervised Feature Selection to Identify Huntington’s Disease-Associated Genes |
title_full_unstemmed | Ensemble Consensus-Guided Unsupervised Feature Selection to Identify Huntington’s Disease-Associated Genes |
title_short | Ensemble Consensus-Guided Unsupervised Feature Selection to Identify Huntington’s Disease-Associated Genes |
title_sort | ensemble consensus-guided unsupervised feature selection to identify huntington’s disease-associated genes |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6071299/ https://www.ncbi.nlm.nih.gov/pubmed/30002337 http://dx.doi.org/10.3390/genes9070350 |
work_keys_str_mv | AT guoxia ensembleconsensusguidedunsupervisedfeatureselectiontoidentifyhuntingtonsdiseaseassociatedgenes AT jiangxue ensembleconsensusguidedunsupervisedfeatureselectiontoidentifyhuntingtonsdiseaseassociatedgenes AT xujing ensembleconsensusguidedunsupervisedfeatureselectiontoidentifyhuntingtonsdiseaseassociatedgenes AT quanxiongwen ensembleconsensusguidedunsupervisedfeatureselectiontoidentifyhuntingtonsdiseaseassociatedgenes AT wumin ensembleconsensusguidedunsupervisedfeatureselectiontoidentifyhuntingtonsdiseaseassociatedgenes AT zhanghan ensembleconsensusguidedunsupervisedfeatureselectiontoidentifyhuntingtonsdiseaseassociatedgenes |