
Ensemble Consensus-Guided Unsupervised Feature Selection to Identify Huntington’s Disease-Associated Genes

Due to the complexity of the pathological mechanisms of neurodegenerative diseases, traditional differentially-expressed gene selection methods cannot detect disease-associated genes accurately. Recent studies have shown that consensus-guided unsupervised feature selection (CGUFS) performs well in f...


Bibliographic Details
Main Authors: Guo, Xia, Jiang, Xue, Xu, Jing, Quan, Xiongwen, Wu, Min, Zhang, Han
Format: Online Article Text
Language: English
Published: MDPI 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6071299/
https://www.ncbi.nlm.nih.gov/pubmed/30002337
http://dx.doi.org/10.3390/genes9070350
author Guo, Xia
Jiang, Xue
Xu, Jing
Quan, Xiongwen
Wu, Min
Zhang, Han
collection PubMed
description Because the pathological mechanisms of neurodegenerative diseases are complex, traditional differentially expressed gene selection methods cannot detect disease-associated genes accurately. Recent studies have shown that consensus-guided unsupervised feature selection (CGUFS) performs well in feature selection for identifying disease-associated genes. However, the random initialization of the feature selection matrix in CGUFS makes the final disease-associated gene set unstable. In this study we propose an ensemble method based on CGUFS, namely ensemble consensus-guided unsupervised feature selection (ECGUFS), to further improve the accuracy of disease-associated gene identification and the stability of the feature gene sets. We also propose a bagging integration strategy to integrate the results of CGUFS. We conducted experiments with Huntington’s disease RNA sequencing (RNA-Seq) data and obtained a final feature gene set containing 287 disease-associated genes. Enrichment analysis of these genes shows that the postsynaptic density, postsynaptic membrane, synapse, and cell junction are all affected during the disease’s progression. ECGUFS greatly improved the accuracy of disease-associated gene prediction and the stability of the disease-associated gene set. We classified labeled samples using a linear support vector machine with 10-fold cross-validation; the average accuracy was 0.9, which suggests the effectiveness of the feature gene set.
format Online
Article
Text
id pubmed-6071299
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-6071299 2018-08-09 Genes (Basel) Article MDPI 2018-07-12 /pmc/articles/PMC6071299/ /pubmed/30002337 http://dx.doi.org/10.3390/genes9070350 Text en © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Ensemble Consensus-Guided Unsupervised Feature Selection to Identify Huntington’s Disease-Associated Genes
topic Article
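The ensemble idea summarized in the description (run a randomly initialized feature selector many times, then integrate the runs with a bagging-style strategy) can be illustrated with a minimal sketch. This is not the authors' ECGUFS implementation: `toy_selector` is a hypothetical stand-in for a single CGUFS run (it ranks features by variance plus random jitter to mimic initialization-dependent instability), and the bootstrap resampling, the `min_freq` voting threshold, and all function and parameter names are assumptions made for illustration only.

```python
import random
from collections import Counter


def toy_selector(data, k, rng):
    """Hypothetical stand-in for one CGUFS run (NOT the real algorithm).

    Scores each feature by its variance plus Gaussian jitter, so repeated
    runs can disagree, and returns the indices of the top-k features.
    """
    n_features = len(data[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in data]
        mean = sum(column) / len(column)
        variance = sum((x - mean) ** 2 for x in column) / len(column)
        scores.append(variance + rng.gauss(0.0, 0.1))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def ensemble_select(data, k, n_runs=50, min_freq=0.5, seed=0):
    """Bagging-style integration (assumed scheme): keep stable features.

    Each run draws a bootstrap resample of the rows, applies the base
    selector, and casts one vote per selected feature. Features selected
    in at least `min_freq` of the runs form the final set.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_runs):
        bootstrap = [rng.choice(data) for _ in data]
        votes.update(toy_selector(bootstrap, k, rng))
    return sorted(j for j, count in votes.items() if count / n_runs >= min_freq)
```

Voting across runs is what gives the ensemble its stability: a feature picked only occasionally because of a lucky initialization falls below the frequency threshold, while consistently selected features survive.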