
Improving the sensitivity of sample clustering by leveraging gene co-expression networks in variable selection

BACKGROUND: Many variable selection techniques have been proposed for the clustering of gene expression data. While these methods tend to filter out irrelevant genes and identify informative genes that contribute to a clustering solution, they are based on criteria that do not consider the potential...


Bibliographic Details
Main authors: Wang, Zixing, Lucas, F Anthony San, Qiu, Peng, Liu, Yin
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4035826/
https://www.ncbi.nlm.nih.gov/pubmed/24885641
http://dx.doi.org/10.1186/1471-2105-15-153
_version_ 1782318107997503488
author Wang, Zixing
Lucas, F Anthony San
Qiu, Peng
Liu, Yin
author_facet Wang, Zixing
Lucas, F Anthony San
Qiu, Peng
Liu, Yin
author_sort Wang, Zixing
collection PubMed
description BACKGROUND: Many variable selection techniques have been proposed for the clustering of gene expression data. While these methods tend to filter out irrelevant genes and identify informative genes that contribute to a clustering solution, they are based on criteria that do not consider the potential interactive influence among individual genes. Motivated by ensemble clustering, there is a strong interest in leveraging the structure of gene networks for gene selection, so that the relationship information between genes can be effectively utilized, while the selected genes are expected to preserve all the possible clustering structures in the data. RESULTS: We present a new filter method that uses the gene connectivity in the gene co-expression network as the evaluation criterion for variable selection. The gene connectivity measures the importance of the genes in terms of their expression similarity with others in the co-expression network. Hard-threshold and soft-threshold transformations are employed to construct the gene co-expression networks. Both simulation studies and real data analysis have shown that the network based on soft thresholding is more effective in selecting relevant variables and provides better clustering results compared to the hard thresholding transformation and two other canonical filter methods for variable selection. Furthermore, a new module analysis approach is proposed to reveal the higher order organization of the gene space, where the genes of a module share significant topological similarity and are associated with a consensus partition of the sample space. We demonstrate that the identified modules can lead to biologically meaningful sample partitions that might be missed by other methods. CONCLUSIONS: By leveraging the structure of the gene co-expression network, we first propose a variable selection method that selects individual genes with top connectivity. Both simulation studies and real data application have demonstrated that our method has better performance in terms of the reliability of the selected genes and sample clustering results. In addition, we propose a module recovery method that can help discover novel sample partitions that might be hidden when performing clustering analyses using all available genes. The source code of our program is available at http://nba.uth.tmc.edu/homepage/liu/netVar/.
format Online
Article
Text
id pubmed-4035826
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4035826 2014-06-11 Improving the sensitivity of sample clustering by leveraging gene co-expression networks in variable selection Wang, Zixing Lucas, F Anthony San Qiu, Peng Liu, Yin BMC Bioinformatics Research Article BACKGROUND: Many variable selection techniques have been proposed for the clustering of gene expression data. While these methods tend to filter out irrelevant genes and identify informative genes that contribute to a clustering solution, they are based on criteria that do not consider the potential interactive influence among individual genes. Motivated by ensemble clustering, there is a strong interest in leveraging the structure of gene networks for gene selection, so that the relationship information between genes can be effectively utilized, while the selected genes are expected to preserve all the possible clustering structures in the data. RESULTS: We present a new filter method that uses the gene connectivity in the gene co-expression network as the evaluation criterion for variable selection. The gene connectivity measures the importance of the genes in terms of their expression similarity with others in the co-expression network. Hard-threshold and soft-threshold transformations are employed to construct the gene co-expression networks. Both simulation studies and real data analysis have shown that the network based on soft thresholding is more effective in selecting relevant variables and provides better clustering results compared to the hard thresholding transformation and two other canonical filter methods for variable selection. Furthermore, a new module analysis approach is proposed to reveal the higher order organization of the gene space, where the genes of a module share significant topological similarity and are associated with a consensus partition of the sample space. We demonstrate that the identified modules can lead to biologically meaningful sample partitions that might be missed by other methods. CONCLUSIONS: By leveraging the structure of the gene co-expression network, we first propose a variable selection method that selects individual genes with top connectivity. Both simulation studies and real data application have demonstrated that our method has better performance in terms of the reliability of the selected genes and sample clustering results. In addition, we propose a module recovery method that can help discover novel sample partitions that might be hidden when performing clustering analyses using all available genes. The source code of our program is available at http://nba.uth.tmc.edu/homepage/liu/netVar/. BioMed Central 2014-05-20 /pmc/articles/PMC4035826/ /pubmed/24885641 http://dx.doi.org/10.1186/1471-2105-15-153 Text en Copyright © 2014 Wang et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Wang, Zixing
Lucas, F Anthony San
Qiu, Peng
Liu, Yin
Improving the sensitivity of sample clustering by leveraging gene co-expression networks in variable selection
title Improving the sensitivity of sample clustering by leveraging gene co-expression networks in variable selection
title_full Improving the sensitivity of sample clustering by leveraging gene co-expression networks in variable selection
title_fullStr Improving the sensitivity of sample clustering by leveraging gene co-expression networks in variable selection
title_full_unstemmed Improving the sensitivity of sample clustering by leveraging gene co-expression networks in variable selection
title_short Improving the sensitivity of sample clustering by leveraging gene co-expression networks in variable selection
title_sort improving the sensitivity of sample clustering by leveraging gene co-expression networks in variable selection
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4035826/
https://www.ncbi.nlm.nih.gov/pubmed/24885641
http://dx.doi.org/10.1186/1471-2105-15-153
work_keys_str_mv AT wangzixing improvingthesensitivityofsampleclusteringbyleveraginggenecoexpressionnetworksinvariableselection
AT lucasfanthonysan improvingthesensitivityofsampleclusteringbyleveraginggenecoexpressionnetworksinvariableselection
AT qiupeng improvingthesensitivityofsampleclusteringbyleveraginggenecoexpressionnetworksinvariableselection
AT liuyin improvingthesensitivityofsampleclusteringbyleveraginggenecoexpressionnetworksinvariableselection
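
Illustrative note: the abstract in this record describes ranking genes by their connectivity in a co-expression network built with either a hard or a soft threshold. The record gives no formulas, so the sketch below assumes one common formulation: adjacency is the absolute Pearson correlation raised to a power `beta` (soft) or a 0/1 indicator of exceeding a cutoff `tau` (hard), and connectivity is the row sum of the adjacency matrix. The function names, the default values of `beta` and `tau`, and the toy data are all hypothetical and are not taken from the authors' netVar program.

```python
import numpy as np


def connectivity(expr, mode="soft", beta=6.0, tau=0.7):
    """Connectivity of each gene in a co-expression network.

    expr: 2-D array, rows are genes and columns are samples.
    mode: "soft" uses |cor|**beta; "hard" uses 1 if |cor| >= tau else 0.
    beta and tau are assumed defaults, not values from the paper.
    Genes with zero variance would yield NaN correlations; filter them first.
    """
    sim = np.abs(np.corrcoef(expr))  # gene-by-gene |Pearson correlation|
    if mode == "soft":
        adj = sim ** beta
    elif mode == "hard":
        adj = (sim >= tau).astype(float)
    else:
        raise ValueError("mode must be 'soft' or 'hard'")
    np.fill_diagonal(adj, 0.0)  # a gene is not its own neighbour
    return adj.sum(axis=1)


def select_top_genes(expr, n_genes, **kwargs):
    """Indices of the n_genes genes with the highest connectivity."""
    k = connectivity(expr, **kwargs)
    return np.argsort(k)[::-1][:n_genes]


# Toy data (hypothetical): 5 genes share one signal, 15 genes are pure noise.
rng = np.random.default_rng(0)
signal = rng.normal(size=20)
informative = signal + 0.1 * rng.normal(size=(5, 20))
noise = rng.normal(size=(15, 20))
expr = np.vstack([informative, noise])

top = select_top_genes(expr, n_genes=5, mode="soft")
k_hard = connectivity(expr, mode="hard")
```

In this toy setting the five co-expressed genes should receive much larger soft-threshold connectivity than the noise genes, which is the intuition behind using connectivity as a filter criterion. How the selected genes feed into sample clustering or module analysis is outside this sketch.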