
Assisted clustering of gene expression data using ANCut

Bibliographic Details
Main authors: Teran Hidalgo, Sebastian J., Wu, Mengyun, Ma, Shuangge
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5559859/
https://www.ncbi.nlm.nih.gov/pubmed/28814280
http://dx.doi.org/10.1186/s12864-017-3990-1
author Teran Hidalgo, Sebastian J.
Wu, Mengyun
Ma, Shuangge
collection PubMed
description BACKGROUND: In biomedical research, gene expression profiling studies have been extensively conducted. The analysis of gene expression data has led to a deeper understanding of human genetics as well as practically useful models. Clustering analysis has been a critical component of gene expression data analysis and can reveal the (previously unknown) interconnections among genes. Given the high dimensionality of gene expression data, many existing clustering methods produce unsatisfactory results. Intuitively, this is caused by “a lack of information”. In recent profiling studies, a prominent trend is to collect data on gene expressions as well as their regulators (copy number alteration, microRNA, methylation, etc.) on the same subjects, making it possible to borrow information from other types of omics measurements in gene expression analysis. METHODS: In this study, we develop ANCut, an approach built on regularized estimation and NCut techniques, along with effective R code that implements it. RESULTS: Simulation shows that the proposed approach outperforms direct competitors. The analysis of TCGA (The Cancer Genome Atlas) data further demonstrates its satisfactory performance. CONCLUSIONS: We propose a more effective clustering analysis of gene expression data, with the assistance of information from regulators. It provides a new avenue for analyzing gene expression data based on the assisted analysis strategy. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3990-1) contains supplementary material, which is available to authorized users.
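The abstract states only that ANCut builds on regularized estimation and NCut; it does not describe how the two are combined, so ANCut itself (and the authors' R code) is not reproduced here. As a hedged illustration of the NCut building block alone, the sketch below implements the standard normalized-cut criterion of Shi and Malik and its usual two-way spectral relaxation. The Gaussian similarity, the bandwidth `sigma`, the zero threshold on the eigenvector, the toy data, and all function names are assumptions chosen for illustration, not details taken from the article.

```python
import numpy as np


def gaussian_affinity(X, sigma=1.0):
    """Dense Gaussian (RBF) similarity between the rows of X (assumed choice)."""
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances, clipped at 0 to absorb rounding error.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def ncut_value(W, labels):
    """NCut = sum over clusters A_k of cut(A_k, V \\ A_k) / assoc(A_k, V)."""
    degree = W.sum(axis=1)
    total = 0.0
    for k in np.unique(labels):
        in_k = labels == k
        cut = W[np.ix_(in_k, ~in_k)].sum()  # weight leaving cluster k
        total += cut / degree[in_k].sum()   # normalized by cluster volume
    return total


def spectral_bipartition(W):
    """Two-way NCut via spectral relaxation (hypothetical helper name).

    Solves (D - W) y = lambda D y through the symmetric normalized Laplacian
    L_sym = I - D^{-1/2} W D^{-1/2}, maps the second-smallest eigenvector z
    back with y = D^{-1/2} z, and splits the nodes by the sign of y.
    """
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    L_sym = np.eye(len(degree)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)  # eigenvalues returned in ascending order
    y = d_inv_sqrt * vecs[:, 1]
    return (y > 0).astype(int)


# Toy data (assumed): two well-separated groups of 20 "genes" over 5 samples.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(20, 5)),
    rng.normal(3.0, 0.3, size=(20, 5)),
])
W = gaussian_affinity(X, sigma=2.0)
labels = spectral_bipartition(W)
print(labels, ncut_value(W, labels))
```

Thresholding the relaxed eigenvector at zero is only one of several splitting rules; Shi and Malik also consider searching over split points for the lowest NCut. For more than two clusters, the relaxation is typically extended by using several leading eigenvectors.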
format Online
Article
Text
id pubmed-5559859
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5559859 2017-08-18 Assisted clustering of gene expression data using ANCut Teran Hidalgo, Sebastian J. Wu, Mengyun Ma, Shuangge BMC Genomics Methodology Article
BioMed Central 2017-08-16 /pmc/articles/PMC5559859/ /pubmed/28814280 http://dx.doi.org/10.1186/s12864-017-3990-1 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Assisted clustering of gene expression data using ANCut
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5559859/
https://www.ncbi.nlm.nih.gov/pubmed/28814280
http://dx.doi.org/10.1186/s12864-017-3990-1