A cross-species bi-clustering approach to identifying conserved co-regulated genes

Motivation: A growing number of studies have explored the process of pre-implantation embryonic development of multiple mammalian species. However, the conservation and variation among different species in their developmental programming are poorly defined due to the lack of effective computational...

Bibliographic Details
Main Authors: Sun, Jiangwen, Jiang, Zongliang, Tian, Xiuchun, Bi, Jinbo
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4908362/
https://www.ncbi.nlm.nih.gov/pubmed/27307610
http://dx.doi.org/10.1093/bioinformatics/btw278
_version_ 1782437668175478784
author Sun, Jiangwen
Jiang, Zongliang
Tian, Xiuchun
Bi, Jinbo
author_facet Sun, Jiangwen
Jiang, Zongliang
Tian, Xiuchun
Bi, Jinbo
author_sort Sun, Jiangwen
collection PubMed
description Motivation: A growing number of studies have explored the process of pre-implantation embryonic development in multiple mammalian species. However, the conservation and variation among species in their developmental programming are poorly defined because effective computational methods for detecting co-regulated genes that are conserved across species are lacking. The most sophisticated method to date for identifying conserved co-regulated genes is a two-step approach. This approach first identifies gene clusters for each species by a cluster analysis of gene expression data, and subsequently computes the overlaps of clusters identified from different species to reveal common subgroups. This approach is ineffective at handling the noise that the complicated procedures for quantifying gene expression introduce into the expression data. Furthermore, because the approach is sequential, the gene clusters identified in the first step may have little overlap among different species in the second step, which makes conserved co-regulated genes difficult to detect. Results: We propose a cross-species bi-clustering approach which first denoises the gene expression data of each species into a data matrix. The rows of the data matrices of different species represent the same set of genes, which are characterized by their expression patterns over the developmental stages of each species as columns. A novel bi-clustering method is then developed to cluster genes into subgroups by a joint sparse rank-one factorization of all the data matrices. This method decomposes each data matrix into a product of a column vector and a row vector, where the column vector is a consistent indicator across the matrices (species) that identifies the same gene cluster, and the row vector specifies for each species the developmental stages over which the clustered genes are co-regulated. An efficient optimization algorithm has been developed, with a convergence analysis.
This approach was first validated on synthetic data and compared to the two-step method and several recent joint clustering methods. We then applied this approach to two real-world datasets of gene expression during the pre-implantation embryonic development of the human and mouse. Co-regulated genes consistent between the human and mouse were identified, offering insights into conserved functions, as well as similarities and differences in genome activation timing between the human and mouse embryos. Availability and Implementation: The R package containing the implementation of the proposed method in C++ is available at: https://github.com/JavonSun/mvbc.git and also at the R platform https://www.r-project.org/. Contact: jinbo@engr.uconn.edu
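The joint sparse rank-one factorization described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm or the API of their R package: it is a generic alternating soft-thresholding scheme, written under the assumption that each species matrix X_s is approximated by the outer product of one shared gene vector u and a species-specific stage vector v_s. The function names, the penalties `lam_u` and `lam_v`, the all-ones initialization, and the fixed iteration count are all hypothetical choices made for illustration.

```python
import math


def soft_threshold(values, lam):
    """Shrink each value toward zero by lam (the L1 proximal operator)."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in values]


def normalize(values):
    """Scale to unit Euclidean norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in values))
    return [x / norm for x in values] if norm > 0 else list(values)


def joint_sparse_rank_one(matrices, lam_u=0.1, lam_v=0.1, n_iter=50):
    """Alternate between a shared gene vector u and per-species stage vectors.

    matrices: one row-major matrix (list of lists) per species, all with the
              same number of rows (genes); column counts may differ.
    Returns (u, vs), where matrices[s] is approximated by outer(u, vs[s]).
    Illustrative sketch only, not the method from the paper.
    """
    n_genes = len(matrices[0])
    u = normalize([1.0] * n_genes)  # hypothetical initialization
    vs = []
    for _ in range(n_iter):
        # Stage vectors: v_s = soft(X_s^T u), computed separately per species.
        vs = []
        for X in matrices:
            n_cols = len(X[0])
            proj = [sum(X[i][j] * u[i] for i in range(n_genes))
                    for j in range(n_cols)]
            vs.append(soft_threshold(proj, lam_v))
        # Shared gene vector: u = normalize(soft(sum over species of X_s v_s)).
        acc = [0.0] * n_genes
        for X, v in zip(matrices, vs):
            for i in range(n_genes):
                acc[i] += sum(X[i][j] * v[j] for j in range(len(v)))
        u = normalize(soft_threshold(acc, lam_u))
    return u, vs


# Toy data (made up): genes 0 and 1 are active together in both species.
species_a = [[5, 5, 0], [5, 5, 0], [0, 0, 0], [0, 0, 0]]  # 4 genes x 3 stages
species_b = [[4, 0], [4, 0], [0, 0], [0, 0]]              # 4 genes x 2 stages
u, vs = joint_sparse_rank_one([species_a, species_b])
```

In this sketch the nonzero entries of u mark the genes placed in the cluster for every species at once, while the nonzero entries of each v_s mark the stages involved in that species. Sharing u across matrices is what distinguishes the joint formulation from the two-step approach of clustering each species separately and intersecting the results afterwards.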
format Online
Article
Text
id pubmed-4908362
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4908362 2016-06-17 A cross-species bi-clustering approach to identifying conserved co-regulated genes Sun, Jiangwen Jiang, Zongliang Tian, Xiuchun Bi, Jinbo Bioinformatics Ismb 2016 Proceedings July 8 to July 12, 2016, Orlando, Florida Oxford University Press 2016-06-15 2016-06-11 /pmc/articles/PMC4908362/ /pubmed/27307610 http://dx.doi.org/10.1093/bioinformatics/btw278 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Ismb 2016 Proceedings July 8 to July 12, 2016, Orlando, Florida
Sun, Jiangwen
Jiang, Zongliang
Tian, Xiuchun
Bi, Jinbo
A cross-species bi-clustering approach to identifying conserved co-regulated genes
title A cross-species bi-clustering approach to identifying conserved co-regulated genes
title_full A cross-species bi-clustering approach to identifying conserved co-regulated genes
title_fullStr A cross-species bi-clustering approach to identifying conserved co-regulated genes
title_full_unstemmed A cross-species bi-clustering approach to identifying conserved co-regulated genes
title_short A cross-species bi-clustering approach to identifying conserved co-regulated genes
title_sort cross-species bi-clustering approach to identifying conserved co-regulated genes
topic Ismb 2016 Proceedings July 8 to July 12, 2016, Orlando, Florida
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4908362/
https://www.ncbi.nlm.nih.gov/pubmed/27307610
http://dx.doi.org/10.1093/bioinformatics/btw278