Learning “graph-mer” Motifs that Predict Gene Expression Trajectories in Development

Bibliographic Details
Main authors: Li, Xuejing; Panea, Casandra; Wiggins, Chris H.; Reinke, Valerie; Leslie, Christina
Format: Text
Language: English
Published: Public Library of Science 2010
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2861633/
https://www.ncbi.nlm.nih.gov/pubmed/20454681
http://dx.doi.org/10.1371/journal.pcbi.1000761
collection PubMed
description A key problem in understanding transcriptional regulatory networks is deciphering what cis regulatory logic is encoded in gene promoter sequences and how this sequence information maps to expression. A typical computational approach to this problem involves clustering genes by their expression profiles and then searching for overrepresented motifs in the promoter sequences of genes in a cluster. However, genes with similar expression profiles may be controlled by distinct regulatory programs. Moreover, if many gene expression profiles in a data set are highly correlated, as in the case of whole organism developmental time series, it may be difficult to resolve fine-grained clusters in the first place. We present a predictive framework for modeling the natural flow of information, from promoter sequence to expression, to learn cis regulatory motifs and characterize gene expression patterns in developmental time courses. We introduce a cluster-free algorithm based on a graph-regularized version of partial least squares (PLS) regression to learn sequence patterns—represented by graphs of k-mers, or “graph-mers”—that predict gene expression trajectories. Applying the approach to wildtype germline development in Caenorhabditis elegans, we found that the first and second latent PLS factors mapped to expression profiles for oocyte and sperm genes, respectively. We extracted both known and novel motifs from the graph-mers associated to these germline-specific patterns, including novel CG-rich motifs specific to oocyte genes. We found evidence supporting the functional relevance of these putative regulatory elements through analysis of positional bias, motif conservation and in situ gene expression. This study demonstrates that our regression model can learn biologically meaningful latent structure and identify potentially functional motifs from subtle developmental time course expression data.
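The description above outlines regressing expression trajectories on promoter k-mer features with a graph-regularized PLS. As a rough illustration only, the plain-Python sketch below computes the first PLS weight vector (the leading eigenvector of X^T Y Y^T X on centered data) and optionally adds a graph-Laplacian penalty over k-mers. The record does not specify the k-mer graph, the penalty form, or k, so all of these are assumptions: k-mers are linked when they differ at exactly one position, the penalty is `lam * w^T L w`, and only forward-strand counts are used. The function names are hypothetical, and this is a generic formulation, not necessarily the authors' model.

```python
import math
import random
from itertools import product

ALPHABET = "ACGT"


def kmer_vocab(k):
    """All 4**k DNA k-mers in lexicographic order."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def kmer_counts(seq, vocab):
    """Forward-strand k-mer counts for one promoter sequence (one feature row)."""
    index = {km: i for i, km in enumerate(vocab)}
    k = len(vocab[0])
    counts = [0.0] * len(vocab)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows containing N or other non-ACGT symbols
            counts[j] += 1.0
    return counts


def hamming1_graph(vocab):
    """Hypothetical k-mer graph: neighbours differ at exactly one position."""
    return [[j for j, b in enumerate(vocab)
             if sum(x != y for x, y in zip(a, b)) == 1]
            for a in vocab]


def center(rows):
    """Subtract each column's mean (genes are rows)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def first_pls_direction(X, Y, nbrs, lam=0.0, iters=500, seed=0):
    """Leading eigenvector of Xc^T Yc Yc^T Xc - lam * L (L = graph Laplacian).

    With lam = 0 this is the first PLS weight vector; lam > 0 favours weights
    that vary smoothly across neighbouring k-mers (assumed penalty form).
    """
    Xc, Yc = center(X), center(Y)
    n, p, t = len(Xc), len(Xc[0]), len(Yc[0])
    # Cross-covariance C = Xc^T Yc between k-mer counts and time points (p x t).
    C = [[sum(Xc[i][a] * Yc[i][b] for i in range(n)) for b in range(t)]
         for a in range(p)]
    # lam * 2 * max degree bounds lam * lambda_max(L), so the shifted operator
    # is positive semidefinite and power iteration finds its top eigenvector.
    shift = lam * 2 * max(len(nb) for nb in nbrs)
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(p)]
    for _ in range(iters):
        ctw = [sum(C[a][b] * w[a] for a in range(p)) for b in range(t)]
        mw = [sum(C[a][b] * ctw[b] for b in range(t)) for a in range(p)]
        lw = [len(nbrs[a]) * w[a] - sum(w[j] for j in nbrs[a]) for a in range(p)]
        v = [mw[a] - lam * lw[a] + shift * w[a] for a in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        w = [x / norm for x in v]
    return w


if __name__ == "__main__":
    vocab = kmer_vocab(2)
    seqs = ["CGCGCGCG", "ATATATAT", "CGCGATAT", "TATATATA"]  # toy promoters
    X = [kmer_counts(s, vocab) for s in seqs]
    Y = [[0, 2, 4], [0, 0, 0], [0, 1, 2], [0, 0, 0]]  # toy trajectories
    w = first_pls_direction(X, Y, hamming1_graph(vocab), lam=1.0)
    top = sorted(range(len(vocab)), key=lambda a: -abs(w[a]))[:3]
    print([(vocab[a], round(w[a], 3)) for a in top])
```

On the toy data, the CG-containing promoters have rising trajectories, so "CG" receives the largest absolute weight when `lam = 0`. The sign of the weight vector is arbitrary, as for any eigenvector. Real promoter sets would call for longer k, many more genes, and a numerical library rather than nested Python loops.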
id pubmed-2861633
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal PLoS Comput Biol (Research Article), Public Library of Science, published 2010-04-29
rights © Li et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
topic Research Article