Identifying Cis-Regulatory Sequences by Word Profile Similarity

BACKGROUND: Recognizing regulatory sequences in genomes is a continuing challenge, despite a wealth of available genomic data and a growing number of experimentally validated examples. METHODOLOGY/PRINCIPAL FINDINGS: We discuss here a simple approach to search for regulatory sequences based on the c...

Bibliographic Details
Main authors: Leung, Garmay; Eisen, Michael B.
Format: Text
Language: English
Published: Public Library of Science 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2731932/
https://www.ncbi.nlm.nih.gov/pubmed/19730735
http://dx.doi.org/10.1371/journal.pone.0006901
author Leung, Garmay
Eisen, Michael B.
collection PubMed
description BACKGROUND: Recognizing regulatory sequences in genomes is a continuing challenge, despite a wealth of available genomic data and a growing number of experimentally validated examples. METHODOLOGY/PRINCIPAL FINDINGS: We discuss here a simple approach to search for regulatory sequences based on the compositional similarity of genomic regions and known cis-regulatory sequences. This method, which is not limited to searching for predefined motifs, recovers sequences known to be under similar regulatory control. The words shared by the recovered sequences often correspond to known binding sites. Furthermore, we show that although local word profile clustering is predictive for the regulatory sequences involved in blastoderm segmentation, local dissimilarity is a more universal feature of known regulatory sequences in Drosophila. CONCLUSIONS/SIGNIFICANCE: Our method leverages sequence motifs within a known regulatory sequence to identify co-regulated sequences without explicitly defining binding sites. We also show that regulatory sequences can be distinguished from surrounding sequences by local sequence dissimilarity, a novel feature in identifying regulatory sequences across a genome. Source code for WPH-finder is available for download at http://rana.lbl.gov/downloads/wph.tar.gz.
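The compositional comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' WPH-finder code: the word length `k`, the use of cosine similarity, the sliding-window scan, and all function names below are assumptions chosen only to show what comparing "word profiles" of a known regulatory sequence and genomic windows could look like.

```python
from collections import Counter
from math import sqrt


def word_profile(seq, k=3):
    """Count every overlapping word of length k in seq (hypothetical helper)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(p, q):
    """Cosine similarity between two word-count profiles (an assumed metric)."""
    dot = sum(count * q[word] for word, count in p.items())
    norm_p = sqrt(sum(c * c for c in p.values()))
    norm_q = sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def scan(genome, reference, window, step=1, k=3):
    """Score each genomic window against the profile of a known sequence."""
    ref = word_profile(reference, k)
    return [
        (start, cosine_similarity(ref, word_profile(genome[start:start + window], k)))
        for start in range(0, len(genome) - window + 1, step)
    ]


if __name__ == "__main__":
    # Toy sequences, not real data.
    hits = scan("AAAAACGTACGT", "ACGTACG", window=7)
    best_start, best_score = max(hits, key=lambda hit: hit[1])
    print(best_start, round(best_score, 3))
```

Because the score depends only on shared words, no binding-site motif has to be specified in advance, which is the property the abstract emphasizes; the actual scoring used by the authors may differ.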
format Text
id pubmed-2731932
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2731932 2009-09-04 Identifying Cis-Regulatory Sequences by Word Profile Similarity Leung, Garmay Eisen, Michael B. PLoS One Research Article Public Library of Science 2009-09-04 /pmc/articles/PMC2731932/ /pubmed/19730735 http://dx.doi.org/10.1371/journal.pone.0006901 Text en
https://creativecommons.org/publicdomain/zero/1.0/ This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration, which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.
title Identifying Cis-Regulatory Sequences by Word Profile Similarity
topic Research Article