Efficient motif search in ranked lists and applications to variable gap motifs

Sequence elements, at all levels—DNA, RNA and protein, play a central role in mediating molecular recognition and thereby molecular regulation and signaling. Studies that focus on measuring and investigating sequence-based recognition make use of statistical and computational tools, including approa...

Bibliographic Details
Main authors: Leibovich, Limor, Yakhini, Zohar
Format: Online Article Text
Language: English
Published: Oxford University Press 2012
Subjects: Computational Biology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3401424/
https://www.ncbi.nlm.nih.gov/pubmed/22416066
http://dx.doi.org/10.1093/nar/gks206
author Leibovich, Limor
Yakhini, Zohar
collection PubMed
description Sequence elements, at all levels—DNA, RNA and protein, play a central role in mediating molecular recognition and thereby molecular regulation and signaling. Studies that focus on measuring and investigating sequence-based recognition make use of statistical and computational tools, including approaches to searching sequence motifs. State-of-the-art motif searching tools are limited in their coverage and ability to address large motif spaces. We develop and present statistical and algorithmic approaches that take as input ranked lists of sequences and return significant motifs. The efficiency of our approach, based on suffix trees, allows searches over motif spaces that are not covered by existing tools. This includes searching variable gap motifs—two half sites with a flexible length gap in between—and searching long motifs over large alphabets. We used our approach to analyze several high-throughput measurement data sets and report some validation results as well as novel suggested motifs and motif refinements. We suggest a refinement of the known estrogen receptor 1 motif in humans, where we observe gaps other than three nucleotides that also serve as significant recognition sites, as well as a variable length motif related to potential tyrosine phosphorylation.
format Online
Article
Text
id pubmed-3401424
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3401424 2012-07-23 Efficient motif search in ranked lists and applications to variable gap motifs Leibovich, Limor Yakhini, Zohar Nucleic Acids Res Computational Biology Oxford University Press 2012-07 2012-03-13 /pmc/articles/PMC3401424/ /pubmed/22416066 http://dx.doi.org/10.1093/nar/gks206 Text en © The Author(s) 2012. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Efficient motif search in ranked lists and applications to variable gap motifs
topic Computational Biology