Efficient motif search in ranked lists and applications to variable gap motifs
Main authors: | Leibovich, Limor; Yakhini, Zohar |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2012 |
Subjects: | Computational Biology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3401424/ https://www.ncbi.nlm.nih.gov/pubmed/22416066 http://dx.doi.org/10.1093/nar/gks206 |
author | Leibovich, Limor; Yakhini, Zohar |
collection | PubMed |
description | Sequence elements at all levels (DNA, RNA and protein) play a central role in mediating molecular recognition, and thereby molecular regulation and signaling. Studies that focus on measuring and investigating sequence-based recognition make use of statistical and computational tools, including approaches to searching sequence motifs. State-of-the-art motif search tools are limited in their coverage and in their ability to address large motif spaces. We develop and present statistical and algorithmic approaches that take as input ranked lists of sequences and return significant motifs. The efficiency of our approach, based on suffix trees, allows searches over motif spaces that are not covered by existing tools. This includes searching variable gap motifs (two half sites with a flexible-length gap in between) and searching long motifs over large alphabets. We used our approach to analyze several high-throughput measurement data sets and report some validation results as well as novel suggested motifs and motif refinements. We suggest a refinement of the known estrogen receptor 1 motif in humans, in which gaps other than three nucleotides also serve as significant recognition sites, as well as a variable-length motif related to potential tyrosine phosphorylation. |
id | pubmed-3401424 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | Nucleic Acids Res, Computational Biology; 2012-07 (published online 2012-03-13) |
rights | © The Author(s) 2012. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Efficient motif search in ranked lists and applications to variable gap motifs |
topic | Computational Biology |
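The abstract describes the method only at a high level. As a rough illustration, the Python sketch below marks which sequences in a ranked list contain a variable gap motif and scores how strongly those hits concentrate near the top of the list. It is not the authors' implementation: it scans each sequence with a regular expression rather than the suffix-tree search the abstract describes, and it assumes a minimum-hypergeometric (mHG) style enrichment score, a statistic this record does not name. The helper names (`variable_gap_pattern`, `hypergeom_tail`, `min_hypergeometric`) and the toy sequences are hypothetical; the half sites GGTCA/TGACC (those of the classic estrogen response element) are used only as an example.

```python
import re
from math import comb


# Hypothetical helper: a variable gap motif as a regex, i.e. a left half site,
# any min_gap..max_gap nucleotides, then a right half site.
def variable_gap_pattern(left: str, right: str, min_gap: int, max_gap: int) -> re.Pattern:
    return re.compile(f"{left}[ACGT]{{{min_gap},{max_gap}}}{right}")


def hypergeom_tail(total: int, marked: int, drawn: int, hits: int) -> float:
    """P(X >= hits) for X ~ Hypergeometric(total, marked, drawn)."""
    top = min(drawn, marked)
    if hits > top:
        return 0.0
    numer = sum(comb(marked, i) * comb(total - marked, drawn - i)
                for i in range(hits, top + 1))
    return numer / comb(total, drawn)


def min_hypergeometric(labels: list[int]) -> tuple[float, int]:
    """Assumed mHG-style score: the smallest hypergeometric tail over all
    cutoffs n, counting motif hits among the top-n ranked sequences."""
    total, marked = len(labels), sum(labels)
    best_p, best_n = 1.0, total
    hits = 0
    for n, label in enumerate(labels, start=1):
        hits += label
        p = hypergeom_tail(total, marked, n, hits)
        if p < best_p:
            best_p, best_n = p, n
    return best_p, best_n


# Toy ranked list (hypothetical data), best-ranked sequence first.
ranked = [
    "TTGGTCAGCATGACCAA",  # GGTCA, 3-nt gap, TGACC
    "AGGTCAGGTGACCTT",    # 2-nt gap
    "CCGGTCAAAAATGACCG",  # 4-nt gap
    "ACGTACGTACGTACGT",   # neither half site
    "GGTCAAAAAAATGACC",   # 6-nt gap: outside the 2..4 range
    "TTTTTTTTTTTTTTTT",
]

pattern = variable_gap_pattern("GGTCA", "TGACC", 2, 4)
labels = [int(pattern.search(seq) is not None) for seq in ranked]
score, cutoff = min_hypergeometric(labels)
print(labels, score, cutoff)  # [1, 1, 1, 0, 0, 0] 0.05 3
```

Allowing gaps of 2 to 4 nucleotides instead of exactly 3 lets the first three sequences match, and the score is smallest at a cutoff of 3. The minimum over cutoffs is a score, not a calibrated p-value; converting it into one requires correcting for the choice of cutoff, which this sketch omits.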