Cargando…
Efficient algorithms for biological stems search
BACKGROUND: Motifs are significant patterns in DNA, RNA, and protein sequences, which play an important role in biological processes and functions, like identification of open reading frames, RNA transcription, protein binding, etc. Several versions of the motif search problem have been studied in t...
Autores principales: | , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2013
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3679804/ https://www.ncbi.nlm.nih.gov/pubmed/23679045 http://dx.doi.org/10.1186/1471-2105-14-161 |
_version_ | 1782273020347285504 |
---|---|
author | Mi, Tian Rajasekaran, Sanguthevar |
author_facet | Mi, Tian Rajasekaran, Sanguthevar |
author_sort | Mi, Tian |
collection | PubMed |
description | BACKGROUND: Motifs are significant patterns in DNA, RNA, and protein sequences, which play an important role in biological processes and functions, like identification of open reading frames, RNA transcription, protein binding, etc. Several versions of the motif search problem have been studied in the literature. One such version is called the Planted Motif Search (PMS)or (l, d)-motif Search. PMS is known to be NP complete. The time complexities of most of the planted motif search algorithms depend exponentially on the alphabet size. Recently a new version of the motif search problem has been introduced by Kuksa and Pavlovic. We call this version as the Motif Stems Search (MSS) problem. A motif stem is an l-mer (for some relevant value of l)with some wildcard characters and hence corresponds to a set of l-mers (without wildcards), some of which are (l, d)-motifs. Kuksa and Pavlovic have presented an efficient algorithm to find motif stems for inputs from large alphabets. Ideally, the number of stems output should be as small as possible since the stems form a superset of the motifs. RESULTS: In this paper we propose an efficient algorithm for MSS and evaluate it on both synthetic and real data. This evaluation reveals that our algorithm is much faster than Kuksa and Pavlovic’s algorithm. CONCLUSIONS: Our MSS algorithm outperforms the algorithm of Kuksa and Pavlovic in terms of the run time as well as the number of stems output. Specifically, the stems output by our algorithm form a proper (and much smaller)subset of the stems output by Kuksa and Pavlovic’s algorithm. |
format | Online Article Text |
id | pubmed-3679804 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-36798042013-06-25 Efficient algorithms for biological stems search Mi, Tian Rajasekaran, Sanguthevar BMC Bioinformatics Research Article BACKGROUND: Motifs are significant patterns in DNA, RNA, and protein sequences, which play an important role in biological processes and functions, like identification of open reading frames, RNA transcription, protein binding, etc. Several versions of the motif search problem have been studied in the literature. One such version is called the Planted Motif Search (PMS)or (l, d)-motif Search. PMS is known to be NP complete. The time complexities of most of the planted motif search algorithms depend exponentially on the alphabet size. Recently a new version of the motif search problem has been introduced by Kuksa and Pavlovic. We call this version as the Motif Stems Search (MSS) problem. A motif stem is an l-mer (for some relevant value of l)with some wildcard characters and hence corresponds to a set of l-mers (without wildcards), some of which are (l, d)-motifs. Kuksa and Pavlovic have presented an efficient algorithm to find motif stems for inputs from large alphabets. Ideally, the number of stems output should be as small as possible since the stems form a superset of the motifs. RESULTS: In this paper we propose an efficient algorithm for MSS and evaluate it on both synthetic and real data. This evaluation reveals that our algorithm is much faster than Kuksa and Pavlovic’s algorithm. CONCLUSIONS: Our MSS algorithm outperforms the algorithm of Kuksa and Pavlovic in terms of the run time as well as the number of stems output. Specifically, the stems output by our algorithm form a proper (and much smaller)subset of the stems output by Kuksa and Pavlovic’s algorithm. BioMed Central 2013-05-16 /pmc/articles/PMC3679804/ /pubmed/23679045 http://dx.doi.org/10.1186/1471-2105-14-161 Text en Copyright © 2013 Mi and Rajasekaran; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Mi, Tian Rajasekaran, Sanguthevar Efficient algorithms for biological stems search |
title | Efficient algorithms for biological stems search |
title_full | Efficient algorithms for biological stems search |
title_fullStr | Efficient algorithms for biological stems search |
title_full_unstemmed | Efficient algorithms for biological stems search |
title_short | Efficient algorithms for biological stems search |
title_sort | efficient algorithms for biological stems search |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3679804/ https://www.ncbi.nlm.nih.gov/pubmed/23679045 http://dx.doi.org/10.1186/1471-2105-14-161 |
work_keys_str_mv | AT mitian efficientalgorithmsforbiologicalstemssearch AT rajasekaransanguthevar efficientalgorithmsforbiologicalstemssearch |