Cargando…

Efficient algorithms for biological stems search

BACKGROUND: Motifs are significant patterns in DNA, RNA, and protein sequences, which play an important role in biological processes and functions, like identification of open reading frames, RNA transcription, protein binding, etc. Several versions of the motif search problem have been studied in t...

Descripción completa

Detalles Bibliográficos
Autores principales: Mi, Tian, Rajasekaran, Sanguthevar
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2013
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3679804/
https://www.ncbi.nlm.nih.gov/pubmed/23679045
http://dx.doi.org/10.1186/1471-2105-14-161
_version_ 1782273020347285504
author Mi, Tian
Rajasekaran, Sanguthevar
author_facet Mi, Tian
Rajasekaran, Sanguthevar
author_sort Mi, Tian
collection PubMed
description BACKGROUND: Motifs are significant patterns in DNA, RNA, and protein sequences, which play an important role in biological processes and functions, like identification of open reading frames, RNA transcription, protein binding, etc. Several versions of the motif search problem have been studied in the literature. One such version is called the Planted Motif Search (PMS)or (l, d)-motif Search. PMS is known to be NP complete. The time complexities of most of the planted motif search algorithms depend exponentially on the alphabet size. Recently a new version of the motif search problem has been introduced by Kuksa and Pavlovic. We call this version as the Motif Stems Search (MSS) problem. A motif stem is an l-mer (for some relevant value of l)with some wildcard characters and hence corresponds to a set of l-mers (without wildcards), some of which are (l, d)-motifs. Kuksa and Pavlovic have presented an efficient algorithm to find motif stems for inputs from large alphabets. Ideally, the number of stems output should be as small as possible since the stems form a superset of the motifs. RESULTS: In this paper we propose an efficient algorithm for MSS and evaluate it on both synthetic and real data. This evaluation reveals that our algorithm is much faster than Kuksa and Pavlovic’s algorithm. CONCLUSIONS: Our MSS algorithm outperforms the algorithm of Kuksa and Pavlovic in terms of the run time as well as the number of stems output. Specifically, the stems output by our algorithm form a proper (and much smaller)subset of the stems output by Kuksa and Pavlovic’s algorithm.
format Online
Article
Text
id pubmed-3679804
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-36798042013-06-25 Efficient algorithms for biological stems search Mi, Tian Rajasekaran, Sanguthevar BMC Bioinformatics Research Article BACKGROUND: Motifs are significant patterns in DNA, RNA, and protein sequences, which play an important role in biological processes and functions, like identification of open reading frames, RNA transcription, protein binding, etc. Several versions of the motif search problem have been studied in the literature. One such version is called the Planted Motif Search (PMS)or (l, d)-motif Search. PMS is known to be NP complete. The time complexities of most of the planted motif search algorithms depend exponentially on the alphabet size. Recently a new version of the motif search problem has been introduced by Kuksa and Pavlovic. We call this version as the Motif Stems Search (MSS) problem. A motif stem is an l-mer (for some relevant value of l)with some wildcard characters and hence corresponds to a set of l-mers (without wildcards), some of which are (l, d)-motifs. Kuksa and Pavlovic have presented an efficient algorithm to find motif stems for inputs from large alphabets. Ideally, the number of stems output should be as small as possible since the stems form a superset of the motifs. RESULTS: In this paper we propose an efficient algorithm for MSS and evaluate it on both synthetic and real data. This evaluation reveals that our algorithm is much faster than Kuksa and Pavlovic’s algorithm. CONCLUSIONS: Our MSS algorithm outperforms the algorithm of Kuksa and Pavlovic in terms of the run time as well as the number of stems output. Specifically, the stems output by our algorithm form a proper (and much smaller)subset of the stems output by Kuksa and Pavlovic’s algorithm. BioMed Central 2013-05-16 /pmc/articles/PMC3679804/ /pubmed/23679045 http://dx.doi.org/10.1186/1471-2105-14-161 Text en Copyright © 2013 Mi and Rajasekaran; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Mi, Tian
Rajasekaran, Sanguthevar
Efficient algorithms for biological stems search
title Efficient algorithms for biological stems search
title_full Efficient algorithms for biological stems search
title_fullStr Efficient algorithms for biological stems search
title_full_unstemmed Efficient algorithms for biological stems search
title_short Efficient algorithms for biological stems search
title_sort efficient algorithms for biological stems search
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3679804/
https://www.ncbi.nlm.nih.gov/pubmed/23679045
http://dx.doi.org/10.1186/1471-2105-14-161
work_keys_str_mv AT mitian efficientalgorithmsforbiologicalstemssearch
AT rajasekaransanguthevar efficientalgorithmsforbiologicalstemssearch