Cargando…

Efficient algorithms for biological stems search

BACKGROUND: Motifs are significant patterns in DNA, RNA, and protein sequences, which play an important role in biological processes and functions, like identification of open reading frames, RNA transcription, protein binding, etc. Several versions of the motif search problem have been studied in t...

Descripción completa

Detalles Bibliográficos
Autores principales:	Mi, Tian, Rajasekaran, Sanguthevar
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	BioMed Central 2013
Materias:	Research Article
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3679804/ https://www.ncbi.nlm.nih.gov/pubmed/23679045 http://dx.doi.org/10.1186/1471-2105-14-161

_version_	1782273020347285504
author	Mi, Tian Rajasekaran, Sanguthevar
author_facet	Mi, Tian Rajasekaran, Sanguthevar
author_sort	Mi, Tian
collection	PubMed
description	BACKGROUND: Motifs are significant patterns in DNA, RNA, and protein sequences, which play an important role in biological processes and functions, like identification of open reading frames, RNA transcription, protein binding, etc. Several versions of the motif search problem have been studied in the literature. One such version is called the Planted Motif Search (PMS)or (l, d)-motif Search. PMS is known to be NP complete. The time complexities of most of the planted motif search algorithms depend exponentially on the alphabet size. Recently a new version of the motif search problem has been introduced by Kuksa and Pavlovic. We call this version as the Motif Stems Search (MSS) problem. A motif stem is an l-mer (for some relevant value of l)with some wildcard characters and hence corresponds to a set of l-mers (without wildcards), some of which are (l, d)-motifs. Kuksa and Pavlovic have presented an efficient algorithm to find motif stems for inputs from large alphabets. Ideally, the number of stems output should be as small as possible since the stems form a superset of the motifs. RESULTS: In this paper we propose an efficient algorithm for MSS and evaluate it on both synthetic and real data. This evaluation reveals that our algorithm is much faster than Kuksa and Pavlovic’s algorithm. CONCLUSIONS: Our MSS algorithm outperforms the algorithm of Kuksa and Pavlovic in terms of the run time as well as the number of stems output. Specifically, the stems output by our algorithm form a proper (and much smaller)subset of the stems output by Kuksa and Pavlovic’s algorithm.
format	Online Article Text
id	pubmed-3679804
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-36798042013-06-25 Efficient algorithms for biological stems search Mi, Tian Rajasekaran, Sanguthevar BMC Bioinformatics Research Article BACKGROUND: Motifs are significant patterns in DNA, RNA, and protein sequences, which play an important role in biological processes and functions, like identification of open reading frames, RNA transcription, protein binding, etc. Several versions of the motif search problem have been studied in the literature. One such version is called the Planted Motif Search (PMS)or (l, d)-motif Search. PMS is known to be NP complete. The time complexities of most of the planted motif search algorithms depend exponentially on the alphabet size. Recently a new version of the motif search problem has been introduced by Kuksa and Pavlovic. We call this version as the Motif Stems Search (MSS) problem. A motif stem is an l-mer (for some relevant value of l)with some wildcard characters and hence corresponds to a set of l-mers (without wildcards), some of which are (l, d)-motifs. Kuksa and Pavlovic have presented an efficient algorithm to find motif stems for inputs from large alphabets. Ideally, the number of stems output should be as small as possible since the stems form a superset of the motifs. RESULTS: In this paper we propose an efficient algorithm for MSS and evaluate it on both synthetic and real data. This evaluation reveals that our algorithm is much faster than Kuksa and Pavlovic’s algorithm. CONCLUSIONS: Our MSS algorithm outperforms the algorithm of Kuksa and Pavlovic in terms of the run time as well as the number of stems output. Specifically, the stems output by our algorithm form a proper (and much smaller)subset of the stems output by Kuksa and Pavlovic’s algorithm. BioMed Central 2013-05-16 /pmc/articles/PMC3679804/ /pubmed/23679045 http://dx.doi.org/10.1186/1471-2105-14-161 Text en Copyright © 2013 Mi and Rajasekaran; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Mi, Tian Rajasekaran, Sanguthevar Efficient algorithms for biological stems search
title	Efficient algorithms for biological stems search
title_full	Efficient algorithms for biological stems search
title_fullStr	Efficient algorithms for biological stems search
title_full_unstemmed	Efficient algorithms for biological stems search
title_short	Efficient algorithms for biological stems search
title_sort	efficient algorithms for biological stems search
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3679804/ https://www.ncbi.nlm.nih.gov/pubmed/23679045 http://dx.doi.org/10.1186/1471-2105-14-161
work_keys_str_mv	AT mitian efficientalgorithmsforbiologicalstemssearch AT rajasekaransanguthevar efficientalgorithmsforbiologicalstemssearch

Efficient algorithms for biological stems search

Ejemplares similares