Cargando…

Efficient sequential and parallel algorithms for planted motif search

BACKGROUND: Motif searching is an important step in the detection of rare events occurring in a set of DNA or protein sequences. One formulation of the problem is known as (l,d)-motif search or Planted Motif Search (PMS). In PMS we are given two integers l and d and n biological sequences. We want t...

Descripción completa

Detalles Bibliográficos
Autores principales: Nicolae, Marius, Rajasekaran, Sanguthevar
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2014
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3924400/
https://www.ncbi.nlm.nih.gov/pubmed/24479443
http://dx.doi.org/10.1186/1471-2105-15-34
_version_ 1782303738779664384
author Nicolae, Marius
Rajasekaran, Sanguthevar
author_facet Nicolae, Marius
Rajasekaran, Sanguthevar
author_sort Nicolae, Marius
collection PubMed
description BACKGROUND: Motif searching is an important step in the detection of rare events occurring in a set of DNA or protein sequences. One formulation of the problem is known as (l,d)-motif search or Planted Motif Search (PMS). In PMS we are given two integers l and d and n biological sequences. We want to find all sequences of length l that appear in each of the input sequences with at most d mismatches. The PMS problem is NP-complete. PMS algorithms are typically evaluated on certain instances considered challenging. Despite ample research in the area, a considerable performance gap exists because many state of the art algorithms have large runtimes even for moderately challenging instances. RESULTS: This paper presents a fast exact parallel PMS algorithm called PMS8. PMS8 is the first algorithm to solve the challenging (l,d) instances (25,10) and (26,11). PMS8 is also efficient on instances with larger l and d such as (50,21). We include a comparison of PMS8 with several state of the art algorithms on multiple problem instances. This paper also presents necessary and sufficient conditions for 3 l-mers to have a common d-neighbor. The program is freely available at http://engr.uconn.edu/~man09004/PMS8/. CONCLUSIONS: We present PMS8, an efficient exact algorithm for Planted Motif Search. PMS8 introduces novel ideas for generating common neighborhoods. We have also implemented a parallel version for this algorithm. PMS8 can solve instances not solved by any previous algorithms.
format Online
Article
Text
id pubmed-3924400
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-39244002014-03-03 Efficient sequential and parallel algorithms for planted motif search Nicolae, Marius Rajasekaran, Sanguthevar BMC Bioinformatics Research Article BACKGROUND: Motif searching is an important step in the detection of rare events occurring in a set of DNA or protein sequences. One formulation of the problem is known as (l,d)-motif search or Planted Motif Search (PMS). In PMS we are given two integers l and d and n biological sequences. We want to find all sequences of length l that appear in each of the input sequences with at most d mismatches. The PMS problem is NP-complete. PMS algorithms are typically evaluated on certain instances considered challenging. Despite ample research in the area, a considerable performance gap exists because many state of the art algorithms have large runtimes even for moderately challenging instances. RESULTS: This paper presents a fast exact parallel PMS algorithm called PMS8. PMS8 is the first algorithm to solve the challenging (l,d) instances (25,10) and (26,11). PMS8 is also efficient on instances with larger l and d such as (50,21). We include a comparison of PMS8 with several state of the art algorithms on multiple problem instances. This paper also presents necessary and sufficient conditions for 3 l-mers to have a common d-neighbor. The program is freely available at http://engr.uconn.edu/~man09004/PMS8/. CONCLUSIONS: We present PMS8, an efficient exact algorithm for Planted Motif Search. PMS8 introduces novel ideas for generating common neighborhoods. We have also implemented a parallel version for this algorithm. PMS8 can solve instances not solved by any previous algorithms. BioMed Central 2014-01-31 /pmc/articles/PMC3924400/ /pubmed/24479443 http://dx.doi.org/10.1186/1471-2105-15-34 Text en Copyright © 2014 Nicolae and Rajasekaran; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Nicolae, Marius
Rajasekaran, Sanguthevar
Efficient sequential and parallel algorithms for planted motif search
title Efficient sequential and parallel algorithms for planted motif search
title_full Efficient sequential and parallel algorithms for planted motif search
title_fullStr Efficient sequential and parallel algorithms for planted motif search
title_full_unstemmed Efficient sequential and parallel algorithms for planted motif search
title_short Efficient sequential and parallel algorithms for planted motif search
title_sort efficient sequential and parallel algorithms for planted motif search
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3924400/
https://www.ncbi.nlm.nih.gov/pubmed/24479443
http://dx.doi.org/10.1186/1471-2105-15-34
work_keys_str_mv AT nicolaemarius efficientsequentialandparallelalgorithmsforplantedmotifsearch
AT rajasekaransanguthevar efficientsequentialandparallelalgorithmsforplantedmotifsearch