Cargando…

A hybrid method for the exact planted (l, d) motif finding problem and its parallelization

BACKGROUND: Given a set of DNA sequences s(1), ..., s(t), the (l, d) motif problem is to find an l-length motif sequence M , not necessary existing in any of the input sequences, such that for each sequence s(i), 1 ≤ i ≤ t, there is at least one subsequence differing with at most d mismatches from M...

Descripción completa

Detalles Bibliográficos
Autores principales: Abbas, Mostafa M, Abouelhoda, Mohamed, Bahig, Hazem M
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2012
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3521218/
https://www.ncbi.nlm.nih.gov/pubmed/23281969
http://dx.doi.org/10.1186/1471-2105-13-S17-S10
_version_ 1782252907559649280
author Abbas, Mostafa M
Abouelhoda, Mohamed
Bahig, Hazem M
author_facet Abbas, Mostafa M
Abouelhoda, Mohamed
Bahig, Hazem M
author_sort Abbas, Mostafa M
collection PubMed
description BACKGROUND: Given a set of DNA sequences s(1), ..., s(t), the (l, d) motif problem is to find an l-length motif sequence M , not necessary existing in any of the input sequences, such that for each sequence s(i), 1 ≤ i ≤ t, there is at least one subsequence differing with at most d mismatches from M. Many exact algorithms have been developed to solve the motif finding problem in the last three decades. However, the problem is still challenging and its solution is limited to small values of l and d. RESULTS: In this paper we present a new efficient method to improve the performance of the exact algorithms for the motif finding problem. Our method is composed of two main steps: First, we process q ≤ t sequences to find candidate motifs. Second, the candidate motifs are searched in the remaining sequences. For both steps, we use the best available algorithms. Our method is a hybrid one, because it integrates currently existing algorithms to achieve the best running time. In this paper, we show how the optimal value of q is determined to achieve the best running time. Our experimental results show that there is about 24% speed-up achieved by our method compared to the best existing algorithm. Furthermore, we also present a parallel version of our method running on shared memory architecture. Our experiments show that the performance of our algorithm scales linearly with the number of processors. Using the parallel version, we were able to solve the (21, 8) challenging instance using 8 processors in 20.42 hours instead of 6.68 days of the serial version. CONCLUSIONS: Our method speeds up the solution of the exact motif problem. Our method is generic, because it can accommodate any new faster algorithm based on traditional methods. We expect that our method will help to discover longer motifs. The software we developed is available for free for academic research at http://www.nubios.nileu.edu.eg/tools/hymotif.
format Online
Article
Text
id pubmed-3521218
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-35212182012-12-14 A hybrid method for the exact planted (l, d) motif finding problem and its parallelization Abbas, Mostafa M Abouelhoda, Mohamed Bahig, Hazem M BMC Bioinformatics Proceedings BACKGROUND: Given a set of DNA sequences s(1), ..., s(t), the (l, d) motif problem is to find an l-length motif sequence M , not necessary existing in any of the input sequences, such that for each sequence s(i), 1 ≤ i ≤ t, there is at least one subsequence differing with at most d mismatches from M. Many exact algorithms have been developed to solve the motif finding problem in the last three decades. However, the problem is still challenging and its solution is limited to small values of l and d. RESULTS: In this paper we present a new efficient method to improve the performance of the exact algorithms for the motif finding problem. Our method is composed of two main steps: First, we process q ≤ t sequences to find candidate motifs. Second, the candidate motifs are searched in the remaining sequences. For both steps, we use the best available algorithms. Our method is a hybrid one, because it integrates currently existing algorithms to achieve the best running time. In this paper, we show how the optimal value of q is determined to achieve the best running time. Our experimental results show that there is about 24% speed-up achieved by our method compared to the best existing algorithm. Furthermore, we also present a parallel version of our method running on shared memory architecture. Our experiments show that the performance of our algorithm scales linearly with the number of processors. Using the parallel version, we were able to solve the (21, 8) challenging instance using 8 processors in 20.42 hours instead of 6.68 days of the serial version. CONCLUSIONS: Our method speeds up the solution of the exact motif problem. Our method is generic, because it can accommodate any new faster algorithm based on traditional methods. We expect that our method will help to discover longer motifs. The software we developed is available for free for academic research at http://www.nubios.nileu.edu.eg/tools/hymotif. BioMed Central 2012-12-07 /pmc/articles/PMC3521218/ /pubmed/23281969 http://dx.doi.org/10.1186/1471-2105-13-S17-S10 Text en Copyright ©2012 Abbas et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Proceedings
Abbas, Mostafa M
Abouelhoda, Mohamed
Bahig, Hazem M
A hybrid method for the exact planted (l, d) motif finding problem and its parallelization
title A hybrid method for the exact planted (l, d) motif finding problem and its parallelization
title_full A hybrid method for the exact planted (l, d) motif finding problem and its parallelization
title_fullStr A hybrid method for the exact planted (l, d) motif finding problem and its parallelization
title_full_unstemmed A hybrid method for the exact planted (l, d) motif finding problem and its parallelization
title_short A hybrid method for the exact planted (l, d) motif finding problem and its parallelization
title_sort hybrid method for the exact planted (l, d) motif finding problem and its parallelization
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3521218/
https://www.ncbi.nlm.nih.gov/pubmed/23281969
http://dx.doi.org/10.1186/1471-2105-13-S17-S10
work_keys_str_mv AT abbasmostafam ahybridmethodfortheexactplantedldmotiffindingproblemanditsparallelization
AT abouelhodamohamed ahybridmethodfortheexactplantedldmotiffindingproblemanditsparallelization
AT bahighazemm ahybridmethodfortheexactplantedldmotiffindingproblemanditsparallelization
AT abbasmostafam hybridmethodfortheexactplantedldmotiffindingproblemanditsparallelization
AT abouelhodamohamed hybridmethodfortheexactplantedldmotiffindingproblemanditsparallelization
AT bahighazemm hybridmethodfortheexactplantedldmotiffindingproblemanditsparallelization