Cargando…

qPMS7: A Fast Algorithm for Finding (ℓ, d)-Motifs in DNA and Protein Sequences

Detection of rare events happening in a set of DNA/protein sequences could lead to new biological discoveries. One kind of such rare events is the presence of patterns called motifs in DNA/protein sequences. Finding motifs is a challenging problem since the general version of motif search has been p...

Descripción completa

Detalles Bibliográficos
Autores principales: Dinh, Hieu, Rajasekaran, Sanguthevar, Davila, Jaime
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Public Library of Science 2012
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3404135/
https://www.ncbi.nlm.nih.gov/pubmed/22848493
http://dx.doi.org/10.1371/journal.pone.0041425
_version_ 1782238994718785536
author Dinh, Hieu
Rajasekaran, Sanguthevar
Davila, Jaime
author_facet Dinh, Hieu
Rajasekaran, Sanguthevar
Davila, Jaime
author_sort Dinh, Hieu
collection PubMed
description Detection of rare events happening in a set of DNA/protein sequences could lead to new biological discoveries. One kind of such rare events is the presence of patterns called motifs in DNA/protein sequences. Finding motifs is a challenging problem since the general version of motif search has been proven to be intractable. Motifs discovery is an important problem in biology. For example, it is useful in the detection of transcription factor binding sites and transcriptional regulatory elements that are very crucial in understanding gene function, human disease, drug design, etc. Many versions of the motif search problem have been proposed in the literature. One such is the [Image: see text] -motif search (or Planted Motif Search (PMS)). A generalized version of the PMS problem, namely, Quorum Planted Motif Search (qPMS), is shown to accurately model motifs in real data. However, solving the qPMS problem is an extremely difficult task because a special case of it, the PMS Problem, is already NP-hard, which means that any algorithm solving it can be expected to take exponential time in the worse case scenario. In this paper, we propose a novel algorithm named qPMS7 that tackles the qPMS problem on real data as well as challenging instances. Experimental results show that our Algorithm qPMS7 is on an average 5 times faster than the state-of-art algorithm. The executable program of Algorithm qPMS7 is freely available on the web at http://pms.engr.uconn.edu/downloads/qPMS7.zip. Our online motif discovery tools that use Algorithm qPMS7 are freely available at http://pms.engr.uconn.edu or http://motifsearch.com.
format Online
Article
Text
id pubmed-3404135
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-34041352012-07-30 qPMS7: A Fast Algorithm for Finding (ℓ, d)-Motifs in DNA and Protein Sequences Dinh, Hieu Rajasekaran, Sanguthevar Davila, Jaime PLoS One Research Article Detection of rare events happening in a set of DNA/protein sequences could lead to new biological discoveries. One kind of such rare events is the presence of patterns called motifs in DNA/protein sequences. Finding motifs is a challenging problem since the general version of motif search has been proven to be intractable. Motifs discovery is an important problem in biology. For example, it is useful in the detection of transcription factor binding sites and transcriptional regulatory elements that are very crucial in understanding gene function, human disease, drug design, etc. Many versions of the motif search problem have been proposed in the literature. One such is the [Image: see text] -motif search (or Planted Motif Search (PMS)). A generalized version of the PMS problem, namely, Quorum Planted Motif Search (qPMS), is shown to accurately model motifs in real data. However, solving the qPMS problem is an extremely difficult task because a special case of it, the PMS Problem, is already NP-hard, which means that any algorithm solving it can be expected to take exponential time in the worse case scenario. In this paper, we propose a novel algorithm named qPMS7 that tackles the qPMS problem on real data as well as challenging instances. Experimental results show that our Algorithm qPMS7 is on an average 5 times faster than the state-of-art algorithm. The executable program of Algorithm qPMS7 is freely available on the web at http://pms.engr.uconn.edu/downloads/qPMS7.zip. Our online motif discovery tools that use Algorithm qPMS7 are freely available at http://pms.engr.uconn.edu or http://motifsearch.com. Public Library of Science 2012-07-24 /pmc/articles/PMC3404135/ /pubmed/22848493 http://dx.doi.org/10.1371/journal.pone.0041425 Text en Dinh et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Dinh, Hieu
Rajasekaran, Sanguthevar
Davila, Jaime
qPMS7: A Fast Algorithm for Finding (ℓ, d)-Motifs in DNA and Protein Sequences
title qPMS7: A Fast Algorithm for Finding (ℓ, d)-Motifs in DNA and Protein Sequences
title_full qPMS7: A Fast Algorithm for Finding (ℓ, d)-Motifs in DNA and Protein Sequences
title_fullStr qPMS7: A Fast Algorithm for Finding (ℓ, d)-Motifs in DNA and Protein Sequences
title_full_unstemmed qPMS7: A Fast Algorithm for Finding (ℓ, d)-Motifs in DNA and Protein Sequences
title_short qPMS7: A Fast Algorithm for Finding (ℓ, d)-Motifs in DNA and Protein Sequences
title_sort qpms7: a fast algorithm for finding (ℓ, d)-motifs in dna and protein sequences
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3404135/
https://www.ncbi.nlm.nih.gov/pubmed/22848493
http://dx.doi.org/10.1371/journal.pone.0041425
work_keys_str_mv AT dinhhieu qpms7afastalgorithmforfindingldmotifsindnaandproteinsequences
AT rajasekaransanguthevar qpms7afastalgorithmforfindingldmotifsindnaandproteinsequences
AT davilajaime qpms7afastalgorithmforfindingldmotifsindnaandproteinsequences