Cargando…
qPMS7: A Fast Algorithm for Finding (ℓ, d)-Motifs in DNA and Protein Sequences
Detection of rare events happening in a set of DNA/protein sequences could lead to new biological discoveries. One kind of such rare events is the presence of patterns called motifs in DNA/protein sequences. Finding motifs is a challenging problem since the general version of motif search has been p...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Public Library of Science
2012
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3404135/ https://www.ncbi.nlm.nih.gov/pubmed/22848493 http://dx.doi.org/10.1371/journal.pone.0041425 |
_version_ | 1782238994718785536 |
---|---|
author | Dinh, Hieu Rajasekaran, Sanguthevar Davila, Jaime |
author_facet | Dinh, Hieu Rajasekaran, Sanguthevar Davila, Jaime |
author_sort | Dinh, Hieu |
collection | PubMed |
description | Detection of rare events happening in a set of DNA/protein sequences could lead to new biological discoveries. One kind of such rare events is the presence of patterns called motifs in DNA/protein sequences. Finding motifs is a challenging problem since the general version of motif search has been proven to be intractable. Motifs discovery is an important problem in biology. For example, it is useful in the detection of transcription factor binding sites and transcriptional regulatory elements that are very crucial in understanding gene function, human disease, drug design, etc. Many versions of the motif search problem have been proposed in the literature. One such is the [Image: see text] -motif search (or Planted Motif Search (PMS)). A generalized version of the PMS problem, namely, Quorum Planted Motif Search (qPMS), is shown to accurately model motifs in real data. However, solving the qPMS problem is an extremely difficult task because a special case of it, the PMS Problem, is already NP-hard, which means that any algorithm solving it can be expected to take exponential time in the worse case scenario. In this paper, we propose a novel algorithm named qPMS7 that tackles the qPMS problem on real data as well as challenging instances. Experimental results show that our Algorithm qPMS7 is on an average 5 times faster than the state-of-art algorithm. The executable program of Algorithm qPMS7 is freely available on the web at http://pms.engr.uconn.edu/downloads/qPMS7.zip. Our online motif discovery tools that use Algorithm qPMS7 are freely available at http://pms.engr.uconn.edu or http://motifsearch.com. |
format | Online Article Text |
id | pubmed-3404135 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-34041352012-07-30 qPMS7: A Fast Algorithm for Finding (ℓ, d)-Motifs in DNA and Protein Sequences Dinh, Hieu Rajasekaran, Sanguthevar Davila, Jaime PLoS One Research Article Detection of rare events happening in a set of DNA/protein sequences could lead to new biological discoveries. One kind of such rare events is the presence of patterns called motifs in DNA/protein sequences. Finding motifs is a challenging problem since the general version of motif search has been proven to be intractable. Motifs discovery is an important problem in biology. For example, it is useful in the detection of transcription factor binding sites and transcriptional regulatory elements that are very crucial in understanding gene function, human disease, drug design, etc. Many versions of the motif search problem have been proposed in the literature. One such is the [Image: see text] -motif search (or Planted Motif Search (PMS)). A generalized version of the PMS problem, namely, Quorum Planted Motif Search (qPMS), is shown to accurately model motifs in real data. However, solving the qPMS problem is an extremely difficult task because a special case of it, the PMS Problem, is already NP-hard, which means that any algorithm solving it can be expected to take exponential time in the worse case scenario. In this paper, we propose a novel algorithm named qPMS7 that tackles the qPMS problem on real data as well as challenging instances. Experimental results show that our Algorithm qPMS7 is on an average 5 times faster than the state-of-art algorithm. The executable program of Algorithm qPMS7 is freely available on the web at http://pms.engr.uconn.edu/downloads/qPMS7.zip. Our online motif discovery tools that use Algorithm qPMS7 are freely available at http://pms.engr.uconn.edu or http://motifsearch.com. Public Library of Science 2012-07-24 /pmc/articles/PMC3404135/ /pubmed/22848493 http://dx.doi.org/10.1371/journal.pone.0041425 Text en Dinh et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Dinh, Hieu Rajasekaran, Sanguthevar Davila, Jaime qPMS7: A Fast Algorithm for Finding (ℓ, d)-Motifs in DNA and Protein Sequences |
title | qPMS7: A Fast Algorithm for Finding (ℓ, d)-Motifs in DNA and Protein Sequences |
title_full | qPMS7: A Fast Algorithm for Finding (ℓ, d)-Motifs in DNA and Protein Sequences |
title_fullStr | qPMS7: A Fast Algorithm for Finding (ℓ, d)-Motifs in DNA and Protein Sequences |
title_full_unstemmed | qPMS7: A Fast Algorithm for Finding (ℓ, d)-Motifs in DNA and Protein Sequences |
title_short | qPMS7: A Fast Algorithm for Finding (ℓ, d)-Motifs in DNA and Protein Sequences |
title_sort | qpms7: a fast algorithm for finding (ℓ, d)-motifs in dna and protein sequences |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3404135/ https://www.ncbi.nlm.nih.gov/pubmed/22848493 http://dx.doi.org/10.1371/journal.pone.0041425 |
work_keys_str_mv | AT dinhhieu qpms7afastalgorithmforfindingldmotifsindnaandproteinsequences AT rajasekaransanguthevar qpms7afastalgorithmforfindingldmotifsindnaandproteinsequences AT davilajaime qpms7afastalgorithmforfindingldmotifsindnaandproteinsequences |