Cargando…
qPMS9: An Efficient Algorithm for Quorum Planted Motif Search
Discovering patterns in biological sequences is a crucial problem. For example, the identification of patterns in DNA sequences has resulted in the determination of open reading frames, identification of gene promoter elements, intron/exon splicing sites, and SH RNAs, location of RNA degradation sig...
Autores principales: | , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
Nature Publishing Group
2015
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4295094/ https://www.ncbi.nlm.nih.gov/pubmed/25589474 http://dx.doi.org/10.1038/srep07813 |
_version_ | 1782352791134535680 |
---|---|
author | Nicolae, Marius Rajasekaran, Sanguthevar |
author_facet | Nicolae, Marius Rajasekaran, Sanguthevar |
author_sort | Nicolae, Marius |
collection | PubMed |
description | Discovering patterns in biological sequences is a crucial problem. For example, the identification of patterns in DNA sequences has resulted in the determination of open reading frames, identification of gene promoter elements, intron/exon splicing sites, and SH RNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have led to domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, discovery of short functional motifs, etc. In this paper we focus on the identification of an important class of patterns, namely, motifs. We study the (ℓ, d) motif search problem or Planted Motif Search (PMS). PMS receives as input n strings and two integers ℓ and d. It returns all sequences M of length ℓ that occur in each input string, where each occurrence differs from M in at most d positions. Another formulation is quorum PMS (qPMS), where the motif appears in at least q% of the strings. We introduce qPMS9, a parallel exact qPMS algorithm that offers significant runtime improvements on DNA and protein datasets. qPMS9 solves the challenging DNA (ℓ, d)-instances (28, 12) and (30, 13). The source code is available at https://code.google.com/p/qpms9/. |
format | Online Article Text |
id | pubmed-4295094 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Nature Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-42950942015-01-27 qPMS9: An Efficient Algorithm for Quorum Planted Motif Search Nicolae, Marius Rajasekaran, Sanguthevar Sci Rep Article Discovering patterns in biological sequences is a crucial problem. For example, the identification of patterns in DNA sequences has resulted in the determination of open reading frames, identification of gene promoter elements, intron/exon splicing sites, and SH RNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have led to domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, discovery of short functional motifs, etc. In this paper we focus on the identification of an important class of patterns, namely, motifs. We study the (ℓ, d) motif search problem or Planted Motif Search (PMS). PMS receives as input n strings and two integers ℓ and d. It returns all sequences M of length ℓ that occur in each input string, where each occurrence differs from M in at most d positions. Another formulation is quorum PMS (qPMS), where the motif appears in at least q% of the strings. We introduce qPMS9, a parallel exact qPMS algorithm that offers significant runtime improvements on DNA and protein datasets. qPMS9 solves the challenging DNA (ℓ, d)-instances (28, 12) and (30, 13). The source code is available at https://code.google.com/p/qpms9/. Nature Publishing Group 2015-01-15 /pmc/articles/PMC4295094/ /pubmed/25589474 http://dx.doi.org/10.1038/srep07813 Text en Copyright © 2015, Macmillan Publishers Limited. All rights reserved http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder in order to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ |
spellingShingle | Article Nicolae, Marius Rajasekaran, Sanguthevar qPMS9: An Efficient Algorithm for Quorum Planted Motif Search |
title | qPMS9: An Efficient Algorithm for Quorum Planted Motif Search |
title_full | qPMS9: An Efficient Algorithm for Quorum Planted Motif Search |
title_fullStr | qPMS9: An Efficient Algorithm for Quorum Planted Motif Search |
title_full_unstemmed | qPMS9: An Efficient Algorithm for Quorum Planted Motif Search |
title_short | qPMS9: An Efficient Algorithm for Quorum Planted Motif Search |
title_sort | qpms9: an efficient algorithm for quorum planted motif search |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4295094/ https://www.ncbi.nlm.nih.gov/pubmed/25589474 http://dx.doi.org/10.1038/srep07813 |
work_keys_str_mv | AT nicolaemarius qpms9anefficientalgorithmforquorumplantedmotifsearch AT rajasekaransanguthevar qpms9anefficientalgorithmforquorumplantedmotifsearch |