Cargando…

qPMS9: An Efficient Algorithm for Quorum Planted Motif Search

Discovering patterns in biological sequences is a crucial problem. For example, the identification of patterns in DNA sequences has resulted in the determination of open reading frames, identification of gene promoter elements, intron/exon splicing sites, and SH RNAs, location of RNA degradation sig...

Descripción completa

Detalles Bibliográficos
Autores principales: Nicolae, Marius, Rajasekaran, Sanguthevar
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Nature Publishing Group 2015
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4295094/
https://www.ncbi.nlm.nih.gov/pubmed/25589474
http://dx.doi.org/10.1038/srep07813
_version_ 1782352791134535680
author Nicolae, Marius
Rajasekaran, Sanguthevar
author_facet Nicolae, Marius
Rajasekaran, Sanguthevar
author_sort Nicolae, Marius
collection PubMed
description Discovering patterns in biological sequences is a crucial problem. For example, the identification of patterns in DNA sequences has resulted in the determination of open reading frames, identification of gene promoter elements, intron/exon splicing sites, and SH RNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have led to domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, discovery of short functional motifs, etc. In this paper we focus on the identification of an important class of patterns, namely, motifs. We study the (ℓ, d) motif search problem or Planted Motif Search (PMS). PMS receives as input n strings and two integers ℓ and d. It returns all sequences M of length ℓ that occur in each input string, where each occurrence differs from M in at most d positions. Another formulation is quorum PMS (qPMS), where the motif appears in at least q% of the strings. We introduce qPMS9, a parallel exact qPMS algorithm that offers significant runtime improvements on DNA and protein datasets. qPMS9 solves the challenging DNA (ℓ, d)-instances (28, 12) and (30, 13). The source code is available at https://code.google.com/p/qpms9/.
format Online
Article
Text
id pubmed-4295094
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-42950942015-01-27 qPMS9: An Efficient Algorithm for Quorum Planted Motif Search Nicolae, Marius Rajasekaran, Sanguthevar Sci Rep Article Discovering patterns in biological sequences is a crucial problem. For example, the identification of patterns in DNA sequences has resulted in the determination of open reading frames, identification of gene promoter elements, intron/exon splicing sites, and SH RNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have led to domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, discovery of short functional motifs, etc. In this paper we focus on the identification of an important class of patterns, namely, motifs. We study the (ℓ, d) motif search problem or Planted Motif Search (PMS). PMS receives as input n strings and two integers ℓ and d. It returns all sequences M of length ℓ that occur in each input string, where each occurrence differs from M in at most d positions. Another formulation is quorum PMS (qPMS), where the motif appears in at least q% of the strings. We introduce qPMS9, a parallel exact qPMS algorithm that offers significant runtime improvements on DNA and protein datasets. qPMS9 solves the challenging DNA (ℓ, d)-instances (28, 12) and (30, 13). The source code is available at https://code.google.com/p/qpms9/. Nature Publishing Group 2015-01-15 /pmc/articles/PMC4295094/ /pubmed/25589474 http://dx.doi.org/10.1038/srep07813 Text en Copyright © 2015, Macmillan Publishers Limited. All rights reserved http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder in order to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
spellingShingle Article
Nicolae, Marius
Rajasekaran, Sanguthevar
qPMS9: An Efficient Algorithm for Quorum Planted Motif Search
title qPMS9: An Efficient Algorithm for Quorum Planted Motif Search
title_full qPMS9: An Efficient Algorithm for Quorum Planted Motif Search
title_fullStr qPMS9: An Efficient Algorithm for Quorum Planted Motif Search
title_full_unstemmed qPMS9: An Efficient Algorithm for Quorum Planted Motif Search
title_short qPMS9: An Efficient Algorithm for Quorum Planted Motif Search
title_sort qpms9: an efficient algorithm for quorum planted motif search
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4295094/
https://www.ncbi.nlm.nih.gov/pubmed/25589474
http://dx.doi.org/10.1038/srep07813
work_keys_str_mv AT nicolaemarius qpms9anefficientalgorithmforquorumplantedmotifsearch
AT rajasekaransanguthevar qpms9anefficientalgorithmforquorumplantedmotifsearch