Cargando…

Novel algorithms for LDD motif search

BACKGROUND: Motifs are crucial patterns that have numerous applications including the identification of transcription factors and their binding sites, composite regulatory patterns, similarity between families of proteins, etc. Several motif models have been proposed in the literature. The (l,d)-mot...

Descripción completa

Detalles Bibliográficos
Autores principales: Xiao, Peng, Schiller, Martin, Rajasekaran, Sanguthevar
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2019
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6551235/
https://www.ncbi.nlm.nih.gov/pubmed/31167665
http://dx.doi.org/10.1186/s12864-019-5701-6
_version_ 1783424361001123840
author Xiao, Peng
Schiller, Martin
Rajasekaran, Sanguthevar
author_facet Xiao, Peng
Schiller, Martin
Rajasekaran, Sanguthevar
author_sort Xiao, Peng
collection PubMed
description BACKGROUND: Motifs are crucial patterns that have numerous applications including the identification of transcription factors and their binding sites, composite regulatory patterns, similarity between families of proteins, etc. Several motif models have been proposed in the literature. The (l,d)-motif model is one of these that has been studied widely. However, this model will sometimes report too many spurious motifs than expected. We interpret a motif as a biologically significant entity that is evolutionarily preserved within some distance. It may be highly improbable that the motif undergoes the same number of changes in each of the species. To address this issue, in this paper, we introduce a new model which is more general than (l,d)-motif model. This model is called (l,d(1),d(2))-motif model (LDDMS) and is NP-hard as well. We present three elegant as well as efficient algorithms to solve the LDDMS problem, i.e., LDDMS1, LDDMS2 and LDDMS3. They are all exact algorithms. RESULTS: We did both theoretical analyses and empirical tests on these algorithms. Theoretical analyses demonstrate that our algorithms have less computational cost than the pattern driven approach. Empirical results on both simulated datasets and real datasets show that each of the three algorithms has some advantages on some (l,d(1),d(2)) instances. CONCLUSIONS: We proposed LDDMS model which is more practically relevant. We also proposed three exact efficient algorithms to solve the problem. Besides, our algorithms can be nicely parallelized. We believe that the idea in this new model can also be extended to other motif search problems such as Edit-distance-based Motif Search (EMS) and Simple Motif Search (SMS).
format Online
Article
Text
id pubmed-6551235
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-65512352019-06-07 Novel algorithms for LDD motif search Xiao, Peng Schiller, Martin Rajasekaran, Sanguthevar BMC Genomics Research BACKGROUND: Motifs are crucial patterns that have numerous applications including the identification of transcription factors and their binding sites, composite regulatory patterns, similarity between families of proteins, etc. Several motif models have been proposed in the literature. The (l,d)-motif model is one of these that has been studied widely. However, this model will sometimes report too many spurious motifs than expected. We interpret a motif as a biologically significant entity that is evolutionarily preserved within some distance. It may be highly improbable that the motif undergoes the same number of changes in each of the species. To address this issue, in this paper, we introduce a new model which is more general than (l,d)-motif model. This model is called (l,d(1),d(2))-motif model (LDDMS) and is NP-hard as well. We present three elegant as well as efficient algorithms to solve the LDDMS problem, i.e., LDDMS1, LDDMS2 and LDDMS3. They are all exact algorithms. RESULTS: We did both theoretical analyses and empirical tests on these algorithms. Theoretical analyses demonstrate that our algorithms have less computational cost than the pattern driven approach. Empirical results on both simulated datasets and real datasets show that each of the three algorithms has some advantages on some (l,d(1),d(2)) instances. CONCLUSIONS: We proposed LDDMS model which is more practically relevant. We also proposed three exact efficient algorithms to solve the problem. Besides, our algorithms can be nicely parallelized. We believe that the idea in this new model can also be extended to other motif search problems such as Edit-distance-based Motif Search (EMS) and Simple Motif Search (SMS). BioMed Central 2019-06-06 /pmc/articles/PMC6551235/ /pubmed/31167665 http://dx.doi.org/10.1186/s12864-019-5701-6 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Xiao, Peng
Schiller, Martin
Rajasekaran, Sanguthevar
Novel algorithms for LDD motif search
title Novel algorithms for LDD motif search
title_full Novel algorithms for LDD motif search
title_fullStr Novel algorithms for LDD motif search
title_full_unstemmed Novel algorithms for LDD motif search
title_short Novel algorithms for LDD motif search
title_sort novel algorithms for ldd motif search
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6551235/
https://www.ncbi.nlm.nih.gov/pubmed/31167665
http://dx.doi.org/10.1186/s12864-019-5701-6
work_keys_str_mv AT xiaopeng novelalgorithmsforlddmotifsearch
AT schillermartin novelalgorithmsforlddmotifsearch
AT rajasekaransanguthevar novelalgorithmsforlddmotifsearch