Cargando…
Novel algorithms for LDD motif search
BACKGROUND: Motifs are crucial patterns that have numerous applications including the identification of transcription factors and their binding sites, composite regulatory patterns, similarity between families of proteins, etc. Several motif models have been proposed in the literature. The (l,d)-mot...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2019
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6551235/ https://www.ncbi.nlm.nih.gov/pubmed/31167665 http://dx.doi.org/10.1186/s12864-019-5701-6 |
_version_ | 1783424361001123840 |
---|---|
author | Xiao, Peng Schiller, Martin Rajasekaran, Sanguthevar |
author_facet | Xiao, Peng Schiller, Martin Rajasekaran, Sanguthevar |
author_sort | Xiao, Peng |
collection | PubMed |
description | BACKGROUND: Motifs are crucial patterns that have numerous applications including the identification of transcription factors and their binding sites, composite regulatory patterns, similarity between families of proteins, etc. Several motif models have been proposed in the literature. The (l,d)-motif model is one of these that has been studied widely. However, this model will sometimes report too many spurious motifs than expected. We interpret a motif as a biologically significant entity that is evolutionarily preserved within some distance. It may be highly improbable that the motif undergoes the same number of changes in each of the species. To address this issue, in this paper, we introduce a new model which is more general than (l,d)-motif model. This model is called (l,d(1),d(2))-motif model (LDDMS) and is NP-hard as well. We present three elegant as well as efficient algorithms to solve the LDDMS problem, i.e., LDDMS1, LDDMS2 and LDDMS3. They are all exact algorithms. RESULTS: We did both theoretical analyses and empirical tests on these algorithms. Theoretical analyses demonstrate that our algorithms have less computational cost than the pattern driven approach. Empirical results on both simulated datasets and real datasets show that each of the three algorithms has some advantages on some (l,d(1),d(2)) instances. CONCLUSIONS: We proposed LDDMS model which is more practically relevant. We also proposed three exact efficient algorithms to solve the problem. Besides, our algorithms can be nicely parallelized. We believe that the idea in this new model can also be extended to other motif search problems such as Edit-distance-based Motif Search (EMS) and Simple Motif Search (SMS). |
format | Online Article Text |
id | pubmed-6551235 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-65512352019-06-07 Novel algorithms for LDD motif search Xiao, Peng Schiller, Martin Rajasekaran, Sanguthevar BMC Genomics Research BACKGROUND: Motifs are crucial patterns that have numerous applications including the identification of transcription factors and their binding sites, composite regulatory patterns, similarity between families of proteins, etc. Several motif models have been proposed in the literature. The (l,d)-motif model is one of these that has been studied widely. However, this model will sometimes report too many spurious motifs than expected. We interpret a motif as a biologically significant entity that is evolutionarily preserved within some distance. It may be highly improbable that the motif undergoes the same number of changes in each of the species. To address this issue, in this paper, we introduce a new model which is more general than (l,d)-motif model. This model is called (l,d(1),d(2))-motif model (LDDMS) and is NP-hard as well. We present three elegant as well as efficient algorithms to solve the LDDMS problem, i.e., LDDMS1, LDDMS2 and LDDMS3. They are all exact algorithms. RESULTS: We did both theoretical analyses and empirical tests on these algorithms. Theoretical analyses demonstrate that our algorithms have less computational cost than the pattern driven approach. Empirical results on both simulated datasets and real datasets show that each of the three algorithms has some advantages on some (l,d(1),d(2)) instances. CONCLUSIONS: We proposed LDDMS model which is more practically relevant. We also proposed three exact efficient algorithms to solve the problem. Besides, our algorithms can be nicely parallelized. We believe that the idea in this new model can also be extended to other motif search problems such as Edit-distance-based Motif Search (EMS) and Simple Motif Search (SMS). BioMed Central 2019-06-06 /pmc/articles/PMC6551235/ /pubmed/31167665 http://dx.doi.org/10.1186/s12864-019-5701-6 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Xiao, Peng Schiller, Martin Rajasekaran, Sanguthevar Novel algorithms for LDD motif search |
title | Novel algorithms for LDD motif search |
title_full | Novel algorithms for LDD motif search |
title_fullStr | Novel algorithms for LDD motif search |
title_full_unstemmed | Novel algorithms for LDD motif search |
title_short | Novel algorithms for LDD motif search |
title_sort | novel algorithms for ldd motif search |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6551235/ https://www.ncbi.nlm.nih.gov/pubmed/31167665 http://dx.doi.org/10.1186/s12864-019-5701-6 |
work_keys_str_mv | AT xiaopeng novelalgorithmsforlddmotifsearch AT schillermartin novelalgorithmsforlddmotifsearch AT rajasekaransanguthevar novelalgorithmsforlddmotifsearch |