Unsupervised statistical discovery of spaced motifs in prokaryotic genomes
Main authors: | Tong, Hao; Schliekelman, Paul; Mrázek, Jan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2017 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5217627/ https://www.ncbi.nlm.nih.gov/pubmed/28056763 http://dx.doi.org/10.1186/s12864-016-3400-0 |
author | Tong, Hao Schliekelman, Paul Mrázek, Jan |
collection | PubMed |
description | BACKGROUND: DNA sequences contain repetitive motifs which have various functions in the physiology of the organism. A number of methods have been developed for discovery of such sequence motifs, with a primary focus on detection of regulatory motifs and particularly transcription factor binding sites. Most motif-finding methods apply probabilistic models to detect motifs characterized by an unusually high number of copies of the motif in the analyzed sequences. RESULTS: We present a novel method for detection of pairs of motifs separated by spacers of variable nucleotide sequence but conserved length. Unlike existing methods for motif discovery, the motifs themselves are not required to occur at an unusually high frequency but only to exhibit a significant preference to occur at a specific distance from each other. In the present implementation of the method, motifs are represented by pentamers and all pairs of pentamers are evaluated for statistically significant preference for a specific distance. An important step of the algorithm eliminates motif pairs where the spacers separating the two motifs exhibit a high degree of sequence similarity; such motif pairs likely arise from duplications of the whole segment, including the motifs and the spacer, rather than from selective constraints indicative of the functional importance of the motif pair. The method was used to scan 569 complete prokaryotic genomes for novel sequence motifs. Some motifs detected were previously known but other motifs found in the search appear to be novel. Selected motif pairs were subjected to further investigation and in some cases their possible biological functions were proposed. CONCLUSIONS: We present a new motif-finding technique that is applicable to scanning complete genomes for sequence motifs. The results from the analysis of 569 genomes suggest that the method detects previously known motifs that are expected to be found as well as new motifs that are unlikely to be discovered by traditional motif-finding methods. We conclude that our approach to detection of significant motif pairs can complement existing motif-finding techniques in discovery of novel functional sequence motifs in complete genomes. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-3400-0) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5217627 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5217627 2017-01-09 Unsupervised statistical discovery of spaced motifs in prokaryotic genomes Tong, Hao Schliekelman, Paul Mrázek, Jan BMC Genomics Research Article BioMed Central 2017-01-05 /pmc/articles/PMC5217627/ /pubmed/28056763 http://dx.doi.org/10.1186/s12864-016-3400-0 Text en © The Author(s). 2017 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Unsupervised statistical discovery of spaced motifs in prokaryotic genomes |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5217627/ https://www.ncbi.nlm.nih.gov/pubmed/28056763 http://dx.doi.org/10.1186/s12864-016-3400-0 |
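
The abstract describes the core idea: count how often each ordered pair of pentamers occurs at each spacer length, flag pairs with a significant preference for one length, and discard pairs whose spacers are highly similar (likely duplicated segments). The following is a minimal, hypothetical sketch of that idea, not the authors' implementation or statistics. It assumes a uniform null over spacer lengths 0..max_spacer, an exact binomial upper-tail test on the most frequent length, a Bonferroni correction, and a mean pairwise spacer identity cutoff for the duplication filter. All function names and thresholds (`significant_pairs`, `max_spacer=20`, `max_identity=0.5`, `min_count=3`) are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations
from math import comb

K = 5  # motif length: pentamers, as stated in the abstract


def binom_sf(k: int, n: int, p: float) -> float:
    """Exact upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def spacer_table(seq: str, max_spacer: int):
    """Map (motif_a, motif_b) -> {spacer_length: [spacer sequences]}.

    Records every ordered pair of non-overlapping pentamers whose second member
    starts 0..max_spacer nucleotides after the first one ends. This brute-force
    table is fine for a demo but far too large for whole genomes.
    """
    table = defaultdict(lambda: defaultdict(list))
    for i in range(len(seq) - K + 1):
        a = seq[i:i + K]
        for d in range(max_spacer + 1):
            j = i + K + d  # start of the second pentamer
            if j + K > len(seq):
                break
            table[(a, seq[j:j + K])][d].append(seq[i + K:j])
    return table


def mean_identity(spacers: list) -> float:
    """Mean fraction of identical positions over all pairs of equal-length spacers."""
    if len(spacers) < 2 or not spacers[0]:
        return 0.0  # nothing to compare (single instance or zero-length spacer)
    ids = [sum(x == y for x, y in zip(s, t)) / len(s)
           for s, t in combinations(spacers, 2)]
    return sum(ids) / len(ids)


def significant_pairs(seq: str, max_spacer: int = 20, alpha: float = 0.01,
                      max_identity: float = 0.5, min_count: int = 3):
    """Return (pair, spacer_length, count, total, p_value) for retained motif pairs."""
    table = spacer_table(seq.upper(), max_spacer)
    n_lengths = max_spacer + 1
    n_tests = len(table) * n_lengths  # Bonferroni over all (pair, spacer length) tests
    hits = []
    for pair, by_len in table.items():
        total = sum(len(v) for v in by_len.values())
        best_len, spacers = max(by_len.items(), key=lambda kv: len(kv[1]))
        if len(spacers) < min_count:
            continue
        # Null hypothesis: each occurrence of the pair picks a spacer length uniformly.
        p = binom_sf(len(spacers), total, 1 / n_lengths)
        if p * n_tests > alpha:
            continue
        # Duplication filter: highly similar spacers suggest a copied segment.
        if mean_identity(spacers) > max_identity:
            continue
        hits.append((pair, best_len, len(spacers), total, p))
    return hits
```

In this sketch, a pair planted many times with random 7-nt spacers should be reported at spacer length 7, while the same layout with an identical spacer in every copy is removed by the identity filter. Adjacent pentamers (spacer length 0) have no spacer to compare, so this simplified filter cannot catch duplications for them; a fuller version might also compare flanking sequence.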