
Unsupervised statistical discovery of spaced motifs in prokaryotic genomes

BACKGROUND: DNA sequences contain repetitive motifs which have various functions in the physiology of the organism. A number of methods have been developed for discovery of such sequence motifs with a primary focus on detection of regulatory motifs and particularly transcription factor binding sites...


Bibliographic Details
Main authors: Tong, Hao, Schliekelman, Paul, Mrázek, Jan
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5217627/
https://www.ncbi.nlm.nih.gov/pubmed/28056763
http://dx.doi.org/10.1186/s12864-016-3400-0
collection PubMed
description BACKGROUND: DNA sequences contain repetitive motifs which have various functions in the physiology of the organism. A number of methods have been developed for discovery of such sequence motifs with a primary focus on detection of regulatory motifs and particularly transcription factor binding sites. Most motif-finding methods apply probabilistic models to detect motifs characterized by an unusually high number of copies of the motif in the analyzed sequences.

RESULTS: We present a novel method for detection of pairs of motifs separated by spacers of variable nucleotide sequence but conserved length. Unlike existing methods for motif discovery, the motifs themselves are not required to occur at unusually high frequency but only to exhibit a significant preference to occur at a specific distance from each other. In the present implementation of the method, motifs are represented by pentamers and all pairs of pentamers are evaluated for statistically significant preference for a specific distance. An important step of the algorithm eliminates motif pairs where the spacers separating the two motifs exhibit a high degree of sequence similarity; such motif pairs likely arise from duplications of the whole segment including the motifs and the spacer rather than due to selective constraints indicative of a functional importance of the motif pair. The method was used to scan 569 complete prokaryotic genomes for novel sequence motifs. Some motifs detected were previously known but other motifs found in the search appear to be novel. Selected motif pairs were subjected to further investigation and in some cases their possible biological functions were proposed.

CONCLUSIONS: We present a new motif-finding technique that is applicable to scanning complete genomes for sequence motifs. The results from analysis of 569 genomes suggest that the method detects previously known motifs that are expected to be found as well as new motifs that are unlikely to be discovered by traditional motif-finding methods. We conclude that our approach to detection of significant motif pairs can complement existing motif-finding techniques in discovery of novel functional sequence motifs in complete genomes.

ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-3400-0) contains supplementary material, which is available to authorized users.
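The pair-distance idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state which test statistic or similarity measure the authors use, so everything specific here is an assumption made for illustration only: the z-score against a uniform distribution of spacer lengths, the 1 to 30 nt distance range, the average pairwise identity used for the spacer filter, the function names, and the toy sequence with the planted pair ACGTA / TTGCA.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt

K = 5  # motifs are pentamers, as in the abstract


def pair_distance_counts(seq, min_gap=1, max_gap=30):
    """For each ordered pentamer pair, count how often each spacer length occurs."""
    counts = defaultdict(Counter)  # (left, right) -> Counter{spacer_length: n}
    n = len(seq)
    for i in range(n - K + 1):
        left = seq[i:i + K]
        for gap in range(min_gap, max_gap + 1):
            j = i + K + gap  # start of the right-hand pentamer
            if j + K > n:
                break
            counts[(left, seq[j:j + K])][gap] += 1
    return counts


def distance_preference(gap_counts, n_gaps):
    """Return (most frequent spacer length, z-score).

    Assumed null model: every one of the n_gaps spacer lengths is equally
    likely. This is a stand-in, not the statistic used in the article.
    """
    total = sum(gap_counts.values())
    best_gap, observed = max(gap_counts.items(), key=lambda kv: kv[1])
    p = 1.0 / n_gaps
    sd = sqrt(total * p * (1 - p))
    z = (observed - total * p) / sd if sd > 0 else 0.0
    return best_gap, z


def mean_spacer_identity(spacers):
    """Average fraction of identical positions over all pairs of equal-length spacers.

    A high value hints that the copies are duplications of one segment,
    which is the case the abstract says should be filtered out.
    """
    pairs = list(combinations(spacers, 2))
    if not pairs:
        return 0.0
    identities = [sum(a == b for a, b in zip(s, t)) / len(s) for s, t in pairs]
    return sum(identities) / len(identities)


def toy_genome(units=20, gap=7, filler=10, seed=0):
    """Hypothetical test sequence: ACGTA and TTGCA planted `gap` nt apart."""
    rng = random.Random(seed)

    def rand(n):
        return "".join(rng.choice("ACGT") for _ in range(n))

    spacers = [rand(gap) for _ in range(units)]
    seq = "".join(rand(filler) + "ACGTA" + s + "TTGCA" for s in spacers)
    return seq, spacers
```

Calling `pair_distance_counts` on the sequence from `toy_genome()` and passing the counter for `("ACGTA", "TTGCA")` to `distance_preference` with `n_gaps=30` should single out the planted spacer length of 7, while `mean_spacer_identity` on the random spacers stays low, so the pair would pass the duplication filter. Neither motif needs to be frequent on its own; only the distance between them is tested, which is the point the abstract makes.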
format Online
Article
Text
id pubmed-5217627
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5217627 2017-01-09 Unsupervised statistical discovery of spaced motifs in prokaryotic genomes Tong, Hao Schliekelman, Paul Mrázek, Jan BMC Genomics Research Article BioMed Central 2017-01-05 /pmc/articles/PMC5217627/ /pubmed/28056763 http://dx.doi.org/10.1186/s12864-016-3400-0 Text en
© The Author(s). 2017 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Research Article