
Unsupervised statistical discovery of spaced motifs in prokaryotic genomes

BACKGROUND: DNA sequences contain repetitive motifs which have various functions in the physiology of the organism. A number of methods have been developed for discovery of such sequence motifs with a primary focus on detection of regulatory motifs and particularly transcription factor binding sites...


Bibliographic Details
Main Authors:	Tong, Hao, Schliekelman, Paul, Mrázek, Jan
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2017
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5217627/ https://www.ncbi.nlm.nih.gov/pubmed/28056763 http://dx.doi.org/10.1186/s12864-016-3400-0

author	Tong, Hao Schliekelman, Paul Mrázek, Jan
collection	PubMed
description	BACKGROUND: DNA sequences contain repetitive motifs which have various functions in the physiology of the organism. A number of methods have been developed for discovery of such sequence motifs with a primary focus on detection of regulatory motifs and particularly transcription factor binding sites. Most motif-finding methods apply probabilistic models to detect motifs characterized by an unusually high number of copies of the motif in the analyzed sequences. RESULTS: We present a novel method for detection of pairs of motifs separated by spacers of variable nucleotide sequence but conserved length. Unlike existing methods for motif discovery, the motifs themselves are not required to occur at unusually high frequency but only to exhibit a significant preference to occur at a specific distance from each other. In the present implementation of the method, motifs are represented by pentamers and all pairs of pentamers are evaluated for statistically significant preference for a specific distance. An important step of the algorithm eliminates motif pairs where the spacers separating the two motifs exhibit a high degree of sequence similarity; such motif pairs likely arise from duplications of the whole segment including the motifs and the spacer rather than from selective constraints indicative of a functional importance of the motif pair. The method was used to scan 569 complete prokaryotic genomes for novel sequence motifs. Some motifs detected were previously known but other motifs found in the search appear to be novel. Selected motif pairs were subjected to further investigation and in some cases their possible biological functions were proposed. CONCLUSIONS: We present a new motif-finding technique that is applicable to scanning complete genomes for sequence motifs. The results from analysis of 569 genomes suggest that the method detects previously known motifs that are expected to be found as well as new motifs that are unlikely to be discovered by traditional motif-finding methods. We conclude that our approach to detection of significant motif pairs can complement existing motif-finding techniques in discovery of novel functional sequence motifs in complete genomes. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-3400-0) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5217627
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5217627 2017-01-09 Unsupervised statistical discovery of spaced motifs in prokaryotic genomes Tong, Hao Schliekelman, Paul Mrázek, Jan BMC Genomics Research Article BioMed Central 2017-01-05 /pmc/articles/PMC5217627/ /pubmed/28056763 http://dx.doi.org/10.1186/s12864-016-3400-0 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Unsupervised statistical discovery of spaced motifs in prokaryotic genomes
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5217627/ https://www.ncbi.nlm.nih.gov/pubmed/28056763 http://dx.doi.org/10.1186/s12864-016-3400-0
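The abstract describes two steps: testing pairs of pentamers for a preferred spacer length, and discarding pairs whose spacers are highly similar to each other (likely duplications). The sketch below illustrates both steps for a single pentamer pair. It is not the authors' implementation: the abstract does not state which statistical test is used, so a simple enrichment ratio (count at the best spacer length divided by the mean count over all tested lengths) stands in for it, and all function names, the `max_spacer` parameter, and the identity measure are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def find_all(seq, kmer):
    """Return start positions of every (possibly overlapping) occurrence of kmer."""
    positions = []
    i = seq.find(kmer)
    while i != -1:
        positions.append(i)
        i = seq.find(kmer, i + 1)
    return positions


def spacer_length_counts(seq, left, right, max_spacer=30):
    """For each spacer length 0..max_spacer, count how often `right` follows `left`.

    Also collects the spacer sequences observed at each length.
    """
    right_starts = set(find_all(seq, right))
    counts = Counter()
    spacers = {}
    for p in find_all(seq, left):
        end = p + len(left)
        for d in range(max_spacer + 1):
            if end + d in right_starts:
                counts[d] += 1
                spacers.setdefault(d, []).append(seq[end:end + d])
    return counts, spacers


def preferred_spacer(counts, max_spacer=30):
    """Return (spacer_length, count, enrichment) for the most frequent length.

    Enrichment is a stand-in score (assumption): the count divided by the
    mean count over all tested spacer lengths.
    """
    total = sum(counts.values())
    if total == 0:
        return None
    mean = total / (max_spacer + 1)
    d, c = max(counts.items(), key=lambda kv: kv[1])
    return d, c, c / mean


def mean_spacer_identity(spacers):
    """Mean fraction of identical positions over all pairs of equal-length spacers."""
    pairs = list(combinations(spacers, 2))
    if not pairs or not spacers[0]:
        return 0.0
    total = 0.0
    for a, b in pairs:
        total += sum(x == y for x, y in zip(a, b)) / len(a)
    return total / len(pairs)
```

A pair would be kept in this sketch only if its enrichment is high and its mean spacer identity is low; the thresholds for both are left to the reader, since the abstract gives none.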


Similar Items