
A Markovian analysis of bacterial genome sequence constraints

The arrangement of nucleotides within a bacterial chromosome is influenced by numerous factors. The degeneracy of the third codon position within each reading frame allows some flexibility of nucleotide selection; however, the third nucleotide in the triplet of each codon is at least partly determined by the...


Bibliographic Details
Main authors: Skewes, Aaron D., Welch, Roy D.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2013
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3757466/
https://www.ncbi.nlm.nih.gov/pubmed/24010012
http://dx.doi.org/10.7717/peerj.127
author Skewes, Aaron D.
Welch, Roy D.
collection PubMed
description The arrangement of nucleotides within a bacterial chromosome is influenced by numerous factors. The degeneracy of the third codon position within each reading frame allows some flexibility of nucleotide selection; however, the third nucleotide in the triplet of each codon is at least partly determined by the preceding two. This is most evident in organisms with a strong G + C bias, as the degenerate position must contribute disproportionately to maintaining that bias. Therefore, a correlation exists between the first two nucleotides and the third in all open reading frames. If the arrangement of nucleotides in a bacterial chromosome is represented as a Markov process, we would expect that the correlation would be completely captured by a second-order Markov model and that an increase in the order of the model (e.g., third-, fourth-…order) would not capture any additional uncertainty in the process. In this manuscript, we present the results of a comprehensive study of the Markov property that exists in the DNA sequences of 906 bacterial chromosomes. All of the 906 bacterial chromosomes studied exhibit a statistically significant Markov property that extends beyond second-order, and therefore cannot be fully explained by codon usage. An unrooted tree containing all 906 bacterial chromosomes based on their third-order transition probability matrices shares ∼25% similarity to a tree based on sequence homologies of 16S rRNA sequences. This congruence to the 16S rRNA tree is greater than for trees based on lower-order models (e.g., second-order), and higher-order models result in diminishing improvements in congruence. A nucleotide correlation most likely exists within every bacterial chromosome that extends past three nucleotides. This correlation places significant limits on the number of nucleotide sequences that can represent probable bacterial chromosomes. Transition matrix usage is largely conserved by taxa, indicating that this property is likely inherited; however, some important exceptions exist that may indicate the convergent evolution of some bacteria.
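The idea of fitting Markov models of increasing order to a nucleotide sequence can be illustrated with a minimal sketch. The function names (`context_counts`, `transition_probabilities`, `conditional_entropy`) and the toy sequence are assumptions made for illustration only; this record does not describe the authors' actual software or statistical test. The sketch estimates the transition probabilities of an order-k model from overlapping k-mers and reports the plug-in conditional entropy of the next base, which is one simple way to see how much uncertainty a higher order removes.

```python
from collections import Counter, defaultdict
from math import log2


def context_counts(seq, order):
    """Count how often each base follows each context of length `order`."""
    counts = defaultdict(Counter)
    for i in range(len(seq) - order):
        context = seq[i:i + order]      # preceding `order` bases ("" for order 0)
        counts[context][seq[i + order]] += 1
    return counts


def transition_probabilities(seq, order):
    """Maximum-likelihood estimate of P(next base | preceding `order` bases)."""
    probs = {}
    for context, nxt in context_counts(seq, order).items():
        total = sum(nxt.values())
        probs[context] = {base: n / total for base, n in nxt.items()}
    return probs


def conditional_entropy(seq, order):
    """Plug-in entropy (bits) of the next base given the preceding `order` bases."""
    counts = context_counts(seq, order)
    n_total = sum(sum(nxt.values()) for nxt in counts.values())
    h = 0.0
    for nxt in counts.values():
        total = sum(nxt.values())
        for n in nxt.values():
            h -= (n / n_total) * log2(n / total)
    return h


if __name__ == "__main__":
    # Hypothetical toy sequence; a real analysis would read a whole chromosome.
    toy = "ATGGCGAAAGCGCTGGCGGAAGCGCTGAAAGCGGCGTAA" * 20
    for k in range(0, 5):
        print(k, round(conditional_entropy(toy, k), 4))
```

A drop in this entropy from order 2 to order 3 is only suggestive: the plug-in estimate is biased downward as the number of contexts grows, so a claim of statistical significance such as the one in the abstract would need a formal test (for example a likelihood-ratio test between nested orders), which is not reproduced here.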
format Online
Article
Text
id pubmed-3757466
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-3757466 2013-09-04 A Markovian analysis of bacterial genome sequence constraints Skewes, Aaron D. Welch, Roy D. PeerJ Bioinformatics PeerJ Inc. 2013-08-29 /pmc/articles/PMC3757466/ /pubmed/24010012 http://dx.doi.org/10.7717/peerj.127 Text en © 2013 Skewes and Welch http://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A Markovian analysis of bacterial genome sequence constraints
topic Bioinformatics