
Modeling of the GC content of the substituted bases in bacterial core genomes

BACKGROUND: The purpose of the present study was to examine the GC content of substituted bases (sbGC) in the core genomes of 35 bacterial species. Each species, or core genome, comprised genomes from at least 10 strains. We also wanted to explore whether sbGC for each strain was associated with t...


Bibliographic Details
Main authors: Bohlin, Jon, Eldholm, Vegard, Brynildsrud, Ola, Petterson, John H.-O., Alfsnes, Kristian
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6080486/
https://www.ncbi.nlm.nih.gov/pubmed/30081825
http://dx.doi.org/10.1186/s12864-018-4984-3
_version_ 1783345486537687040
author Bohlin, Jon
Eldholm, Vegard
Brynildsrud, Ola
Petterson, John H.-O.
Alfsnes, Kristian
author_facet Bohlin, Jon
Eldholm, Vegard
Brynildsrud, Ola
Petterson, John H.-O.
Alfsnes, Kristian
author_sort Bohlin, Jon
collection PubMed
description BACKGROUND: The purpose of the present study was to examine the GC content of substituted bases (sbGC) in the core genomes of 35 bacterial species. Each species, or core genome, comprised genomes from at least 10 strains. We also wanted to explore whether sbGC for each strain was associated with the corresponding species’ core genome GC content (cgGC). We present a simple mathematical model that estimates sbGC from cgGC. The model assumes only that the estimated sbGC is a function of cgGC proportional to fixed AT→GC (α) and GC→AT (β) mutation rates. Non-linear regression was used to estimate parameters α and β from the empirical data described above.
RESULTS: We found that sbGC for each strain showed a non-linear association with the corresponding cgGC, with a bias towards higher GC content for most core genomes (66.3% of the strains), assuming as a null hypothesis that sbGC should be approximately equal to cgGC. The most GC-rich core genomes (i.e. approximately GC > 60%), on the other hand, exhibited slightly less GC-biased sbGC than expected. The best-fitting regression model indicates that GC→AT mutation rates (β = 1.91 ± 0.13, p < 0.001) are approximately 1.91/0.79 = 2.42 times as high, on average, as AT→GC mutation rates (α = −0.79 ± 0.25, p < 0.001). Whether the observed sbGC GC-bias for all but the most GC-rich prokaryotic species is due to selection, compensating for the GC→AT mutation bias, and/or selectively neutral processes is currently debated. The residual standard error was found to be σ = 0.076, indicating estimated errors of sbGC to be approximately within ±15.2% GC (95% confidence interval) for the strains of all species in the study.
CONCLUSION: Not only did our mathematical model give reasonable estimates of sbGC, it also provides further support for previous observations that mutation rates in prokaryotes exhibit a universal GC→AT bias that appears to be remarkably consistent between taxa.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-018-4984-3) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6080486
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6080486 2018-08-09 Modeling of the GC content of the substituted bases in bacterial core genomes Bohlin, Jon Eldholm, Vegard Brynildsrud, Ola Petterson, John H.-O. Alfsnes, Kristian BMC Genomics Research Article BACKGROUND: The purpose of the present study was to examine the GC content of substituted bases (sbGC) in the core genomes of 35 bacterial species. Each species, or core genome, comprised genomes from at least 10 strains. We also wanted to explore whether sbGC for each strain was associated with the corresponding species’ core genome GC content (cgGC). We present a simple mathematical model that estimates sbGC from cgGC. The model assumes only that the estimated sbGC is a function of cgGC proportional to fixed AT→GC (α) and GC→AT (β) mutation rates. Non-linear regression was used to estimate parameters α and β from the empirical data described above. RESULTS: We found that sbGC for each strain showed a non-linear association with the corresponding cgGC, with a bias towards higher GC content for most core genomes (66.3% of the strains), assuming as a null hypothesis that sbGC should be approximately equal to cgGC. The most GC-rich core genomes (i.e. approximately GC > 60%), on the other hand, exhibited slightly less GC-biased sbGC than expected. The best-fitting regression model indicates that GC→AT mutation rates (β = 1.91 ± 0.13, p < 0.001) are approximately 1.91/0.79 = 2.42 times as high, on average, as AT→GC mutation rates (α = −0.79 ± 0.25, p < 0.001). Whether the observed sbGC GC-bias for all but the most GC-rich prokaryotic species is due to selection, compensating for the GC→AT mutation bias, and/or selectively neutral processes is currently debated. The residual standard error was found to be σ = 0.076, indicating estimated errors of sbGC to be approximately within ±15.2% GC (95% confidence interval) for the strains of all species in the study.
CONCLUSION: Not only did our mathematical model give reasonable estimates of sbGC, it also provides further support for previous observations that mutation rates in prokaryotes exhibit a universal GC→AT bias that appears to be remarkably consistent between taxa. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-018-4984-3) contains supplementary material, which is available to authorized users. BioMed Central 2018-08-06 /pmc/articles/PMC6080486/ /pubmed/30081825 http://dx.doi.org/10.1186/s12864-018-4984-3 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Bohlin, Jon
Eldholm, Vegard
Brynildsrud, Ola
Petterson, John H.-O.
Alfsnes, Kristian
Modeling of the GC content of the substituted bases in bacterial core genomes
title Modeling of the GC content of the substituted bases in bacterial core genomes
title_full Modeling of the GC content of the substituted bases in bacterial core genomes
title_fullStr Modeling of the GC content of the substituted bases in bacterial core genomes
title_full_unstemmed Modeling of the GC content of the substituted bases in bacterial core genomes
title_short Modeling of the GC content of the substituted bases in bacterial core genomes
title_sort modeling of the gc content of the substituted bases in bacterial core genomes
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6080486/
https://www.ncbi.nlm.nih.gov/pubmed/30081825
http://dx.doi.org/10.1186/s12864-018-4984-3
work_keys_str_mv AT bohlinjon modelingofthegccontentofthesubstitutedbasesinbacterialcoregenomes
AT eldholmvegard modelingofthegccontentofthesubstitutedbasesinbacterialcoregenomes
AT brynildsrudola modelingofthegccontentofthesubstitutedbasesinbacterialcoregenomes
AT pettersonjohnho modelingofthegccontentofthesubstitutedbasesinbacterialcoregenomes
AT alfsneskristian modelingofthegccontentofthesubstitutedbasesinbacterialcoregenomes
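
Note on the figures quoted in the description field: the abstract reports a rate ratio of 2.42 and an error band of ±15.2% GC without showing the steps. The short Python sketch below reproduces both from the reported α, β and σ. It assumes the ratio is taken on magnitudes (α is printed with a negative sign) and that the 95% band is the usual two-standard-error approximation. The last function is purely illustrative and not taken from the article: it evaluates the equilibrium GC fraction of a hypothetical two-state, selection-free mutation model, which may help when reading the abstract's remark about selection compensating for the GC→AT bias. All function names are made up for this sketch.

```python
# Figures quoted in the abstract (regression estimates and residual standard error).
ALPHA = -0.79   # AT->GC parameter, sign as printed in the abstract
BETA = 1.91     # GC->AT parameter
SIGMA = 0.076   # residual standard error, in GC-fraction units


def rate_ratio(alpha: float, beta: float) -> float:
    """Ratio of the GC->AT to the AT->GC parameter, using magnitudes (assumed)."""
    return abs(beta) / abs(alpha)


def approx_95_halfwidth(sigma: float) -> float:
    """Rough 95% band taken as two residual standard errors (assumed 2-sigma rule)."""
    return 2.0 * sigma


def two_state_equilibrium_gc(ratio: float) -> float:
    """Equilibrium GC fraction of a hypothetical two-state mutation model.

    Assumption (not from the source): bases switch AT->GC at rate u and
    GC->AT at rate v = ratio * u, with no selection, so GC* = u / (u + v).
    """
    return 1.0 / (1.0 + ratio)


if __name__ == "__main__":
    ratio = rate_ratio(ALPHA, BETA)
    print(f"GC->AT / AT->GC ratio: {ratio:.2f}")
    print(f"approx. 95% band: +/-{100 * approx_95_halfwidth(SIGMA):.1f}% GC")
    print(f"mutation-only equilibrium GC (assumed model): {two_state_equilibrium_gc(ratio):.3f}")
```

The article's own regression model is not spelled out in the abstract, so this sketch does not attempt to reproduce the fit itself; it only checks the arithmetic behind the quoted summary numbers.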