Defining the Estimated Core Genome of Bacterial Populations Using a Bayesian Decision Model

Bibliographic Details
Main Authors: van Tonder, Andries J., Mistry, Shilan, Bray, James E., Hill, Dorothea M. C., Cody, Alison J., Farmer, Chris L., Klugman, Keith P., von Gottberg, Anne, Bentley, Stephen D., Parkhill, Julian, Jolley, Keith A., Maiden, Martin C. J., Brueggemann, Angela B.
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4140633/
https://www.ncbi.nlm.nih.gov/pubmed/25144616
http://dx.doi.org/10.1371/journal.pcbi.1003788
collection PubMed
description The bacterial core genome is of intense interest and the volume of whole genome sequence data in the public domain available to investigate it has increased dramatically. The aim of our study was to develop a model to estimate the bacterial core genome from next-generation whole genome sequencing data and use this model to identify novel genes associated with important biological functions. Five bacterial datasets were analysed, comprising 2096 genomes in total. We developed a Bayesian decision model to estimate the number of core genes, calculated pairwise evolutionary distances (p-distances) based on nucleotide sequence diversity, and plotted the median p-distance for each core gene relative to its genome location. We designed visually-informative genome diagrams to depict areas of interest in genomes. Case studies demonstrated how the model could identify areas for further study, e.g. 25% of the core genes with higher sequence diversity in the Campylobacter jejuni and Neisseria meningitidis genomes encoded hypothetical proteins. The core gene with the highest p-distance value in C. jejuni was annotated in the reference genome as a putative hydrolase, but further work revealed that it shared sequence homology with beta-lactamase/metallo-beta-lactamases (enzymes that provide resistance to a range of broad-spectrum antibiotics) and thioredoxin reductase genes (which reduce oxidative stress and are essential for DNA replication) in other C. jejuni genomes. Our Bayesian model of estimating the core genome is principled, easy to use and can be applied to large genome datasets. This study also highlighted the lack of knowledge currently available for many core genes in bacterial genomes of significant global public health importance.
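The p-distance named in the description is, in its standard definition, the proportion of aligned nucleotide sites at which two sequences differ. The following is a minimal sketch of that calculation and of a per-gene median, not the authors' implementation: the function names `p_distance` and `median_p_distance`, the toy allele sequences, and the choice to skip sites containing gaps or ambiguous bases are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import median


def p_distance(a: str, b: str) -> float:
    """Proportion of comparable aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: only columns where both sequences carry A, C, G or T are compared.
    pairs = [
        (x, y)
        for x, y in zip(a.upper(), b.upper())
        if x in "ACGT" and y in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def median_p_distance(alleles: list[str]) -> float:
    """Median p-distance over all pairs of aligned alleles of one gene."""
    return median(p_distance(a, b) for a, b in combinations(alleles, 2))


# Hypothetical aligned alleles of a single core gene (toy data).
alleles = ["ATGCATGC", "ATGCATGA", "ATGTATGA"]
print(median_p_distance(alleles))
```

At least two alleles are needed per gene, since the median is taken over pairs. Plotting this value for each core gene against its position in a reference genome would correspond to the kind of diagram the description mentions.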
format Online
Article
Text
id pubmed-4140633
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4140633 2014-08-25 PLoS Comput Biol Research Article Public Library of Science 2014-08-21 /pmc/articles/PMC4140633/ /pubmed/25144616 http://dx.doi.org/10.1371/journal.pcbi.1003788 Text en © 2014 van Tonder et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
topic Research Article