
VBCG: 20 validated bacterial core genes for phylogenomic analysis with high fidelity and resolution

BACKGROUND: Phylogenomic analysis has become an inseparable part of studies of bacterial diversity and evolution, and many different bacterial core genes have been collated and used for phylogenomic tree reconstruction. However, these genes have been selected based on their presence and single-copy...


Bibliographic Details
Main authors: Tian, Renmao; Imanian, Behzad
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10631056/
https://www.ncbi.nlm.nih.gov/pubmed/37936197
http://dx.doi.org/10.1186/s40168-023-01705-9
_version_ 1785132287675858944
author Tian, Renmao
Imanian, Behzad
author_facet Tian, Renmao
Imanian, Behzad
author_sort Tian, Renmao
collection PubMed
description BACKGROUND: Phylogenomic analysis has become an inseparable part of studies of bacterial diversity and evolution, and many different bacterial core genes have been collated and used for phylogenomic tree reconstruction. However, these genes have been selected based on their presence and single-copy ratio in all bacterial genomes, leaving the genes' 'phylogenetic fidelity' unexamined. RESULTS: From 30,522 complete genomes covering 11,262 species, we examined 148 bacterial core genes that have been previously used for phylogenomic analysis. In addition to the gene presence and single-copy ratios, we evaluated each gene's phylogenetic fidelity by comparing its phylogeny with the corresponding 16S rRNA gene tree. Out of the 148 bacterial genes, 20 validated bacterial core genes (VBCG) were selected as the core gene set with the highest bacterial phylogenetic fidelity. Compared to the larger gene set, the 20-gene core set resulted in more species having all genes present and fewer species with missing data, thereby enhancing the accuracy of phylogenomic analysis. Using Escherichia coli strains as examples of prominent bacterial foodborne pathogens, we demonstrated that the 20 VBCG produced phylogenies with higher fidelity and resolution at species and strain levels, while the 16S rRNA gene tree alone could not. CONCLUSION: The set of 20 validated core genes improves the fidelity and speed of phylogenomic analysis. Among other uses, this tool improves our ability to explore the evolution, typing and tracking of bacterial strains, such as human pathogens. We have developed a Python pipeline and a desktop graphical app (available on GitHub) for users to perform phylogenomic analysis with high fidelity and resolution. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s40168-023-01705-9.
format Online
Article
Text
id pubmed-10631056
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-106310562023-11-08 VBCG: 20 validated bacterial core genes for phylogenomic analysis with high fidelity and resolution Tian, Renmao Imanian, Behzad Microbiome Software BACKGROUND: Phylogenomic analysis has become an inseparable part of studies of bacterial diversity and evolution, and many different bacterial core genes have been collated and used for phylogenomic tree reconstruction. However, these genes have been selected based on their presence and single-copy ratio in all bacterial genomes, leaving the genes' 'phylogenetic fidelity' unexamined. RESULTS: From 30,522 complete genomes covering 11,262 species, we examined 148 bacterial core genes that have been previously used for phylogenomic analysis. In addition to the gene presence and single-copy ratios, we evaluated each gene's phylogenetic fidelity by comparing its phylogeny with the corresponding 16S rRNA gene tree. Out of the 148 bacterial genes, 20 validated bacterial core genes (VBCG) were selected as the core gene set with the highest bacterial phylogenetic fidelity. Compared to the larger gene set, the 20-gene core set resulted in more species having all genes present and fewer species with missing data, thereby enhancing the accuracy of phylogenomic analysis. Using Escherichia coli strains as examples of prominent bacterial foodborne pathogens, we demonstrated that the 20 VBCG produced phylogenies with higher fidelity and resolution at species and strain levels, while the 16S rRNA gene tree alone could not. CONCLUSION: The set of 20 validated core genes improves the fidelity and speed of phylogenomic analysis. Among other uses, this tool improves our ability to explore the evolution, typing and tracking of bacterial strains, such as human pathogens. We have developed a Python pipeline and a desktop graphical app (available on GitHub) for users to perform phylogenomic analysis with high fidelity and resolution.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s40168-023-01705-9. BioMed Central 2023-11-08 /pmc/articles/PMC10631056/ /pubmed/37936197 http://dx.doi.org/10.1186/s40168-023-01705-9 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Software
Tian, Renmao
Imanian, Behzad
VBCG: 20 validated bacterial core genes for phylogenomic analysis with high fidelity and resolution
title VBCG: 20 validated bacterial core genes for phylogenomic analysis with high fidelity and resolution
title_full VBCG: 20 validated bacterial core genes for phylogenomic analysis with high fidelity and resolution
title_fullStr VBCG: 20 validated bacterial core genes for phylogenomic analysis with high fidelity and resolution
title_full_unstemmed VBCG: 20 validated bacterial core genes for phylogenomic analysis with high fidelity and resolution
title_short VBCG: 20 validated bacterial core genes for phylogenomic analysis with high fidelity and resolution
title_sort vbcg: 20 validated bacterial core genes for phylogenomic analysis with high fidelity and resolution
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10631056/
https://www.ncbi.nlm.nih.gov/pubmed/37936197
http://dx.doi.org/10.1186/s40168-023-01705-9
work_keys_str_mv AT tianrenmao vbcg20validatedbacterialcoregenesforphylogenomicanalysiswithhighfidelityandresolution
AT imanianbehzad vbcg20validatedbacterialcoregenesforphylogenomicanalysiswithhighfidelityandresolution