
Global genomic similarity and core genome sequence diversity of the Streptococcus genus as a toolkit to identify closely related bacterial species in complex environments

BACKGROUND: The Streptococcus genus is relevant to both public health and food safety because of its ability to cause pathogenic infections. It is well-represented (>100 genomes) in publicly available databases. Streptococci are ubiquitous, with multiple sources of isolation, from human pathogens...


Bibliographic Details
Main authors: Barajas, Hugo R., Romero, Miguel F., Martínez-Sánchez, Shamayim, Alcaraz, Luis D.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6336011/
https://www.ncbi.nlm.nih.gov/pubmed/30656069
http://dx.doi.org/10.7717/peerj.6233
author Barajas, Hugo R.
Romero, Miguel F.
Martínez-Sánchez, Shamayim
Alcaraz, Luis D.
collection PubMed
description BACKGROUND: The Streptococcus genus is relevant to both public health and food safety because of its ability to cause pathogenic infections. It is well-represented (>100 genomes) in publicly available databases. Streptococci are ubiquitous, with multiple sources of isolation, from human pathogens to dairy products. The Streptococcus genus has traditionally been classified by morphology, serum types, the 16S ribosomal RNA (rRNA) gene, and multi-locus sequence types subject to in-depth comparative genomic analysis. METHODS: Core and pan-genomes were used to describe the genomic diversity of 108 strains belonging to 16 Streptococcus species. The core genome nucleotide diversity was calculated and compared to phylogenomic distances within the genus Streptococcus. The core genome was also used as a resource to recruit metagenomic fragment reads from streptococci-dominated environments. A conventional 16S rRNA gene phylogeny reconstruction was used as a reference against which to compare the average nucleotide identity (ANI) and genome similarity score (GSS) dendrograms. RESULTS: The core genome, in this work, consists of 404 proteins that are shared by all 108 Streptococcus strains. The average identity of the pairwise compared core proteins decreases in proportion to lower GSS scores across species. The GSS dendrogram recovers most of the clades in the 16S rRNA gene phylogeny while distinguishing between 16S polytomies (unresolved nodes). The GSS is a distance metric that can reflect evolutionary history by comparing orthologous proteins. Additionally, GSS proved to be the most useful metric for genus and species comparisons, where ANI metrics failed due to false positives when comparing different species. DISCUSSION: Understanding genomic variability and species relatedness is the goal of tools like GSS, which makes use of the maximum pairwise shared orthologous sequences for its calculation. It allows for long evolutionary distances (above species) to be included because it uses amino acid alignment scores, rather than nucleotides, and normalizes by positive matches. Newly sequenced species and strains could be easily placed into GSS dendrograms to infer overall genomic relatedness. The GSS is not restricted to ubiquitous conservation of gene features; thus, it reflects the mosaic structure and dynamism of gene acquisition and loss in bacterial genomes.
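As a rough illustration of the core and pan-genome terms used in the abstract, the sketch below treats each genome as a set of ortholog family identifiers: the core genome is the intersection of those sets and the pan-genome is their union. The strain names, family IDs, and function names are hypothetical toy values, not data or code from the article, and the article's actual ortholog detection pipeline is not reproduced here.

```python
# Hypothetical toy data: strain name -> set of ortholog family IDs.
# These are illustrative assumptions, not values from the article.
genomes = {
    "strain_A": {"fam1", "fam2", "fam3", "fam4"},
    "strain_B": {"fam1", "fam2", "fam4", "fam5"},
    "strain_C": {"fam1", "fam2", "fam6"},
}


def core_genome(family_sets):
    """Return the families present in every genome (set intersection)."""
    sets = list(family_sets.values())
    return set.intersection(*sets) if sets else set()


def pan_genome(family_sets):
    """Return the families present in at least one genome (set union)."""
    return set().union(*family_sets.values())


core = core_genome(genomes)
pan = pan_genome(genomes)

print("core families:", sorted(core))
print("pan-genome size:", len(pan))
```

Under this simplified view, the 404 core proteins reported in the abstract correspond to the families that remain after intersecting all 108 strain sets.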
format Online
Article
Text
id pubmed-6336011
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-6336011 2019-01-17 PeerJ Biodiversity PeerJ Inc. 2019-01-14 /pmc/articles/PMC6336011/ /pubmed/30656069 http://dx.doi.org/10.7717/peerj.6233 Text en © 2019 Barajas et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title Global genomic similarity and core genome sequence diversity of the Streptococcus genus as a toolkit to identify closely related bacterial species in complex environments
topic Biodiversity
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6336011/
https://www.ncbi.nlm.nih.gov/pubmed/30656069
http://dx.doi.org/10.7717/peerj.6233