Mash-based analyses of Escherichia coli genomes reveal 14 distinct phylogroups

In this study, more than one hundred thousand Escherichia coli and Shigella genomes were examined and classified. This is, to our knowledge, the largest E. coli genome dataset analyzed to date. A Mash-based analysis of a cleaned set of 10,667 E. coli genomes from GenBank revealed 14 distinct phylogr...

Bibliographic Details
Main authors: Abram, Kaleb; Udaondo, Zulema; Bleker, Carissa; Wanchai, Visanu; Wassenaar, Trudy M.; Robeson, Michael S.; Ussery, David W.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7838162/
https://www.ncbi.nlm.nih.gov/pubmed/33500552
http://dx.doi.org/10.1038/s42003-020-01626-5
author Abram, Kaleb
Udaondo, Zulema
Bleker, Carissa
Wanchai, Visanu
Wassenaar, Trudy M.
Robeson, Michael S.
Ussery, David W.
author_sort Abram, Kaleb
collection PubMed
description In this study, more than one hundred thousand Escherichia coli and Shigella genomes were examined and classified. This is, to our knowledge, the largest E. coli genome dataset analyzed to date. A Mash-based analysis of a cleaned set of 10,667 E. coli genomes from GenBank revealed 14 distinct phylogroups. A representative genome or medoid identified for each phylogroup was used as a proxy to classify 95,525 unassembled genomes from the Sequence Read Archive (SRA). We find that most of the sequenced E. coli genomes belong to four phylogroups (A, C, B1 and E2(O157)). Authenticity of the 14 phylogroups is supported by several different lines of evidence: phylogroup-specific core genes, a phylogenetic tree constructed with 2613 single copy core genes, and differences in the rates of gene gain/loss/duplication. The methodology used in this work is able to reproduce known phylogroups, as well as to identify previously uncharacterized phylogroups in E. coli species.
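The medoid-as-proxy idea in the description can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`kmers`, `mash_distance`, `medoid`, `classify`) are hypothetical, and the Jaccard index is computed exactly on toy strings, whereas the Mash tool estimates it from MinHash sketches. The distance formula D = -(1/k) ln(2j / (1 + j)) is the standard Mash distance.

```python
import math


def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mash_distance(a, b, k):
    """Mash distance from an exact Jaccard index j.

    Assumption: j is computed on full k-mer sets here; Mash itself
    estimates j from MinHash sketches.
    """
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        return 1.0  # no k-mers at all: treat as maximally distant
    j = len(ka & kb) / len(union)
    if j == 0:
        return 1.0  # no shared k-mers: cap the distance at 1
    return -math.log(2 * j / (1 + j)) / k


def medoid(names, dist):
    """Return the member with the smallest summed distance to all members."""
    return min(names, key=lambda n: sum(dist[n][m] for m in names))


def classify(query_dist, medoids):
    """Assign a query to the group whose medoid is nearest.

    query_dist maps medoid genome name -> distance from the query;
    medoids maps group label -> medoid genome name.
    """
    return min(medoids, key=lambda g: query_dist[medoids[g]])
```

Comparing each query only against one medoid per group keeps the number of distance computations proportional to the number of groups rather than to the size of the reference set, which is presumably why a proxy genome is attractive when classifying tens of thousands of read sets.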
format Online
Article
Text
id pubmed-7838162
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-7838162 2021-01-29
Mash-based analyses of Escherichia coli genomes reveal 14 distinct phylogroups
Abram, Kaleb; Udaondo, Zulema; Bleker, Carissa; Wanchai, Visanu; Wassenaar, Trudy M.; Robeson, Michael S.; Ussery, David W.
Commun Biol
Article
Nature Publishing Group UK 2021-01-26
/pmc/articles/PMC7838162/ /pubmed/33500552 http://dx.doi.org/10.1038/s42003-020-01626-5
Text en
© The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Mash-based analyses of Escherichia coli genomes reveal 14 distinct phylogroups
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7838162/
https://www.ncbi.nlm.nih.gov/pubmed/33500552
http://dx.doi.org/10.1038/s42003-020-01626-5