Mash-based analyses of Escherichia coli genomes reveal 14 distinct phylogroups
In this study, more than one hundred thousand Escherichia coli and Shigella genomes were examined and classified. This is, to our knowledge, the largest E. coli genome dataset analyzed to date. A Mash-based analysis of a cleaned set of 10,667 E. coli genomes from GenBank revealed 14 distinct phylogr...
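Main Authors: | Abram, Kaleb; Udaondo, Zulema; Bleker, Carissa; Wanchai, Visanu; Wassenaar, Trudy M.; Robeson, Michael S.; Ussery, David W. |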
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7838162/ https://www.ncbi.nlm.nih.gov/pubmed/33500552 http://dx.doi.org/10.1038/s42003-020-01626-5 |
_version_ | 1783643111771078656 |
---|---|
author | Abram, Kaleb Udaondo, Zulema Bleker, Carissa Wanchai, Visanu Wassenaar, Trudy M. Robeson, Michael S. Ussery, David W. |
author_facet | Abram, Kaleb Udaondo, Zulema Bleker, Carissa Wanchai, Visanu Wassenaar, Trudy M. Robeson, Michael S. Ussery, David W. |
author_sort | Abram, Kaleb |
collection | PubMed |
description | In this study, more than one hundred thousand Escherichia coli and Shigella genomes were examined and classified. This is, to our knowledge, the largest E. coli genome dataset analyzed to date. A Mash-based analysis of a cleaned set of 10,667 E. coli genomes from GenBank revealed 14 distinct phylogroups. A representative genome or medoid identified for each phylogroup was used as a proxy to classify 95,525 unassembled genomes from the Sequence Read Archive (SRA). We find that most of the sequenced E. coli genomes belong to four phylogroups (A, C, B1 and E2(O157)). Authenticity of the 14 phylogroups is supported by several different lines of evidence: phylogroup-specific core genes, a phylogenetic tree constructed with 2613 single copy core genes, and differences in the rates of gene gain/loss/duplication. The methodology used in this work is able to reproduce known phylogroups, as well as to identify previously uncharacterized phylogroups in E. coli species. |
format | Online Article Text |
id | pubmed-7838162 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-78381622021-01-29 Mash-based analyses of Escherichia coli genomes reveal 14 distinct phylogroups Abram, Kaleb Udaondo, Zulema Bleker, Carissa Wanchai, Visanu Wassenaar, Trudy M. Robeson, Michael S. Ussery, David W. Commun Biol Article In this study, more than one hundred thousand Escherichia coli and Shigella genomes were examined and classified. This is, to our knowledge, the largest E. coli genome dataset analyzed to date. A Mash-based analysis of a cleaned set of 10,667 E. coli genomes from GenBank revealed 14 distinct phylogroups. A representative genome or medoid identified for each phylogroup was used as a proxy to classify 95,525 unassembled genomes from the Sequence Read Archive (SRA). We find that most of the sequenced E. coli genomes belong to four phylogroups (A, C, B1 and E2(O157)). Authenticity of the 14 phylogroups is supported by several different lines of evidence: phylogroup-specific core genes, a phylogenetic tree constructed with 2613 single copy core genes, and differences in the rates of gene gain/loss/duplication. The methodology used in this work is able to reproduce known phylogroups, as well as to identify previously uncharacterized phylogroups in E. coli species. Nature Publishing Group UK 2021-01-26 /pmc/articles/PMC7838162/ /pubmed/33500552 http://dx.doi.org/10.1038/s42003-020-01626-5 Text en © The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. 
If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Abram, Kaleb Udaondo, Zulema Bleker, Carissa Wanchai, Visanu Wassenaar, Trudy M. Robeson, Michael S. Ussery, David W. Mash-based analyses of Escherichia coli genomes reveal 14 distinct phylogroups |
title | Mash-based analyses of Escherichia coli genomes reveal 14 distinct phylogroups |
title_full | Mash-based analyses of Escherichia coli genomes reveal 14 distinct phylogroups |
title_fullStr | Mash-based analyses of Escherichia coli genomes reveal 14 distinct phylogroups |
title_full_unstemmed | Mash-based analyses of Escherichia coli genomes reveal 14 distinct phylogroups |
title_short | Mash-based analyses of Escherichia coli genomes reveal 14 distinct phylogroups |
title_sort | mash-based analyses of escherichia coli genomes reveal 14 distinct phylogroups |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7838162/ https://www.ncbi.nlm.nih.gov/pubmed/33500552 http://dx.doi.org/10.1038/s42003-020-01626-5 |
work_keys_str_mv | AT abramkaleb mashbasedanalysesofescherichiacoligenomesreveal14distinctphylogroups AT udaondozulema mashbasedanalysesofescherichiacoligenomesreveal14distinctphylogroups AT blekercarissa mashbasedanalysesofescherichiacoligenomesreveal14distinctphylogroups AT wanchaivisanu mashbasedanalysesofescherichiacoligenomesreveal14distinctphylogroups AT wassenaartrudym mashbasedanalysesofescherichiacoligenomesreveal14distinctphylogroups AT robesonmichaels mashbasedanalysesofescherichiacoligenomesreveal14distinctphylogroups AT usserydavidw mashbasedanalysesofescherichiacoligenomesreveal14distinctphylogroups |
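
The description above mentions identifying a representative genome, or medoid, for each phylogroup and using it as a proxy to classify further genomes. As a minimal sketch of that general idea, and not the authors' actual pipeline, the Python below picks the medoid of each group from a pairwise distance matrix and assigns a new genome to the group with the nearest medoid. The function names `medoid_index` and `classify`, the group labels, and every distance value are hypothetical illustrations; the numbers are toy values, not Mash output.

```python
from typing import Dict, Sequence


def medoid_index(dist: Sequence[Sequence[float]], members: Sequence[int]) -> int:
    """Return the member whose summed distance to all group members is smallest."""
    return min(members, key=lambda i: sum(dist[i][j] for j in members))


def classify(dist_to_medoids: Dict[str, float]) -> str:
    """Assign a genome to the group whose medoid is nearest."""
    return min(dist_to_medoids, key=dist_to_medoids.get)


# Toy symmetric distance matrix for five hypothetical genomes (assumed values).
DIST = [
    [0.00, 0.01, 0.02, 0.10, 0.11],
    [0.01, 0.00, 0.01, 0.09, 0.10],
    [0.02, 0.01, 0.00, 0.10, 0.09],
    [0.10, 0.09, 0.10, 0.00, 0.01],
    [0.11, 0.10, 0.09, 0.01, 0.00],
]

# Hypothetical group membership, given as row indices into DIST.
GROUPS = {"group_1": [0, 1, 2], "group_2": [3, 4]}

# One medoid (a row index) per group.
medoids = {name: medoid_index(DIST, members) for name, members in GROUPS.items()}

# Assumed distances from one new genome to each group's medoid.
new_genome = {"group_1": 0.08, "group_2": 0.02}
assigned = classify(new_genome)
```

Comparing a new genome against one medoid per group, rather than against every genome in the reference set, keeps the number of distance computations proportional to the number of groups.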