
A comprehensive and high-quality collection of Escherichia coli genomes and their genes

Escherichia coli is a highly diverse organism that includes a range of commensal and pathogenic variants found across a range of niches and worldwide. In addition to causing severe intestinal and extraintestinal disease, E. coli is considered a priority pathogen due to high levels of observed drug r...


Bibliographic Details
Main authors: Horesh, Gal, Blackwell, Grace A., Tonkin-Hill, Gerry, Corander, Jukka, Heinz, Eva, Thomson, Nicholas R.
Format: Online Article Text
Language: English
Published: Microbiology Society 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8208696/
https://www.ncbi.nlm.nih.gov/pubmed/33417534
http://dx.doi.org/10.1099/mgen.0.000499
_version_ 1783708973266894848
author Horesh, Gal
Blackwell, Grace A.
Tonkin-Hill, Gerry
Corander, Jukka
Heinz, Eva
Thomson, Nicholas R.
author_facet Horesh, Gal
Blackwell, Grace A.
Tonkin-Hill, Gerry
Corander, Jukka
Heinz, Eva
Thomson, Nicholas R.
author_sort Horesh, Gal
collection PubMed
description Escherichia coli is a highly diverse organism that includes a range of commensal and pathogenic variants found across a range of niches and worldwide. In addition to causing severe intestinal and extraintestinal disease, E. coli is considered a priority pathogen due to high levels of observed drug resistance. The diversity in the E. coli population is driven by high genome plasticity and a very large gene pool. All these have made E. coli one of the most well-studied organisms, as well as a commonly used laboratory strain. Today, there are thousands of sequenced E. coli genomes stored in public databases. While data is widely available, accessing the information in order to perform analyses can still be a challenge. Collecting relevant available data requires accessing different sources, where data may be stored in a range of formats, and often requires further manipulation and processing to apply various analyses and extract useful information. In this study, we collated and intensely curated a collection of over 10 000 E. coli and Shigella genomes to provide a single, uniform, high-quality dataset. Shigella were included as they are considered specialized pathovars of E. coli. We provide these data in a number of easily accessible formats that can be used as the foundation for future studies addressing the biological differences between E. coli lineages and the distribution and flow of genes in the E. coli population at a high resolution. The analysis we present emphasizes our lack of understanding of the true diversity of the E. coli species, and the biased nature of our current understanding of the genetic diversity of such a key pathogen.
format Online
Article
Text
id pubmed-8208696
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Microbiology Society
record_format MEDLINE/PubMed
spelling pubmed-8208696 2021-06-17 A comprehensive and high-quality collection of Escherichia coli genomes and their genes Horesh, Gal Blackwell, Grace A. Tonkin-Hill, Gerry Corander, Jukka Heinz, Eva Thomson, Nicholas R. Microb Genom BioResource Escherichia coli is a highly diverse organism that includes a range of commensal and pathogenic variants found across a range of niches and worldwide. In addition to causing severe intestinal and extraintestinal disease, E. coli is considered a priority pathogen due to high levels of observed drug resistance. The diversity in the E. coli population is driven by high genome plasticity and a very large gene pool. All these have made E. coli one of the most well-studied organisms, as well as a commonly used laboratory strain. Today, there are thousands of sequenced E. coli genomes stored in public databases. While data is widely available, accessing the information in order to perform analyses can still be a challenge. Collecting relevant available data requires accessing different sources, where data may be stored in a range of formats, and often requires further manipulation and processing to apply various analyses and extract useful information. In this study, we collated and intensely curated a collection of over 10 000 E. coli and Shigella genomes to provide a single, uniform, high-quality dataset. Shigella were included as they are considered specialized pathovars of E. coli. We provide these data in a number of easily accessible formats that can be used as the foundation for future studies addressing the biological differences between E. coli lineages and the distribution and flow of genes in the E. coli population at a high resolution. The analysis we present emphasizes our lack of understanding of the true diversity of the E. coli species, and the biased nature of our current understanding of the genetic diversity of such a key pathogen.
Microbiology Society 2021-01-08 /pmc/articles/PMC8208696/ /pubmed/33417534 http://dx.doi.org/10.1099/mgen.0.000499 Text en © 2021 The Authors https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License. This article was made open access via a Publish and Read agreement between the Microbiology Society and the corresponding author’s institution.
spellingShingle BioResource
Horesh, Gal
Blackwell, Grace A.
Tonkin-Hill, Gerry
Corander, Jukka
Heinz, Eva
Thomson, Nicholas R.
A comprehensive and high-quality collection of Escherichia coli genomes and their genes
title A comprehensive and high-quality collection of Escherichia coli genomes and their genes
title_full A comprehensive and high-quality collection of Escherichia coli genomes and their genes
title_fullStr A comprehensive and high-quality collection of Escherichia coli genomes and their genes
title_full_unstemmed A comprehensive and high-quality collection of Escherichia coli genomes and their genes
title_short A comprehensive and high-quality collection of Escherichia coli genomes and their genes
title_sort comprehensive and high-quality collection of escherichia coli genomes and their genes
topic BioResource
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8208696/
https://www.ncbi.nlm.nih.gov/pubmed/33417534
http://dx.doi.org/10.1099/mgen.0.000499
work_keys_str_mv AT horeshgal acomprehensiveandhighqualitycollectionofescherichiacoligenomesandtheirgenes
AT blackwellgracea acomprehensiveandhighqualitycollectionofescherichiacoligenomesandtheirgenes
AT tonkinhillgerry acomprehensiveandhighqualitycollectionofescherichiacoligenomesandtheirgenes
AT coranderjukka acomprehensiveandhighqualitycollectionofescherichiacoligenomesandtheirgenes
AT heinzeva acomprehensiveandhighqualitycollectionofescherichiacoligenomesandtheirgenes
AT thomsonnicholasr acomprehensiveandhighqualitycollectionofescherichiacoligenomesandtheirgenes
AT horeshgal comprehensiveandhighqualitycollectionofescherichiacoligenomesandtheirgenes
AT blackwellgracea comprehensiveandhighqualitycollectionofescherichiacoligenomesandtheirgenes
AT tonkinhillgerry comprehensiveandhighqualitycollectionofescherichiacoligenomesandtheirgenes
AT coranderjukka comprehensiveandhighqualitycollectionofescherichiacoligenomesandtheirgenes
AT heinzeva comprehensiveandhighqualitycollectionofescherichiacoligenomesandtheirgenes
AT thomsonnicholasr comprehensiveandhighqualitycollectionofescherichiacoligenomesandtheirgenes