A comprehensive and high-quality collection of Escherichia coli genomes and their genes
Escherichia coli is a highly diverse organism that includes a range of commensal and pathogenic variants found across a range of niches and worldwide. In addition to causing severe intestinal and extraintestinal disease, E. coli is considered a priority pathogen due to high levels of observed drug r...
Main authors: | Horesh, Gal; Blackwell, Grace A.; Tonkin-Hill, Gerry; Corander, Jukka; Heinz, Eva; Thomson, Nicholas R. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Microbiology Society 2021 |
Subjects: | BioResource |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8208696/ https://www.ncbi.nlm.nih.gov/pubmed/33417534 http://dx.doi.org/10.1099/mgen.0.000499 |
_version_ | 1783708973266894848 |
---|---|
author | Horesh, Gal; Blackwell, Grace A.; Tonkin-Hill, Gerry; Corander, Jukka; Heinz, Eva; Thomson, Nicholas R. |
author_sort | Horesh, Gal |
collection | PubMed |
description | Escherichia coli is a highly diverse organism that includes a range of commensal and pathogenic variants found across a range of niches and worldwide. In addition to causing severe intestinal and extraintestinal disease, E. coli is considered a priority pathogen due to high levels of observed drug resistance. The diversity in the E. coli population is driven by high genome plasticity and a very large gene pool. All these have made E. coli one of the most well-studied organisms, as well as a commonly used laboratory strain. Today, there are thousands of sequenced E. coli genomes stored in public databases. While data is widely available, accessing the information in order to perform analyses can still be a challenge. Collecting relevant available data requires accessing different sources, where data may be stored in a range of formats, and often requires further manipulation and processing to apply various analyses and extract useful information. In this study, we collated and intensely curated a collection of over 10 000 E. coli and Shigella genomes to provide a single, uniform, high-quality dataset. Shigella were included as they are considered specialized pathovars of E. coli. We provide these data in a number of easily accessible formats that can be used as the foundation for future studies addressing the biological differences between E. coli lineages and the distribution and flow of genes in the E. coli population at a high resolution. The analysis we present emphasizes our lack of understanding of the true diversity of the E. coli species, and the biased nature of our current understanding of the genetic diversity of such a key pathogen. |
format | Online Article Text |
id | pubmed-8208696 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Microbiology Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-8208696 2021-06-17 A comprehensive and high-quality collection of Escherichia coli genomes and their genes Horesh, Gal; Blackwell, Grace A.; Tonkin-Hill, Gerry; Corander, Jukka; Heinz, Eva; Thomson, Nicholas R. Microb Genom BioResource Microbiology Society 2021-01-08 /pmc/articles/PMC8208696/ /pubmed/33417534 http://dx.doi.org/10.1099/mgen.0.000499 Text en © 2021 The Authors https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License. This article was made open access via a Publish and Read agreement between the Microbiology Society and the corresponding author's institution. |
title | A comprehensive and high-quality collection of Escherichia coli genomes and their genes |
topic | BioResource |