
Estimating variation within the genes and inferring the phylogeny of 186 sequenced diverse Escherichia coli genomes

BACKGROUND: Escherichia coli exists in commensal and pathogenic forms. By measuring the variation of individual genes across more than a hundred sequenced genomes, gene variation can be studied in detail, including the number of mutations found for any given gene. This knowledge will be useful for c...


Bibliographic Details
Main Authors: Kaas, Rolf S; Friis, Carsten; Ussery, David W; Aarestrup, Frank M
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3575317/
https://www.ncbi.nlm.nih.gov/pubmed/23114024
http://dx.doi.org/10.1186/1471-2164-13-577
_version_ 1782259700336689152
author Kaas, Rolf S
Friis, Carsten
Ussery, David W
Aarestrup, Frank M
collection PubMed
description BACKGROUND: Escherichia coli exists in commensal and pathogenic forms. By measuring the variation of individual genes across more than a hundred sequenced genomes, gene variation can be studied in detail, including the number of mutations found for any given gene. This knowledge will be useful for creating better phylogenies, for determination of molecular clocks and for improved typing techniques. RESULTS: We find 3,051 gene clusters/families present in at least 95% of the genomes and 1,702 gene clusters present in 100% of the genomes. The former 'soft core' of about 3,000 gene families is perhaps more biologically relevant, especially considering that many of these genome sequences are draft quality. The E. coli pan-genome for this set of isolates contains 16,373 gene clusters. A core-gene tree, based on alignment, and a pan-genome tree, based on gene presence/absence, map the relatedness of the 186 sequenced E. coli genomes. The core-gene tree displays high confidence and divides the E. coli strains into the observed MLST type clades and also separates defined phylotypes. CONCLUSION: The results of comparing a large and diverse E. coli dataset support the theory that reliable, good-resolution phylogenies can be inferred from the core-genome. The results further suggest that the resolution at the isolate level may subsequently be improved by targeting more variable genes. The use of whole genome sequencing will make it possible to eliminate, or at least reduce, the need for several typing steps used in traditional epidemiology.
format Online
Article
Text
id pubmed-3575317
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3575317 2013-02-19 Estimating variation within the genes and inferring the phylogeny of 186 sequenced diverse Escherichia coli genomes Kaas, Rolf S; Friis, Carsten; Ussery, David W; Aarestrup, Frank M BMC Genomics Research Article BioMed Central 2012-10-31 /pmc/articles/PMC3575317/ /pubmed/23114024 http://dx.doi.org/10.1186/1471-2164-13-577 Text en Copyright ©2012 Kaas et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Estimating variation within the genes and inferring the phylogeny of 186 sequenced diverse Escherichia coli genomes
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3575317/
https://www.ncbi.nlm.nih.gov/pubmed/23114024
http://dx.doi.org/10.1186/1471-2164-13-577