Efficient and accurate whole genome assembly and methylome profiling of E. coli

BACKGROUND: With the price of next generation sequencing steadily decreasing, bacterial genome assembly is now accessible to a wide range of researchers. It is therefore necessary to understand the best methods for generating a genome assembly, specifically, which combination of sequencing and bioin...

Bibliographic Details
Main authors: Powers, Jason G; Weigman, Victor J; Shu, Jenny; Pufky, John M; Cox, Donald; Hurban, Patrick
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4046830/
https://www.ncbi.nlm.nih.gov/pubmed/24090403
http://dx.doi.org/10.1186/1471-2164-14-675
author Powers, Jason G
Weigman, Victor J
Shu, Jenny
Pufky, John M
Cox, Donald
Hurban, Patrick
collection PubMed
description BACKGROUND: With the price of next generation sequencing steadily decreasing, bacterial genome assembly is now accessible to a wide range of researchers. It is therefore necessary to understand the best methods for generating a genome assembly, specifically, which combination of sequencing and bioinformatics strategies results in the most accurate assemblies. Here, we sequence three E. coli strains on the Illumina MiSeq, Life Technologies Ion Torrent PGM, and Pacific Biosciences RS. We then perform genome assemblies on all three datasets alone or in combination to determine the best methods for the assembly of bacterial genomes. RESULTS: Three E. coli strains – BL21(DE3), Bal225, and DH5α – were sequenced to a depth of 100× on the MiSeq and Ion Torrent machines and to at least 125× on the PacBio RS. Four assembly methods were examined and compared. The previously published BL21(DE3) genome [GenBank:AM946981.2] allowed us to evaluate the accuracy of each of the BL21(DE3) assemblies. BL21(DE3) PacBio-only assemblies resulted in a 90% reduction in contigs versus short-read-only assemblies, while N50 numbers increased by over 7-fold. Strikingly, the number of SNPs in PacBio-only assemblies was less than half that seen with short-read assemblies (~20 SNPs vs. ~50 SNPs), and indels also saw dramatic reductions (~2 indels >5 bp in PacBio-only assemblies vs. ~12 for short-read-only assemblies). Assemblies that used a mixture of PacBio and short-read data generally fell in between these two extremes. Use of PacBio sequencing reads also allowed us to call covalent base modifications for the three strains. Each of the strains used here had a known covalent base modification genotype, which was confirmed by PacBio sequencing. CONCLUSION: Using data generated solely from the Pacific Biosciences RS, we were able to generate the most complete and accurate de novo assemblies of E. coli strains. We found that the addition of other sequencing technology data offered no improvements over use of PacBio data alone. In addition, the sequencing data from the PacBio RS allowed for sensitive and specific calling of covalent base modifications. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-14-675) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4046830
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4046830 2014-06-06 Efficient and accurate whole genome assembly and methylome profiling of E. coli Powers, Jason G Weigman, Victor J Shu, Jenny Pufky, John M Cox, Donald Hurban, Patrick BMC Genomics Methodology Article BioMed Central 2013-10-03 /pmc/articles/PMC4046830/ /pubmed/24090403 http://dx.doi.org/10.1186/1471-2164-14-675 Text en © Powers et al.; licensee BioMed Central Ltd. 2013 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Efficient and accurate whole genome assembly and methylome profiling of E. coli
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4046830/
https://www.ncbi.nlm.nih.gov/pubmed/24090403
http://dx.doi.org/10.1186/1471-2164-14-675