Efficient and accurate whole genome assembly and methylome profiling of E. coli
BACKGROUND: With the price of next generation sequencing steadily decreasing, bacterial genome assembly is now accessible to a wide range of researchers. It is therefore necessary to understand the best methods for generating a genome assembly, specifically, which combination of sequencing and bioin...
Main authors: | Powers, Jason G; Weigman, Victor J; Shu, Jenny; Pufky, John M; Cox, Donald; Hurban, Patrick |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2013 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4046830/ https://www.ncbi.nlm.nih.gov/pubmed/24090403 http://dx.doi.org/10.1186/1471-2164-14-675 |
_version_ | 1782480321391886336 |
---|---|
author | Powers, Jason G Weigman, Victor J Shu, Jenny Pufky, John M Cox, Donald Hurban, Patrick |
author_facet | Powers, Jason G Weigman, Victor J Shu, Jenny Pufky, John M Cox, Donald Hurban, Patrick |
author_sort | Powers, Jason G |
collection | PubMed |
description | BACKGROUND: With the price of next generation sequencing steadily decreasing, bacterial genome assembly is now accessible to a wide range of researchers. It is therefore necessary to understand the best methods for generating a genome assembly, specifically, which combination of sequencing and bioinformatics strategies result in the most accurate assemblies. Here, we sequence three E. coli strains on the Illumina MiSeq, Life Technologies Ion Torrent PGM, and Pacific Biosciences RS. We then perform genome assemblies on all three datasets alone or in combination to determine the best methods for the assembly of bacterial genomes. RESULTS: Three E. coli strains – BL21(DE3), Bal225, and DH5α – were sequenced to a depth of 100× on the MiSeq and Ion Torrent machines and to at least 125× on the PacBio RS. Four assembly methods were examined and compared. The previously published BL21(DE3) genome [GenBank:AM946981.2], allowed us to evaluate the accuracy of each of the BL21(DE3) assemblies. BL21(DE3) PacBio-only assemblies resulted in a 90% reduction in contigs versus short read only assemblies, while N50 numbers increased by over 7-fold. Strikingly, the number of SNPs in PacBio-only assemblies were less than half that seen with short read assemblies (~20 SNPs vs. ~50 SNPs) and indels also saw dramatic reductions (~2 indel >5 bp in PacBio-only assemblies vs. ~12 for short-read only assemblies). Assemblies that used a mixture of PacBio and short read data generally fell in between these two extremes. Use of PacBio sequencing reads also allowed us to call covalent base modifications for the three strains. Each of the strains used here had a known covalent base modification genotype, which was confirmed by PacBio sequencing. CONCLUSION: Using data generated solely from the Pacific Biosciences RS, we were able to generate the most complete and accurate de novo assemblies of E. coli strains. We found that the addition of other sequencing technology data offered no improvements over use of PacBio data alone. In addition, the sequencing data from the PacBio RS allowed for sensitive and specific calling of covalent base modifications. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-14-675) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4046830 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-40468302014-06-06 Efficient and accurate whole genome assembly and methylome profiling of E. coli Powers, Jason G Weigman, Victor J Shu, Jenny Pufky, John M Cox, Donald Hurban, Patrick BMC Genomics Methodology Article BACKGROUND: With the price of next generation sequencing steadily decreasing, bacterial genome assembly is now accessible to a wide range of researchers. It is therefore necessary to understand the best methods for generating a genome assembly, specifically, which combination of sequencing and bioinformatics strategies result in the most accurate assemblies. Here, we sequence three E. coli strains on the Illumina MiSeq, Life Technologies Ion Torrent PGM, and Pacific Biosciences RS. We then perform genome assemblies on all three datasets alone or in combination to determine the best methods for the assembly of bacterial genomes. RESULTS: Three E. coli strains – BL21(DE3), Bal225, and DH5α – were sequenced to a depth of 100× on the MiSeq and Ion Torrent machines and to at least 125× on the PacBio RS. Four assembly methods were examined and compared. The previously published BL21(DE3) genome [GenBank:AM946981.2], allowed us to evaluate the accuracy of each of the BL21(DE3) assemblies. BL21(DE3) PacBio-only assemblies resulted in a 90% reduction in contigs versus short read only assemblies, while N50 numbers increased by over 7-fold. Strikingly, the number of SNPs in PacBio-only assemblies were less than half that seen with short read assemblies (~20 SNPs vs. ~50 SNPs) and indels also saw dramatic reductions (~2 indel >5 bp in PacBio-only assemblies vs. ~12 for short-read only assemblies). Assemblies that used a mixture of PacBio and short read data generally fell in between these two extremes. Use of PacBio sequencing reads also allowed us to call covalent base modifications for the three strains. Each of the strains used here had a known covalent base modification genotype, which was confirmed by PacBio sequencing. CONCLUSION: Using data generated solely from the Pacific Biosciences RS, we were able to generate the most complete and accurate de novo assemblies of E. coli strains. We found that the addition of other sequencing technology data offered no improvements over use of PacBio data alone. In addition, the sequencing data from the PacBio RS allowed for sensitive and specific calling of covalent base modifications. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-14-675) contains supplementary material, which is available to authorized users. BioMed Central 2013-10-03 /pmc/articles/PMC4046830/ /pubmed/24090403 http://dx.doi.org/10.1186/1471-2164-14-675 Text en © Powers et al.; licensee BioMed Central Ltd. 2013 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Article Powers, Jason G Weigman, Victor J Shu, Jenny Pufky, John M Cox, Donald Hurban, Patrick Efficient and accurate whole genome assembly and methylome profiling of E. coli |
title | Efficient and accurate whole genome assembly and methylome profiling of E. coli |
title_full | Efficient and accurate whole genome assembly and methylome profiling of E. coli |
title_fullStr | Efficient and accurate whole genome assembly and methylome profiling of E. coli |
title_full_unstemmed | Efficient and accurate whole genome assembly and methylome profiling of E. coli |
title_short | Efficient and accurate whole genome assembly and methylome profiling of E. coli |
title_sort | efficient and accurate whole genome assembly and methylome profiling of e. coli |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4046830/ https://www.ncbi.nlm.nih.gov/pubmed/24090403 http://dx.doi.org/10.1186/1471-2164-14-675 |
work_keys_str_mv | AT powersjasong efficientandaccuratewholegenomeassemblyandmethylomeprofilingofecoli AT weigmanvictorj efficientandaccuratewholegenomeassemblyandmethylomeprofilingofecoli AT shujenny efficientandaccuratewholegenomeassemblyandmethylomeprofilingofecoli AT pufkyjohnm efficientandaccuratewholegenomeassemblyandmethylomeprofilingofecoli AT coxdonald efficientandaccuratewholegenomeassemblyandmethylomeprofilingofecoli AT hurbanpatrick efficientandaccuratewholegenomeassemblyandmethylomeprofilingofecoli |