Completion of draft bacterial genomes by long-read sequencing of synthetic genomic pools
BACKGROUND: Illumina technology currently dominates bacterial genomics due to its high read accuracy and low sequencing cost. However, the incompleteness of draft genomes generated by Illumina reads limits their application in comprehensive genomics analyses. Alternatively, hybrid assembly using bot...
Main authors: | Derakhshani, Hooman; Bernier, Steve P.; Marko, Victoria A.; Surette, Michael G. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7392658/ https://www.ncbi.nlm.nih.gov/pubmed/32727443 http://dx.doi.org/10.1186/s12864-020-06910-6 |
_version_ | 1783564889982238720 |
---|---|
author | Derakhshani, Hooman Bernier, Steve P. Marko, Victoria A. Surette, Michael G. |
author_sort | Derakhshani, Hooman |
collection | PubMed |
description | BACKGROUND: Illumina technology currently dominates bacterial genomics due to its high read accuracy and low sequencing cost. However, the incompleteness of draft genomes generated by Illumina reads limits their application in comprehensive genomics analyses. Alternatively, hybrid assembly using both Illumina short reads and long reads generated by single molecule sequencing technologies can enable assembly of complete bacterial genomes, yet the high per-genome cost of long-read sequencing limits the widespread use of this approach in bacterial genomics. Here we developed a protocol for hybrid assembly of complete bacterial genomes using miniaturized multiplexed Illumina sequencing and non-barcoded PacBio sequencing of a synthetic genomic pool (SGP), thus significantly decreasing the overall per-genome cost of sequencing. RESULTS: We evaluated the performance of SGP hybrid assembly on the genomes of 20 bacterial isolates with different genome sizes, a wide range of GC contents, and varying levels of phylogenetic relatedness. By improving the contiguity of Illumina assemblies, SGP hybrid assembly generated 17 complete and 3 nearly complete bacterial genomes. Increased contiguity of SGP hybrid assemblies resulted in considerable improvement in gene prediction and annotation. In addition, SGP hybrid assembly was able to resolve repeat elements and identify intragenomic heterogeneities, e.g. different copies of 16S rRNA genes, that would otherwise go undetected by short-read-only assembly. Comprehensive comparison of SGP hybrid assemblies with those generated using multiplexed PacBio long reads (long-read-only assembly) also revealed the relative advantage of SGP hybrid assembly in terms of assembly quality. In particular, we observed that SGP hybrid assemblies were completely devoid of both small (i.e. single base substitutions) and large assembly errors. 
Finally, we show the ability of SGP hybrid assembly to differentiate genomes of closely related bacterial isolates, suggesting its potential application in comparative genomics and pangenome analysis. CONCLUSION: Our results indicate the superiority of SGP hybrid assembly over both short-read and long-read assemblies with respect to completeness, contiguity, accuracy, and recovery of small replicons. By lowering the per-genome cost of sequencing, our parallel sequencing and hybrid assembly pipeline could serve as a cost effective and high throughput approach for completing high-quality bacterial genomes. |
format | Online Article Text |
id | pubmed-7392658 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-73926582020-08-04 Completion of draft bacterial genomes by long-read sequencing of synthetic genomic pools Derakhshani, Hooman Bernier, Steve P. Marko, Victoria A. Surette, Michael G. BMC Genomics Methodology Article BACKGROUND: Illumina technology currently dominates bacterial genomics due to its high read accuracy and low sequencing cost. However, the incompleteness of draft genomes generated by Illumina reads limits their application in comprehensive genomics analyses. Alternatively, hybrid assembly using both Illumina short reads and long reads generated by single molecule sequencing technologies can enable assembly of complete bacterial genomes, yet the high per-genome cost of long-read sequencing limits the widespread use of this approach in bacterial genomics. Here we developed a protocol for hybrid assembly of complete bacterial genomes using miniaturized multiplexed Illumina sequencing and non-barcoded PacBio sequencing of a synthetic genomic pool (SGP), thus significantly decreasing the overall per-genome cost of sequencing. RESULTS: We evaluated the performance of SGP hybrid assembly on the genomes of 20 bacterial isolates with different genome sizes, a wide range of GC contents, and varying levels of phylogenetic relatedness. By improving the contiguity of Illumina assemblies, SGP hybrid assembly generated 17 complete and 3 nearly complete bacterial genomes. Increased contiguity of SGP hybrid assemblies resulted in considerable improvement in gene prediction and annotation. In addition, SGP hybrid assembly was able to resolve repeat elements and identify intragenomic heterogeneities, e.g. different copies of 16S rRNA genes, that would otherwise go undetected by short-read-only assembly. Comprehensive comparison of SGP hybrid assemblies with those generated using multiplexed PacBio long reads (long-read-only assembly) also revealed the relative advantage of SGP hybrid assembly in terms of assembly quality. 
In particular, we observed that SGP hybrid assemblies were completely devoid of both small (i.e. single base substitutions) and large assembly errors. Finally, we show the ability of SGP hybrid assembly to differentiate genomes of closely related bacterial isolates, suggesting its potential application in comparative genomics and pangenome analysis. CONCLUSION: Our results indicate the superiority of SGP hybrid assembly over both short-read and long-read assemblies with respect to completeness, contiguity, accuracy, and recovery of small replicons. By lowering the per-genome cost of sequencing, our parallel sequencing and hybrid assembly pipeline could serve as a cost effective and high throughput approach for completing high-quality bacterial genomes. BioMed Central 2020-07-29 /pmc/articles/PMC7392658/ /pubmed/32727443 http://dx.doi.org/10.1186/s12864-020-06910-6 Text en © The Author(s) 2020. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Methodology Article Derakhshani, Hooman Bernier, Steve P. Marko, Victoria A. Surette, Michael G. Completion of draft bacterial genomes by long-read sequencing of synthetic genomic pools |
title | Completion of draft bacterial genomes by long-read sequencing of synthetic genomic pools |
title_sort | completion of draft bacterial genomes by long-read sequencing of synthetic genomic pools |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7392658/ https://www.ncbi.nlm.nih.gov/pubmed/32727443 http://dx.doi.org/10.1186/s12864-020-06910-6 |
work_keys_str_mv | AT derakhshanihooman completionofdraftbacterialgenomesbylongreadsequencingofsyntheticgenomicpools AT bernierstevep completionofdraftbacterialgenomesbylongreadsequencingofsyntheticgenomicpools AT markovictoriaa completionofdraftbacterialgenomesbylongreadsequencingofsyntheticgenomicpools AT surettemichaelg completionofdraftbacterialgenomesbylongreadsequencingofsyntheticgenomicpools |