PacBio Long Reads Improve Metagenomic Assemblies, Gene Catalogs, and Genome Binning

PacBio long reads sequencing presents several potential advantages for DNA assembly, including being able to provide more complete gene profiling of metagenomic samples. However, lower single-pass accuracy can make gene discovery and assembly for low-abundance organisms difficult. To evaluate the ap...

Bibliographic Details
Main authors: Xie, Haiying; Yang, Caiyun; Sun, Yamin; Igarashi, Yasuo; Jin, Tao; Luo, Feng
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7506068/
https://www.ncbi.nlm.nih.gov/pubmed/33101371
http://dx.doi.org/10.3389/fgene.2020.516269
_version_ 1783584950834954240
author Xie, Haiying
Yang, Caiyun
Sun, Yamin
Igarashi, Yasuo
Jin, Tao
Luo, Feng
author_facet Xie, Haiying
Yang, Caiyun
Sun, Yamin
Igarashi, Yasuo
Jin, Tao
Luo, Feng
author_sort Xie, Haiying
collection PubMed
description PacBio long reads sequencing presents several potential advantages for DNA assembly, including being able to provide more complete gene profiling of metagenomic samples. However, lower single-pass accuracy can make gene discovery and assembly for low-abundance organisms difficult. To evaluate the application and performance of PacBio long reads and Illumina HiSeq short reads in metagenomic analyses, we directly compared various assemblies involving PacBio and Illumina sequencing reads based on two anaerobic digestion microbiome samples from a biogas fermenter. Using a PacBio platform, 1.58 million long reads (19.6 Gb) were produced with an average length of 7,604 bp. Using an Illumina HiSeq platform, 151.2 million read pairs (45.4 Gb) were produced. Hybrid assemblies using PacBio long reads and HiSeq contigs produced improvements in assembly statistics, including an increase in the average contig length, contig N50 size, and number of large contigs. Interestingly, depth-based hybrid assemblies generated a higher percentage of complete genes (98.86%) compared to those based on HiSeq contigs only (40.29%), because the PacBio reads were long enough to cover many repeating short elements and capture multiple genes in a single read. Additionally, the incorporation of PacBio long reads led to considerable advantages regarding reducing contig numbers and increasing the completeness of the genome reconstruction, which was poorly assembled and binned when using HiSeq data alone. From this comparison of PacBio long reads with Illumina HiSeq short reads related to complex microbiome samples, we conclude that PacBio long reads can produce longer contigs, more complete genes, and better genome binning, thereby offering more information about metagenomic samples.
format Online
Article
Text
id pubmed-7506068
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-75060682020-10-22 PacBio Long Reads Improve Metagenomic Assemblies, Gene Catalogs, and Genome Binning Xie, Haiying Yang, Caiyun Sun, Yamin Igarashi, Yasuo Jin, Tao Luo, Feng Front Genet Genetics PacBio long reads sequencing presents several potential advantages for DNA assembly, including being able to provide more complete gene profiling of metagenomic samples. However, lower single-pass accuracy can make gene discovery and assembly for low-abundance organisms difficult. To evaluate the application and performance of PacBio long reads and Illumina HiSeq short reads in metagenomic analyses, we directly compared various assemblies involving PacBio and Illumina sequencing reads based on two anaerobic digestion microbiome samples from a biogas fermenter. Using a PacBio platform, 1.58 million long reads (19.6 Gb) were produced with an average length of 7,604 bp. Using an Illumina HiSeq platform, 151.2 million read pairs (45.4 Gb) were produced. Hybrid assemblies using PacBio long reads and HiSeq contigs produced improvements in assembly statistics, including an increase in the average contig length, contig N50 size, and number of large contigs. Interestingly, depth-based hybrid assemblies generated a higher percentage of complete genes (98.86%) compared to those based on HiSeq contigs only (40.29%), because the PacBio reads were long enough to cover many repeating short elements and capture multiple genes in a single read. Additionally, the incorporation of PacBio long reads led to considerable advantages regarding reducing contig numbers and increasing the completeness of the genome reconstruction, which was poorly assembled and binned when using HiSeq data alone. From this comparison of PacBio long reads with Illumina HiSeq short reads related to complex microbiome samples, we conclude that PacBio long reads can produce longer contigs, more complete genes, and better genome binning, thereby offering more information about metagenomic samples. 
Frontiers Media S.A. 2020-09-08 /pmc/articles/PMC7506068/ /pubmed/33101371 http://dx.doi.org/10.3389/fgene.2020.516269 Text en Copyright © 2020 Xie, Yang, Sun, Igarashi, Jin and Luo. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Genetics
Xie, Haiying
Yang, Caiyun
Sun, Yamin
Igarashi, Yasuo
Jin, Tao
Luo, Feng
PacBio Long Reads Improve Metagenomic Assemblies, Gene Catalogs, and Genome Binning
title PacBio Long Reads Improve Metagenomic Assemblies, Gene Catalogs, and Genome Binning
title_full PacBio Long Reads Improve Metagenomic Assemblies, Gene Catalogs, and Genome Binning
title_fullStr PacBio Long Reads Improve Metagenomic Assemblies, Gene Catalogs, and Genome Binning
title_full_unstemmed PacBio Long Reads Improve Metagenomic Assemblies, Gene Catalogs, and Genome Binning
title_short PacBio Long Reads Improve Metagenomic Assemblies, Gene Catalogs, and Genome Binning
title_sort pacbio long reads improve metagenomic assemblies, gene catalogs, and genome binning
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7506068/
https://www.ncbi.nlm.nih.gov/pubmed/33101371
http://dx.doi.org/10.3389/fgene.2020.516269
work_keys_str_mv AT xiehaiying pacbiolongreadsimprovemetagenomicassembliesgenecatalogsandgenomebinning
AT yangcaiyun pacbiolongreadsimprovemetagenomicassembliesgenecatalogsandgenomebinning
AT sunyamin pacbiolongreadsimprovemetagenomicassembliesgenecatalogsandgenomebinning
AT igarashiyasuo pacbiolongreadsimprovemetagenomicassembliesgenecatalogsandgenomebinning
AT jintao pacbiolongreadsimprovemetagenomicassembliesgenecatalogsandgenomebinning
AT luofeng pacbiolongreadsimprovemetagenomicassembliesgenecatalogsandgenomebinning