A comprehensive investigation of metagenome assembly by linked-read sequencing

BACKGROUND: The human microbiota are complex systems with important roles in our physiological activities and diseases. Sequencing the microbial genomes in the microbiota can help in our interpretation of their activities. The vast majority of the microbes in the microbiota cannot be isolated for in...

Bibliographic Details
Main Authors: Zhang, Lu; Fang, Xiaodong; Liao, Herui; Zhang, Zhenmiao; Zhou, Xin; Han, Lijuan; Chen, Yang; Qiu, Qinwei; Li, Shuai Cheng
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects: Research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7659138/
https://www.ncbi.nlm.nih.gov/pubmed/33176883
http://dx.doi.org/10.1186/s40168-020-00929-3
author Zhang, Lu
Fang, Xiaodong
Liao, Herui
Zhang, Zhenmiao
Zhou, Xin
Han, Lijuan
Chen, Yang
Qiu, Qinwei
Li, Shuai Cheng
collection PubMed
description BACKGROUND: The human microbiota are complex systems with important roles in our physiological activities and diseases. Sequencing the microbial genomes in the microbiota can help us interpret their activities. The vast majority of the microbes in the microbiota cannot be isolated for individual sequencing. Current metagenomics practices use short-read sequencing to sequence a mixture of microbial genomes simultaneously. However, this approach introduces ambiguity during genome assembly, leading to unsatisfactory microbial genome completeness and contig continuity. Linked-read sequencing can remove some of these ambiguities by attaching the same barcode to the reads from a long DNA fragment (10–100 kb), thus improving metagenome assembly. However, it is not clear how the choices for several parameters in the use of linked-read sequencing affect the assembly quality.
RESULTS: We first examined the effects of read depth (C) on metagenome assembly from linked-reads in simulated data and a mock community. The results showed that C correlated positively with the length of assembled sequences but had little effect on their quality. The latter observation was corroborated by tests using real data from the human gut microbiome, where C had only a minor impact on the sequence quality and on the proportion of bins annotated as draft genomes. In contrast, metagenome assembly quality was sensitive to read depth per fragment (C(R)) and DNA fragment physical depth (C(F)). For the same C, deeper C(R) resulted in more draft genomes, while deeper C(F) improved the quality of the draft genomes. We also found that average fragment length (μ(FL)) had a marginal effect on assemblies, while fragments per partition (N(F/P)) affected the off-target reads involved in local assembly: lower N(F/P) values would lead to better assemblies by reducing the ambiguity introduced by off-target reads. In general, the use of linked-reads improved contig N50 when compared to Illumina short-reads, but not when compared to PacBio CCS (circular consensus sequencing) long-reads.
CONCLUSIONS: We comprehensively investigated the influence of linked-read sequencing parameters on metagenome assembly. While the quality of genome assembly from linked-reads cannot rival that from PacBio CCS long-reads, the case for using linked-read sequencing remains persuasive due to its low cost and high base quality. Our study revealed that the probable best practice in using linked-reads for metagenome assembly was to merge the linked-reads from multiple libraries, where each had sufficient C(R) but a smaller amount of input DNA.
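The trade-off described in the abstract ("for the same C, deeper C(R) ... while deeper C(F) ...") can be illustrated with a small sketch. It assumes the relation commonly used for linked-read data, C = C(F) × C(R), which this record itself does not state; the function name `fragment_depth` and the example values are hypothetical and chosen only for illustration.

```python
def fragment_depth(total_depth: float, depth_per_fragment: float) -> float:
    """Return the fragment physical depth C(F) implied by a total read
    depth C and a read depth per fragment C(R).

    Assumption (not stated in this record): C = C(F) * C(R).
    """
    if total_depth <= 0 or depth_per_fragment <= 0:
        raise ValueError("depths must be positive")
    return total_depth / depth_per_fragment


if __name__ == "__main__":
    total_c = 100.0  # hypothetical total read depth C
    # With C fixed, raising C(R) lowers C(F), and vice versa.
    for c_r in (0.25, 0.5, 1.0):
        c_f = fragment_depth(total_c, c_r)
        print(f"C={total_c:g}  C(R)={c_r:g}  ->  C(F)={c_f:g}")
```

Under this assumption the two parameters cannot both be increased at a fixed sequencing budget, which is consistent with the abstract's suggestion of merging several libraries that each have sufficient C(R).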
format Online
Article
Text
id pubmed-7659138
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7659138 2020-11-13 A comprehensive investigation of metagenome assembly by linked-read sequencing Zhang, Lu Fang, Xiaodong Liao, Herui Zhang, Zhenmiao Zhou, Xin Han, Lijuan Chen, Yang Qiu, Qinwei Li, Shuai Cheng Microbiome Research BioMed Central 2020-11-11 /pmc/articles/PMC7659138/ /pubmed/33176883 http://dx.doi.org/10.1186/s40168-020-00929-3 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title A comprehensive investigation of metagenome assembly by linked-read sequencing
topic Research