Deduplication Improves Cost-Efficiency and Yields of De Novo Assembly and Binning of Shotgun Metagenomes in Microbiome Research
In the last decade, metagenomics has greatly revolutionized the study of microbial communities. However, the presence of artificial duplicate reads arising mainly from the preparation of metagenomic DNA sequencing libraries and their impacts on metagenomic assembly and binning have never been brought...
Main Authors: | Zhang, Zhiguo; Zhang, Lu; Zhang, Guoqing; Zhao, Ze; Wang, Hui; Ju, Feng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Society for Microbiology 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10101064/ https://www.ncbi.nlm.nih.gov/pubmed/36744896 http://dx.doi.org/10.1128/spectrum.04282-22 |
_version_ | 1785025425996513280 |
---|---|
author | Zhang, Zhiguo Zhang, Lu Zhang, Guoqing Zhao, Ze Wang, Hui Ju, Feng |
author_facet | Zhang, Zhiguo Zhang, Lu Zhang, Guoqing Zhao, Ze Wang, Hui Ju, Feng |
author_sort | Zhang, Zhiguo |
collection | PubMed |
description | In the last decade, metagenomics has greatly revolutionized the study of microbial communities. However, the presence of artificial duplicate reads arising mainly from the preparation of metagenomic DNA sequencing libraries and their impacts on metagenomic assembly and binning have never been brought to attention. Here, we explicitly investigated the effects of duplicate reads on metagenomic assemblies and binning based on analyses of five groups of representative metagenomes with distinct microbiome complexities. Our results showed that deduplication considerably increased the binning yields (by 3.5% to 80%) for most of the metagenomic data sets examined thanks to the improved contig length and coverage profiling of metagenome-assembled contigs, whereas it slightly decreased the binning yields of metagenomes with low complexity (e.g., human gut metagenomes). Specifically, 411 versus 397, 331 versus 317, 104 versus 88, and 9 versus 5 metagenome-assembled genomes (MAGs) were recovered from MEGAHIT assemblies of bioreactor sludge, surface water, lake sediment, and forest soil metagenomes, respectively. Noticeably, deduplication significantly reduced the computational costs of the metagenomic assembly, including the elapsed time (9.0% to 29.9%) and the maximum memory requirement (4.3% to 37.1%). Collectively, we recommend the removal of duplicate reads in metagenomes with high complexity before assembly and binning analyses, for example, the forest soil metagenomes examined in this study. IMPORTANCE Duplicated reads in shotgun metagenomes are usually considered technical artifacts. Their presence in metagenomes would theoretically not only introduce bias into the quantitative analysis but also result in mistakes in the coverage profile, leading to adverse effects on or even failures in metagenomic assembly and binning, as the widely used metagenome assemblers and binners all need coverage information for graph partitioning and assembly binning, respectively. However, this issue was seldom noticed, and its impacts on downstream essential bioinformatic procedures (e.g., assembly and binning) remained unclear. In this study, we comprehensively evaluated for the first time the implications of duplicate reads for the de novo assembly and binning of real metagenomic data sets by comparing the assembly qualities, binning yields, and requirements for computational resources with and without the removal of duplicate reads. It was revealed that deduplication considerably increased the binning yields of metagenomes with high complexity and significantly reduced the computational costs, including the elapsed time and the maximum memory requirement, for most of the metagenomes studied. These results provide empirical references for more cost-efficient metagenomic analyses in microbiome research. |
format | Online Article Text |
id | pubmed-10101064 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Society for Microbiology |
record_format | MEDLINE/PubMed |
spelling | pubmed-10101064 2023-04-14 Deduplication Improves Cost-Efficiency and Yields of De Novo Assembly and Binning of Shotgun Metagenomes in Microbiome Research Zhang, Zhiguo Zhang, Lu Zhang, Guoqing Zhao, Ze Wang, Hui Ju, Feng Microbiol Spectr Research Article American Society for Microbiology 2023-02-06 /pmc/articles/PMC10101064/ /pubmed/36744896 http://dx.doi.org/10.1128/spectrum.04282-22 Text en Copyright © 2023 Zhang et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Research Article Zhang, Zhiguo Zhang, Lu Zhang, Guoqing Zhao, Ze Wang, Hui Ju, Feng Deduplication Improves Cost-Efficiency and Yields of De Novo Assembly and Binning of Shotgun Metagenomes in Microbiome Research |
title | Deduplication Improves Cost-Efficiency and Yields of De Novo Assembly and Binning of Shotgun Metagenomes in Microbiome Research |
title_sort | deduplication improves cost-efficiency and yields of de novo assembly and binning of shotgun metagenomes in microbiome research |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10101064/ https://www.ncbi.nlm.nih.gov/pubmed/36744896 http://dx.doi.org/10.1128/spectrum.04282-22 |