Evaluating de Novo Assembly and Binning Strategies for Time Series Drinking Water Metagenomes

Reconstructing microbial genomes from metagenomic short-read data can be challenging due to the unknown and uneven complexity of microbial communities. This complexity encompasses highly diverse populations, which often include strain variants. Reconstructing high-quality genomes is a crucial part...

Bibliographic Details
Main authors: Vosloo, Solize; Huo, Linxuan; Anderson, Christopher L.; Dai, Zihan; Sevillano, Maria; Pinto, Ameet
Format: Online Article Text
Language: English
Published: American Society for Microbiology 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8567270/
https://www.ncbi.nlm.nih.gov/pubmed/34730411
http://dx.doi.org/10.1128/Spectrum.01434-21
_version_ 1784594197917990912
author Vosloo, Solize
Huo, Linxuan
Anderson, Christopher L.
Dai, Zihan
Sevillano, Maria
Pinto, Ameet
collection PubMed
description Reconstructing microbial genomes from metagenomic short-read data can be challenging due to the unknown and uneven complexity of microbial communities. This complexity encompasses highly diverse populations, which often include strain variants. Reconstructing high-quality genomes is a crucial part of the metagenomic workflow, as subsequent ecological and metabolic inferences depend on their accuracy, quality, and completeness. In contrast to microbial communities in other ecosystems, there has been no systematic assessment of genome-centric metagenomic workflows for drinking water microbiomes. In this study, we assessed the performance of a combination of assembly and binning strategies for time series drinking water metagenomes that were collected over 6 months. The goal of this study was to identify the combination of assembly and binning approaches that results in high-quality and -quantity metagenome-assembled genomes (MAGs), representing most of the sequenced metagenome. Our findings suggest that the metaSPAdes coassembly strategies had the best performance, as they resulted in larger and less fragmented assemblies, with at least 85% of the sequence data mapping to contigs greater than 1 kbp. Furthermore, a combination of metaSPAdes coassembly strategies and MetaBAT2 produced the highest number of medium-quality MAGs while capturing at least 70% of the metagenomes based on read recruitment. Utilizing different assembly/binning approaches also assists in the reconstruction of unique MAGs from closely related species that would have otherwise collapsed into a single MAG using a single workflow. Overall, our study suggests that leveraging multiple binning approaches with different metaSPAdes coassembly strategies may be required to maximize the recovery of good-quality MAGs.
IMPORTANCE Drinking water contains phylogenetically diverse groups of bacteria, archaea, and eukarya that affect the esthetic quality of water, water infrastructure, and public health. Taxonomic, metabolic, and ecological inferences of the drinking water microbiome depend on the accuracy, quality, and completeness of genomes that are reconstructed through the application of genome-resolved metagenomics. Using time series metagenomic data, we present reproducible genome-centric metagenomic workflows that result in high-quality and -quantity genomes, which more accurately represent the sequenced drinking water microbiome. These genome-centric metagenomic workflows will allow for improved taxonomic and functional potential analysis that offers enhanced insights into the stability and dynamics of drinking water microbial communities.
format Online
Article
Text
id pubmed-8567270
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher American Society for Microbiology
record_format MEDLINE/PubMed
spelling pubmed-8567270 2021-11-08 Evaluating de Novo Assembly and Binning Strategies for Time Series Drinking Water Metagenomes Vosloo, Solize Huo, Linxuan Anderson, Christopher L. Dai, Zihan Sevillano, Maria Pinto, Ameet Microbiol Spectr Research Article American Society for Microbiology 2021-11-03 /pmc/articles/PMC8567270/ /pubmed/34730411 http://dx.doi.org/10.1128/Spectrum.01434-21 Text en Copyright © 2021 Vosloo et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/).
title Evaluating de Novo Assembly and Binning Strategies for Time Series Drinking Water Metagenomes
topic Research Article