
Theoretical and Simulation-Based Investigation of the Relationship between Sequencing Effort, Microbial Community Richness, and Diversity in Binning Metagenome-Assembled Genomes

We applied theoretical and simulation-based approaches to characterize how microbial community structure influences the amount of sequencing effort to reconstruct metagenomes that are assembled from short-read sequences. First, a coupon collector equation was proposed as an analytical model for pred...


Bibliographic Details
Main authors: Royalty, Taylor M., Steen, Andrew D.
Format: Online Article Text
Language: English
Published: American Society for Microbiology 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6749106/
https://www.ncbi.nlm.nih.gov/pubmed/31530648
http://dx.doi.org/10.1128/mSystems.00384-19
author Royalty, Taylor M.
Steen, Andrew D.
collection PubMed
description We applied theoretical and simulation-based approaches to characterize how microbial community structure influences the amount of sequencing effort to reconstruct metagenomes that are assembled from short-read sequences. First, a coupon collector equation was proposed as an analytical model for predicting sequencing effort as a function of microbial community structure. Characterization was performed by varying community structure properties such as richness, evenness, and genome size. Simulations demonstrated that while community richness and evenness influenced the sequencing effort required to sequence a community metagenome to exhaustion, the effort necessary to sequence an individual genome to a target fraction of exhaustion depended only on the relative abundance of the genome and its genome size. A second analysis evaluated the quantity, completion, and contamination of metagenome-assembled genomes (MAGs) as a function of sequencing effort on four preexisting sequence read data sets from different environments. These data sets were subsampled to various degrees of completeness to simulate the effect of sequencing effort on MAG retrieval. Modeling suggested that sequencing efforts beyond what is typical in published experiments (1 to 10 Gbp) would generate diminishing returns in terms of MAG binning. A software tool, Genome Relative Abundance to Sequencing Effort (GRASE), was created to assist investigators to further explore this relationship. Reevaluation of the relationship between sequencing effort and binning success in the context of genome relative abundance, as opposed to base pairs, provides a constraint on sequencing experiments based on the relative abundance of microbes in an environment rather than arbitrary levels of sequencing effort. IMPORTANCE Short-read sequencing with Illumina sequencing technology provides an accurate, high-throughput method for characterizing the metabolic potential of microbial communities. 
Short-read sequences can be assembled and binned into metagenome-assembled genomes, thus shedding light on the function of microbial ecosystems that are important for health, agriculture, and Earth system processes. The work presented here provides an analytical framework for selecting sequencing effort as a function of genome relative abundance. As such, experimental goals in metagenome-assembled genome creation projects can select sequencing effort based on the rarest target genome as a constrained threshold. We hope that the results presented here, as well as GRASE, will be valuable to researchers planning sequencing experiments. Author Video: An author video summary of this article is available.
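The abstract's two central claims can be illustrated with a minimal sketch. It is not the authors' equation and not GRASE. It uses the textbook coupon collector expectation for equally likely items, plus a standard Poisson (Lander-Waterman-style) coverage approximation swapped in for the individual-genome case. The function names are hypothetical, and relative abundance is assumed here to mean the fraction of sequenced base pairs that come from the target genome.

```python
import math


def expected_draws_uniform(n_items: int) -> float:
    """Textbook coupon collector: expected number of draws needed to see
    all n equally likely items at least once, n * H_n."""
    return n_items * sum(1.0 / k for k in range(1, n_items + 1))


def effort_for_genome(genome_size_bp: float,
                      relative_abundance: float,
                      target_fraction: float) -> float:
    """Total base pairs to sequence so that roughly `target_fraction` of one
    genome is covered at least once.

    Assumptions (not taken from the article): reads land uniformly at random,
    the covered fraction follows 1 - exp(-coverage), and relative abundance
    is the share of sequenced base pairs belonging to this genome.
    """
    if not 0.0 < relative_abundance <= 1.0:
        raise ValueError("relative_abundance must be in (0, 1]")
    if not 0.0 <= target_fraction < 1.0:
        raise ValueError("target_fraction must be in [0, 1)")
    # Coverage needed on the target genome alone.
    required_coverage = -math.log(1.0 - target_fraction)
    # Only a share `relative_abundance` of all sequenced bases hits this genome.
    return required_coverage * genome_size_bp / relative_abundance


if __name__ == "__main__":
    # Hypothetical 5 Mbp genome at 1% relative abundance, 95% target.
    effort = effort_for_genome(5e6, 0.01, 0.95)
    print(f"approx. {effort / 1e9:.2f} Gbp of total sequencing effort")
```

Under these assumptions the required effort depends only on the target genome's size and relative abundance, not on how many other genomes are present, and halving the relative abundance doubles the effort. This is in line with the abstract's suggestion that the rarest target genome sets the sequencing threshold.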
format Online
Article
Text
id pubmed-6749106
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher American Society for Microbiology
record_format MEDLINE/PubMed
spelling pubmed-6749106 2019-09-24 mSystems Research Article American Society for Microbiology 2019-09-17 /pmc/articles/PMC6749106/ /pubmed/31530648 http://dx.doi.org/10.1128/mSystems.00384-19 Text en Copyright © 2019 Royalty and Steen. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Research Article
Royalty, Taylor M.
Steen, Andrew D.
Theoretical and Simulation-Based Investigation of the Relationship between Sequencing Effort, Microbial Community Richness, and Diversity in Binning Metagenome-Assembled Genomes
title Theoretical and Simulation-Based Investigation of the Relationship between Sequencing Effort, Microbial Community Richness, and Diversity in Binning Metagenome-Assembled Genomes
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6749106/
https://www.ncbi.nlm.nih.gov/pubmed/31530648
http://dx.doi.org/10.1128/mSystems.00384-19