Enhancing diversity analysis by repeatedly rarefying next generation sequencing data describing microbial communities
Main authors: | Cameron, Ellen S.; Schmidt, Philip J.; Tremblay, Benjamin J.-M.; Emelko, Monica B.; Müller, Kirsten M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8595385/ https://www.ncbi.nlm.nih.gov/pubmed/34785722 http://dx.doi.org/10.1038/s41598-021-01636-1 |
_version_ | 1784600191529123840 |
---|---|
author | Cameron, Ellen S.; Schmidt, Philip J.; Tremblay, Benjamin J.-M.; Emelko, Monica B.; Müller, Kirsten M. |
collection | PubMed |
description | Amplicon sequencing has revolutionized our ability to study DNA collected from environmental samples by providing a rapid and sensitive technique for microbial community analysis that eliminates the challenges associated with lab cultivation and taxonomic identification through microscopy. In water resources management, it can be especially useful to evaluate ecosystem shifts in response to natural and anthropogenic landscape disturbances to signal potential water quality concerns, such as the detection of toxic cyanobacteria or pathogenic bacteria. Amplicon sequencing data consist of discrete counts of sequence reads, the sum of which is the library size. Groups of samples typically have different library sizes that are not representative of biological variation; library size normalization is required to meaningfully compare diversity between them. Rarefaction is a widely used normalization technique that involves the random subsampling of sequences from the initial sample library to a selected normalized library size. This process is often dismissed as statistically invalid because subsampling effectively discards a portion of the observed sequences, yet it remains prevalent in practice and the suitability of rarefying, relative to many other normalization approaches, for diversity analysis has been argued. Here, repeated rarefying is proposed as a tool to normalize library sizes for diversity analyses. This enables (i) proportionate representation of all observed sequences and (ii) characterization of the random variation introduced to diversity analyses by rarefying to a smaller library size shared by all samples. While many deterministic data transformations are not tailored to produce equal library sizes, repeatedly rarefying reflects the probabilistic process by which amplicon sequencing data are obtained as a representation of the amplified source microbial community. Specifically, it evaluates which data might have been obtained if a particular sample’s library size had been smaller and allows graphical representation of the effects of this library size normalization process upon diversity analysis results. |
format | Online Article Text |
id | pubmed-8595385 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-8595385 2021-11-17 Sci Rep Article Nature Publishing Group UK 2021-11-16 /pmc/articles/PMC8595385/ /pubmed/34785722 http://dx.doi.org/10.1038/s41598-021-01636-1 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/). |
title | Enhancing diversity analysis by repeatedly rarefying next generation sequencing data describing microbial communities |
topic | Article |
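The abstract describes rarefying as randomly subsampling reads from a sample's library down to a smaller library size shared by all samples, and proposes repeating that draw to characterize the random variation it introduces. The Python sketch below illustrates the idea under stated assumptions and is not the authors' implementation. It assumes one sample is a list of per-taxon read counts, draws reads without replacement with the standard library's `random.Random.sample` and its `counts` argument (Python 3.9+), and summarizes each draw with observed richness and the Shannon index (natural log). The function names `rarefy`, `repeated_rarefy`, `richness`, and `shannon`, the example counts, and the choice of metrics are hypothetical illustrations.

```python
import math
import random
from collections import Counter
from statistics import mean, stdev


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from one sample.

    counts[i] is the number of reads observed for taxon i (hypothetical layout);
    the returned list has the same length and sums to `depth`.
    """
    if depth > sum(counts):
        raise ValueError("depth exceeds the sample's library size")
    # Taxon index i appears counts[i] times in the implied population of reads.
    drawn = rng.sample(range(len(counts)), k=depth, counts=counts)
    tally = Counter(drawn)
    return [tally.get(i, 0) for i in range(len(counts))]


def richness(counts):
    """Observed richness: number of taxa with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def repeated_rarefy(counts, depth, repeats=100, seed=None):
    """Rarefy the same sample `repeats` times and record diversity per draw."""
    rng = random.Random(seed)
    results = {"richness": [], "shannon": []}
    for _ in range(repeats):
        sub = rarefy(counts, depth, rng)
        results["richness"].append(richness(sub))
        results["shannon"].append(shannon(sub))
    return results


if __name__ == "__main__":
    sample = [500, 120, 40, 8, 3, 1]  # hypothetical read counts per taxon
    res = repeated_rarefy(sample, depth=200, repeats=50, seed=1)
    print("richness mean/sd:", mean(res["richness"]), stdev(res["richness"]))
    print("shannon mean/sd:", mean(res["shannon"]), stdev(res["shannon"]))
```

A single rarefied table is one realization of this random process; collecting many realizations, as `repeated_rarefy` does, shows how much of the spread in a diversity metric comes from rarefying alone, which is the variation the abstract proposes to characterize.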