Effects of Rare Microbiome Taxa Filtering on Statistical Analysis

Background: The accuracy of microbial community detection in 16S rRNA marker-gene and metagenomic studies suffers from contamination and sequencing errors that lead to either falsely identifying microbial taxa that were not in the sample or misclassifying the taxa of DNA fragment reads. Removing con...

Bibliographic Details
Main Authors: Cao, Quy, Sun, Xinxin, Rajesh, Karun, Chalasani, Naga, Gelow, Kayla, Katz, Barry, Shah, Vijay H., Sanyal, Arun J., Smirnova, Ekaterina
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7835481/
https://www.ncbi.nlm.nih.gov/pubmed/33510727
http://dx.doi.org/10.3389/fmicb.2020.607325
_version_ 1783642537877045248
author Cao, Quy
Sun, Xinxin
Rajesh, Karun
Chalasani, Naga
Gelow, Kayla
Katz, Barry
Shah, Vijay H.
Sanyal, Arun J.
Smirnova, Ekaterina
author_facet Cao, Quy
Sun, Xinxin
Rajesh, Karun
Chalasani, Naga
Gelow, Kayla
Katz, Barry
Shah, Vijay H.
Sanyal, Arun J.
Smirnova, Ekaterina
author_sort Cao, Quy
collection PubMed
description Background: The accuracy of microbial community detection in 16S rRNA marker-gene and metagenomic studies suffers from contamination and sequencing errors that lead to either falsely identifying microbial taxa that were not in the sample or misclassifying the taxa of DNA fragment reads. Removing contaminants and filtering rare features are two common approaches to deal with this problem. While contaminant detection methods use auxiliary sequencing process information to identify known contaminants, filtering methods remove taxa that are present in a small number of samples and have small counts in the samples where they are observed. The latter approach reduces the extreme sparsity of microbiome data and has been shown to correctly remove contaminant taxa in cultured “mock” datasets, where the true taxa compositions are known. Although filtering is frequently used, careful evaluation of its effect on the data analysis and scientific conclusions remains unreported. Here, we assess the effect of filtering on the alpha and beta diversity estimation as well as its impact on identifying taxa that discriminate between disease states.
Results: The effect of filtering on microbiome data analysis is illustrated on four datasets: two mock quality control datasets where the same cultured samples with known microbial composition are processed at different labs and two disease study datasets. Results show that in microbiome quality control datasets, filtering reduces the magnitude of differences in alpha diversity and alleviates technical variability between labs while preserving the between samples similarity (beta diversity). In the disease study datasets, DESeq2 and linear discriminant analysis Effect Size (LEfSe) methods were used to identify taxa that are differentially abundant across groups of samples, and random forest models were used to rank features with the largest contribution toward disease classification. Results reveal that filtering retains significant taxa and preserves the model classification ability measured by the area under the receiver operating characteristic curve (AUC). The comparison between the filtering and the contaminant removal method shows that they have complementary effects and are advised to be used in conjunction.
Conclusions: Filtering reduces the complexity of microbiome data while preserving their integrity in downstream analysis. This leads to mitigation of the classification methods' sensitivity and reduction of technical variability, allowing researchers to generate more reproducible and comparable results in microbiome data analysis.
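The filtering rule described in the abstract (drop taxa that are present in few samples and have small counts where they are observed) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names, the thresholds min_count and min_prevalence, the toy count table, and the choice of the Shannon index as the alpha diversity measure are all hypothetical and are not taken from the record.

```python
import math


def filter_rare_taxa(counts, min_count=2, min_prevalence=0.5):
    """Keep a taxon only if it reaches min_count reads in at least
    min_prevalence (a fraction) of the samples.

    counts maps taxon name -> list of per-sample read counts.
    Both thresholds are illustrative assumptions.
    """
    kept = {}
    for taxon, row in counts.items():
        n_samples = len(row)
        n_present = sum(1 for c in row if c >= min_count)
        if n_samples and n_present / n_samples >= min_prevalence:
            kept[taxon] = list(row)
    return kept


def sample_column(counts, j):
    """Return the counts of every taxon in sample j."""
    return [row[j] for row in counts.values()]


def shannon(sample_counts):
    """Shannon index (natural log) of one sample's taxon counts."""
    total = sum(sample_counts)
    if total == 0:
        return 0.0
    return -sum(
        (c / total) * math.log(c / total) for c in sample_counts if c > 0
    )


# Hypothetical toy table: four taxa observed across four samples.
counts = {
    "taxon_A": [120, 98, 143, 110],
    "taxon_B": [30, 0, 25, 41],
    "taxon_C": [0, 1, 0, 0],  # a single read in one sample
    "taxon_D": [0, 0, 2, 0],  # two reads in one sample
}

filtered = filter_rare_taxa(counts, min_count=2, min_prevalence=0.5)

# Compare per-sample alpha diversity before and after filtering.
for j in range(4):
    before = shannon(sample_column(counts, j))
    after = shannon(sample_column(filtered, j))
    print(f"sample {j}: Shannon before={before:.4f}, after={after:.4f}")
```

With these toy thresholds, taxon_C and taxon_D are removed because neither reaches two reads in at least half of the samples. Real analyses would choose thresholds, or a statistical filtering method, suited to sequencing depth and study design.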
format Online
Article
Text
id pubmed-7835481
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-78354812021-01-27 Effects of Rare Microbiome Taxa Filtering on Statistical Analysis Cao, Quy Sun, Xinxin Rajesh, Karun Chalasani, Naga Gelow, Kayla Katz, Barry Shah, Vijay H. Sanyal, Arun J. Smirnova, Ekaterina Front Microbiol Microbiology Background: The accuracy of microbial community detection in 16S rRNA marker-gene and metagenomic studies suffers from contamination and sequencing errors that lead to either falsely identifying microbial taxa that were not in the sample or misclassifying the taxa of DNA fragment reads. Removing contaminants and filtering rare features are two common approaches to deal with this problem. While contaminant detection methods use auxiliary sequencing process information to identify known contaminants, filtering methods remove taxa that are present in a small number of samples and have small counts in the samples where they are observed. The latter approach reduces the extreme sparsity of microbiome data and has been shown to correctly remove contaminant taxa in cultured “mock” datasets, where the true taxa compositions are known. Although filtering is frequently used, careful evaluation of its effect on the data analysis and scientific conclusions remains unreported. Here, we assess the effect of filtering on the alpha and beta diversity estimation as well as its impact on identifying taxa that discriminate between disease states. Results: The effect of filtering on microbiome data analysis is illustrated on four datasets: two mock quality control datasets where the same cultured samples with known microbial composition are processed at different labs and two disease study datasets. Results show that in microbiome quality control datasets, filtering reduces the magnitude of differences in alpha diversity and alleviates technical variability between labs while preserving the between samples similarity (beta diversity). 
In the disease study datasets, DESeq2 and linear discriminant analysis Effect Size (LEfSe) methods were used to identify taxa that are differentially abundant across groups of samples, and random forest models were used to rank features with the largest contribution toward disease classification. Results reveal that filtering retains significant taxa and preserves the model classification ability measured by the area under the receiver operating characteristic curve (AUC). The comparison between the filtering and the contaminant removal method shows that they have complementary effects and are advised to be used in conjunction. Conclusions: Filtering reduces the complexity of microbiome data while preserving their integrity in downstream analysis. This leads to mitigation of the classification methods' sensitivity and reduction of technical variability, allowing researchers to generate more reproducible and comparable results in microbiome data analysis. Frontiers Media S.A. 2021-01-12 /pmc/articles/PMC7835481/ /pubmed/33510727 http://dx.doi.org/10.3389/fmicb.2020.607325 Text en Copyright © 2021 Cao, Sun, Rajesh, Chalasani, Gelow, Katz, Shah, Sanyal and Smirnova. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Microbiology
Cao, Quy
Sun, Xinxin
Rajesh, Karun
Chalasani, Naga
Gelow, Kayla
Katz, Barry
Shah, Vijay H.
Sanyal, Arun J.
Smirnova, Ekaterina
Effects of Rare Microbiome Taxa Filtering on Statistical Analysis
title Effects of Rare Microbiome Taxa Filtering on Statistical Analysis
title_full Effects of Rare Microbiome Taxa Filtering on Statistical Analysis
title_fullStr Effects of Rare Microbiome Taxa Filtering on Statistical Analysis
title_full_unstemmed Effects of Rare Microbiome Taxa Filtering on Statistical Analysis
title_short Effects of Rare Microbiome Taxa Filtering on Statistical Analysis
title_sort effects of rare microbiome taxa filtering on statistical analysis
topic Microbiology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7835481/
https://www.ncbi.nlm.nih.gov/pubmed/33510727
http://dx.doi.org/10.3389/fmicb.2020.607325
work_keys_str_mv AT caoquy effectsofraremicrobiometaxafilteringonstatisticalanalysis
AT sunxinxin effectsofraremicrobiometaxafilteringonstatisticalanalysis
AT rajeshkarun effectsofraremicrobiometaxafilteringonstatisticalanalysis
AT chalasaninaga effectsofraremicrobiometaxafilteringonstatisticalanalysis
AT gelowkayla effectsofraremicrobiometaxafilteringonstatisticalanalysis
AT katzbarry effectsofraremicrobiometaxafilteringonstatisticalanalysis
AT shahvijayh effectsofraremicrobiometaxafilteringonstatisticalanalysis
AT sanyalarunj effectsofraremicrobiometaxafilteringonstatisticalanalysis
AT smirnovaekaterina effectsofraremicrobiometaxafilteringonstatisticalanalysis