Evaluating replicability in microbiome data

High-throughput sequencing is widely used to study microbial communities. However, choice of laboratory protocol is known to affect the resulting microbiome data, which has an unquantified impact on many comparisons between communities of scientific interest. We propose a novel approach to evaluatin...

Bibliographic Details
Main Authors: Clausen, David S, Willis, Amy D
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9566336/
https://www.ncbi.nlm.nih.gov/pubmed/34969071
http://dx.doi.org/10.1093/biostatistics/kxab048
_version_ 1784809126416613376
author Clausen, David S
Willis, Amy D
author_facet Clausen, David S
Willis, Amy D
author_sort Clausen, David S
collection PubMed
description High-throughput sequencing is widely used to study microbial communities. However, choice of laboratory protocol is known to affect the resulting microbiome data, which has an unquantified impact on many comparisons between communities of scientific interest. We propose a novel approach to evaluating replicability in high-dimensional data and apply it to assess the cross-laboratory replicability of signals in microbiome data using the Microbiome Quality Control Project data set. We learn distinctions between samples as measured by a single laboratory and evaluate whether the same distinctions hold in data produced by other laboratories. While most sequencing laboratories can consistently distinguish between samples (median correct classification 87% on genus-level proportion data), these distinctions frequently fail to hold in data from other laboratories (median correct classification 55% across laboratory on genus-level proportion data). As identical samples processed by different laboratories generate substantively different quantitative results, we conclude that 16S sequencing does not reliably resolve differences in human microbiome samples. However, because we observe greater replicability under certain data transformations, our results inform the analysis of microbiome data.
format Online
Article
Text
id pubmed-9566336
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9566336 2022-10-19 Evaluating replicability in microbiome data Clausen, David S Willis, Amy D Biostatistics Articles Oxford University Press 2021-12-30 /pmc/articles/PMC9566336/ /pubmed/34969071 http://dx.doi.org/10.1093/biostatistics/kxab048 Text en © The Author 2021. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Articles
Clausen, David S
Willis, Amy D
Evaluating replicability in microbiome data
title Evaluating replicability in microbiome data
topic Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9566336/
https://www.ncbi.nlm.nih.gov/pubmed/34969071
http://dx.doi.org/10.1093/biostatistics/kxab048
work_keys_str_mv AT clausendavids evaluatingreplicabilityinmicrobiomedata
AT willisamyd evaluatingreplicabilityinmicrobiomedata