
Beta-diversity distance matrices for microbiome sample size and power calculations — How to obtain good estimates



Bibliographic Details
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology, 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9133771/
https://www.ncbi.nlm.nih.gov/pubmed/35664226
http://dx.doi.org/10.1016/j.csbj.2022.04.032
Description
Summary: In microbiome studies, researchers often wish to compare taxa count distributions between groups of samples. Commonly used methods for this analysis are built on distance matrices, in which each distance describes the beta-diversity between two samples. The analysis then compares the distribution of distances within groups with the distribution of distances between groups. However, when performing a priori sample size or power calculations for such study designs, appropriate within-group and between-group distance distributions can be challenging to obtain. When available, pilot study data or data from prior studies of similar design should provide realistic distance estimates. When these are not available, distances can be extracted from available studies in which similar beta-diversity can be assumed. Alternatively, distances can be generated by simulation. Here, we describe and illustrate these three strategies for obtaining realistic distance matrices. For simulation methods, we illustrate the procedures required when starting from existing benchmark data, as well as how to simulate directly from population assumptions. Using data from the American Gut project, we provide tables of observed distances for use by researchers planning their own studies, as well as R code for generating similar matrices from other datasets. Furthermore, for simulated data, we compare methods, provide R code, and demonstrate how challenging it is to obtain realistic distance distributions without any benchmark data. The code and illustrative distance tables are provided by the IMPACTT Consortium as a resource for the microbiome research community.
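The within-group versus between-group comparison described in the summary can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R code: it assumes Bray-Curtis dissimilarity as the beta-diversity measure and a hypothetical Dirichlet-multinomial population model for simulating taxa counts directly from assumed group compositions. The function names, the group compositions `alpha_a` and `alpha_b`, and the sequencing depth of 1000 reads are illustrative only.

```python
import random
from itertools import combinations


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two taxa count vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den > 0 else 0.0


def distance_matrix(samples):
    """Symmetric matrix of pairwise Bray-Curtis dissimilarities."""
    n = len(samples)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = bray_curtis(samples[i], samples[j])
    return d


def split_distances(d, groups):
    """Split each unique pair's distance into within-group or between-group lists."""
    within, between = [], []
    for i, j in combinations(range(len(groups)), 2):
        (within if groups[i] == groups[j] else between).append(d[i][j])
    return within, between


def simulate_dm_counts(alpha, depth, rng):
    """Hypothetical population model: one sample of Dirichlet-multinomial counts."""
    # Dirichlet draw via normalized gamma variates.
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    p = [x / total for x in g]
    # Multinomial draw of `depth` reads using those taxon proportions.
    counts = [0] * len(alpha)
    for taxon in rng.choices(range(len(alpha)), weights=p, k=depth):
        counts[taxon] += 1
    return counts


# Illustrative usage: two hypothetical groups of 5 samples over 4 taxa.
rng = random.Random(1)
alpha_a = [10.0, 5.0, 2.0, 1.0]  # assumed group A composition
alpha_b = [2.0, 5.0, 10.0, 1.0]  # assumed group B composition
samples = [simulate_dm_counts(alpha_a, 1000, rng) for _ in range(5)]
samples += [simulate_dm_counts(alpha_b, 1000, rng) for _ in range(5)]
groups = ["A"] * 5 + ["B"] * 5

within, between = split_distances(distance_matrix(samples), groups)
print("mean within: ", sum(within) / len(within))
print("mean between:", sum(between) / len(between))
```

The two resulting lists are the kind of within-group and between-group distance distributions that a sample size or power calculation needs as input; in practice they would come from pilot data, published data, or a more carefully justified simulation model than the one assumed here.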