
ProgPerm: Progressive permutation for a dynamic representation of the robustness of microbiome discoveries

BACKGROUND: Identification of features is a critical task in microbiome studies that is complicated by the fact that microbial data are high dimensional and heterogeneous. Masked by the complexity of the data, the problem of separating signals (differential features between groups) from noise (featu...


Bibliographic Details
Main Authors:	Zhang, Liangliang, Shi, Yushu, Do, Kim-Anh, Peterson, Christine B., Jenq, Robert R.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2021
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7972227/ https://www.ncbi.nlm.nih.gov/pubmed/33731016 http://dx.doi.org/10.1186/s12859-021-04061-3

author	Zhang, Liangliang Shi, Yushu Do, Kim-Anh Peterson, Christine B. Jenq, Robert R.
author_facet	Zhang, Liangliang Shi, Yushu Do, Kim-Anh Peterson, Christine B. Jenq, Robert R.
author_sort	Zhang, Liangliang
collection	PubMed
description	BACKGROUND: Identification of features is a critical task in microbiome studies that is complicated by the fact that microbial data are high dimensional and heterogeneous. Masked by the complexity of the data, the problem of separating signals (differential features between groups) from noise (features that are not differential between groups) becomes challenging and troublesome. For instance, when performing differential abundance tests, multiple testing adjustments tend to be overconservative, as the probability of a type I error (false positive) increases dramatically with the large numbers of hypotheses. Moreover, the grouping effect of interest can be obscured by heterogeneity. These factors can incorrectly lead to the conclusion that there are no differences in the microbiome compositions. RESULTS: We translate and represent the problem of identifying differential features, which are differential in two-group comparisons (e.g., treatment versus control), as a dynamic layout of separating the signal from its random background. More specifically, we progressively permute the grouping factor labels of the microbiome samples and perform multiple differential abundance tests in each scenario. We then compare the signal strength of the most differential features from the original data with their performance in permutations, and will observe a visually apparent decreasing trend if these features are true positives identified from the data. Simulations and applications on real data show that the proposed method creates a U-curve when plotting the number of significant features versus the proportion of mixing. The shape of the U-curve can convey the strength of the overall association between the microbiome and the grouping factor. We also define a fragility index to measure the robustness of the discoveries. Finally, we recommend the identified features by comparing p-values in the observed data with p-values in the fully mixed data. 
CONCLUSIONS: We have developed this into a user-friendly and efficient R-shiny tool with visualizations. By default, we use the Wilcoxon rank sum test to compute the p-values, since it is a robust nonparametric test. Our proposed method can also utilize p-values obtained from other testing methods, such as DESeq. This demonstrates the potential of the progressive permutation method to be extended to new settings. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04061-3.
format	Online Article Text
id	pubmed-7972227
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-79722272021-03-19 ProgPerm: Progressive permutation for a dynamic representation of the robustness of microbiome discoveries Zhang, Liangliang Shi, Yushu Do, Kim-Anh Peterson, Christine B. Jenq, Robert R. BMC Bioinformatics Methodology Article BACKGROUND: Identification of features is a critical task in microbiome studies that is complicated by the fact that microbial data are high dimensional and heterogeneous. Masked by the complexity of the data, the problem of separating signals (differential features between groups) from noise (features that are not differential between groups) becomes challenging and troublesome. For instance, when performing differential abundance tests, multiple testing adjustments tend to be overconservative, as the probability of a type I error (false positive) increases dramatically with the large numbers of hypotheses. Moreover, the grouping effect of interest can be obscured by heterogeneity. These factors can incorrectly lead to the conclusion that there are no differences in the microbiome compositions. RESULTS: We translate and represent the problem of identifying differential features, which are differential in two-group comparisons (e.g., treatment versus control), as a dynamic layout of separating the signal from its random background. More specifically, we progressively permute the grouping factor labels of the microbiome samples and perform multiple differential abundance tests in each scenario. We then compare the signal strength of the most differential features from the original data with their performance in permutations, and will observe a visually apparent decreasing trend if these features are true positives identified from the data. Simulations and applications on real data show that the proposed method creates a U-curve when plotting the number of significant features versus the proportion of mixing. 
The shape of the U-curve can convey the strength of the overall association between the microbiome and the grouping factor. We also define a fragility index to measure the robustness of the discoveries. Finally, we recommend the identified features by comparing p-values in the observed data with p-values in the fully mixed data. CONCLUSIONS: We have developed this into a user-friendly and efficient R-shiny tool with visualizations. By default, we use the Wilcoxon rank sum test to compute the p-values, since it is a robust nonparametric test. Our proposed method can also utilize p-values obtained from other testing methods, such as DESeq. This demonstrates the potential of the progressive permutation method to be extended to new settings. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04061-3. BioMed Central 2021-03-17 /pmc/articles/PMC7972227/ /pubmed/33731016 http://dx.doi.org/10.1186/s12859-021-04061-3 Text en © The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. 
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle	Methodology Article Zhang, Liangliang Shi, Yushu Do, Kim-Anh Peterson, Christine B. Jenq, Robert R. ProgPerm: Progressive permutation for a dynamic representation of the robustness of microbiome discoveries
title	ProgPerm: Progressive permutation for a dynamic representation of the robustness of microbiome discoveries
title_sort	progperm: progressive permutation for a dynamic representation of the robustness of microbiome discoveries
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7972227/ https://www.ncbi.nlm.nih.gov/pubmed/33731016 http://dx.doi.org/10.1186/s12859-021-04061-3
work_keys_str_mv	AT zhangliangliang progpermprogressivepermutationforadynamicrepresentationoftherobustnessofmicrobiomediscoveries AT shiyushu progpermprogressivepermutationforadynamicrepresentationoftherobustnessofmicrobiomediscoveries AT dokimanh progpermprogressivepermutationforadynamicrepresentationoftherobustnessofmicrobiomediscoveries AT petersonchristineb progpermprogressivepermutationforadynamicrepresentationoftherobustnessofmicrobiomediscoveries AT jenqrobertr progpermprogressivepermutationforadynamicrepresentationoftherobustnessofmicrobiomediscoveries
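
Illustrative sketch (not part of the catalog record or of the authors' R-shiny tool): the description above outlines progressively permuting the group labels, running a Wilcoxon rank sum test per feature in each scenario, and counting significant features against the proportion of mixing. The Python below is one plausible reading of that idea under stated assumptions. The data are simulated, the groups are balanced, "mixing" is taken to mean swapping the labels of an equal number of samples from each group, and the p-value uses the normal approximation to the rank sum statistic without a tie correction. All function names and parameters are hypothetical.

```python
import math
import random


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_pvalue(values, labels):
    """Two-sided rank sum p-value (normal approximation, no tie correction)."""
    n0 = sum(1 for g in labels if g == 0)
    n1 = len(labels) - n0
    w = sum(r for r, g in zip(average_ranks(values), labels) if g == 0)
    mean = n0 * (n0 + n1 + 1) / 2
    sd = math.sqrt(n0 * n1 * (n0 + n1 + 1) / 12)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


def swap_labels(labels, proportion, rng):
    """Assumed mixing scheme: swap the labels of k samples from each group."""
    idx0 = [i for i, g in enumerate(labels) if g == 0]
    idx1 = [i for i, g in enumerate(labels) if g == 1]
    k = round(proportion * min(len(idx0), len(idx1)))
    mixed = list(labels)
    for i in rng.sample(idx0, k):
        mixed[i] = 1
    for i in rng.sample(idx1, k):
        mixed[i] = 0
    return mixed


def count_significant(features, labels, alpha=0.05):
    """Number of features with an unadjusted p-value below alpha."""
    return sum(rank_sum_pvalue(f, labels) < alpha for f in features)


def progressive_permutation(features, labels, proportions, n_rep=20, seed=0):
    """Mean count of significant features at each mixing proportion."""
    rng = random.Random(seed)
    curve = []
    for q in proportions:
        counts = [
            count_significant(features, swap_labels(labels, q, rng))
            for _ in range(n_rep)
        ]
        curve.append(sum(counts) / n_rep)
    return curve


def simulate(n_per_group=20, n_features=30, n_signal=10, shift=2.0, seed=1):
    """Toy data: the first n_signal features are shifted in group 1."""
    rng = random.Random(seed)
    labels = [0] * n_per_group + [1] * n_per_group
    features = []
    for j in range(n_features):
        delta = shift if j < n_signal else 0.0
        features.append([rng.gauss(0.0, 1.0) + delta * g for g in labels])
    return features, labels


features, labels = simulate()
proportions = [0.0, 0.25, 0.5, 0.75, 1.0]
curve = progressive_permutation(features, labels, proportions, n_rep=20, seed=2)
for q, c in zip(proportions, curve):
    print(f"mixing proportion {q:.2f}: mean significant features {c:.1f}")
```

Under these assumptions a proportion of 0 reproduces the original labels and a proportion of 1 exchanges the two groups entirely, so a two-sided test gives the same count at both ends, while 0.5 is the fully mixed case. Plotting the returned values against the proportions would be expected to trace the U-shape the description mentions when a real group effect is present. The fragility index and the feature recommendation step are not sketched here because the record does not define them.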

