Over-optimism in unsupervised microbiome analysis: Insights from network learning and clustering

Bibliographic Details
Main authors:	Ullmann, Theresa; Peschel, Stefanie; Finger, Philipp; Müller, Christian L.; Boulesteix, Anne-Laure
Format:	Online Article Text
Language:	English
Published:	Public Library of Science, 2023
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9873197/ https://www.ncbi.nlm.nih.gov/pubmed/36608142 http://dx.doi.org/10.1371/journal.pcbi.1010820

Description

In recent years, unsupervised analysis of microbiome data, such as microbial network analysis and clustering, has increased in popularity. Many new statistical and computational methods have been proposed for these tasks. This multiplicity of analysis strategies poses a challenge for researchers, who are often unsure which method(s) to use and might be tempted to try different methods on their dataset to look for the “best” ones. However, if only the best results are selectively reported, this may cause over-optimism: the “best” method is overly fitted to the specific dataset, and the results might be non-replicable on validation data. Such effects will ultimately hinder research progress. Yet so far, these topics have been given little attention in the context of unsupervised microbiome analysis. In our illustrative study, we aim to quantify over-optimism effects in this context. We model the approach of a hypothetical microbiome researcher who undertakes four unsupervised research tasks: clustering of bacterial genera, hub detection in microbial networks, differential microbial network analysis, and clustering of samples. While these tasks are unsupervised, the researcher might still have certain expectations as to what constitutes interesting results. We translate these expectations into concrete evaluation criteria that the hypothetical researcher might want to optimize. We then randomly split an exemplary dataset from the American Gut Project into discovery and validation sets multiple times. For each research task, multiple method combinations (e.g., methods for data normalization, network generation, and/or clustering) are tried on the discovery data, and the combination that yields the best result according to the evaluation criterion is chosen. While the hypothetical researcher might only report this result, we also apply the “best” method combination to the validation dataset. The results are then compared between discovery and validation data. In all four research tasks, there are notable over-optimism effects; the results on the validation data set are worse compared to the discovery data, averaged over multiple random splits into discovery/validation data. Our study thus highlights the importance of validation and replication in microbiome analysis to obtain reliable results and demonstrates that the issue of over-optimism goes beyond the context of statistical testing and fishing for significance.
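The discovery/validation protocol described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the helper names (random_split, over_optimism, pair_method, pearson), the 50/50 split, and the toy "methods" are hypothetical. Each toy method simply reports the correlation of one feature pair in pure-noise data, a deliberately simplified stand-in for the paper's combinations of normalization, network generation, and clustering methods and its task-specific evaluation criteria.

```python
import random
import statistics
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types for illustration; not the study's implementation.
Dataset = List[List[float]]          # rows = samples, columns = features (toy stand-in for taxa)
Method = Callable[[Dataset], float]  # maps a dataset to an evaluation criterion (higher = "better")


def random_split(data: Dataset, rng: random.Random) -> Tuple[Dataset, Dataset]:
    """Randomly split the samples into discovery and validation halves of (near-)equal size."""
    idx = list(range(len(data)))
    rng.shuffle(idx)
    half = len(idx) // 2
    return [data[i] for i in idx[:half]], [data[i] for i in idx[half:]]


def over_optimism(data: Dataset, methods: Dict[str, Method],
                  n_splits: int = 20, seed: int = 1) -> float:
    """Mean gap between the best discovery score and the same method's validation score."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_splits):
        discovery, validation = random_split(data, rng)
        # The hypothetical researcher tries every candidate on the discovery data
        # and keeps the one with the best criterion value ...
        scores = {name: method(discovery) for name, method in methods.items()}
        best = max(scores, key=scores.get)
        # ... and we additionally re-apply that selected candidate to the validation data.
        gaps.append(scores[best] - methods[best](validation))
    return statistics.mean(gaps)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient (no external dependencies)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def pair_method(i: int, j: int) -> Method:
    """Toy 'method' (hypothetical): report the correlation between features i and j."""
    return lambda d: pearson([row[i] for row in d], [row[j] for row in d])


if __name__ == "__main__":
    rng = random.Random(0)
    # Pure noise: no feature pair is truly associated, so any "finding" is spurious.
    data = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(60)]
    methods = {f"{i}-{j}": pair_method(i, j) for i, j in combinations(range(10), 2)}
    print(f"mean over-optimism: {over_optimism(data, methods):.3f}")
```

Because every candidate is scored on the same discovery half, the maximum score is biased upward, while the selected candidate's validation score is not affected by that selection step. Averaging their difference over repeated random splits therefore gives a rough estimate of over-optimism, and with pure-noise input any positive gap is due to selection alone.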

Journal:	PLoS Comput Biol
Publication date:	2023-01-06
Rights:	© 2023 Ullmann et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.