
Analyzing biomarker discovery: Estimating the reproducibility of biomarker sets

Many researchers try to understand a biological condition by identifying biomarkers. This is typically done using univariate hypothesis testing over a labeled dataset, declaring a feature to be a biomarker if there is a significant statistical difference between its values for the subjects with diff...


Bibliographic Details
Main Authors:	Forouzandeh, Amir; Rutar, Alex; Kalmady, Sunil V.; Greiner, Russell
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2022
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9333302/ https://www.ncbi.nlm.nih.gov/pubmed/35901020 http://dx.doi.org/10.1371/journal.pone.0252697

_version_	1784758844486844416
author	Forouzandeh, Amir; Rutar, Alex; Kalmady, Sunil V.; Greiner, Russell
author_facet	Forouzandeh, Amir; Rutar, Alex; Kalmady, Sunil V.; Greiner, Russell
author_sort	Forouzandeh, Amir
collection	PubMed
description	Many researchers try to understand a biological condition by identifying biomarkers. This is typically done using univariate hypothesis testing over a labeled dataset, declaring a feature to be a biomarker if there is a significant statistical difference between its values for the subjects with different outcomes. However, such sets of proposed biomarkers are often not reproducible – subsequent studies often fail to identify the same sets. Indeed, there is often only a very small overlap between the biomarkers proposed in pairs of related studies that explore the same phenotypes over the same distribution of subjects. This paper first defines the Reproducibility Score for a labeled dataset as a measure (taking values between 0 and 1) of the reproducibility of the results produced by a specified fixed biomarker discovery process for a given distribution of subjects. We then provide ways to reliably estimate this score by defining algorithms that produce an over-bound and an under-bound for this score for a given dataset and biomarker discovery process, for the case of univariate hypothesis testing on dichotomous groups. We confirm that these approximations are meaningful by providing empirical results on a large number of datasets and show that these predictions match known reproducibility results. To encourage others to apply this technique to analyze their biomarker sets, we have also created a publicly available website, https://biomarker.shinyapps.io/BiomarkerReprod/, that produces these Reproducibility Score approximations for any given dataset (with continuous or discrete features and binary class labels).
format	Online Article Text
id	pubmed-9333302
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-9333302 2022-07-29 Analyzing biomarker discovery: Estimating the reproducibility of biomarker sets Forouzandeh, Amir; Rutar, Alex; Kalmady, Sunil V.; Greiner, Russell PLoS One Research Article
Public Library of Science 2022-07-28 /pmc/articles/PMC9333302/ /pubmed/35901020 http://dx.doi.org/10.1371/journal.pone.0252697 Text en © 2022 Forouzandeh et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Analyzing biomarker discovery: Estimating the reproducibility of biomarker sets
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9333302/ https://www.ncbi.nlm.nih.gov/pubmed/35901020 http://dx.doi.org/10.1371/journal.pone.0252697
