Hierarchical confounder discovery in the experiment-machine learning cycle

The promise of machine learning (ML) to extract insights from high-dimensional datasets is tempered by confounding variables. It behooves scientists to determine if a model has extracted the desired information or instead fallen prey to bias. Due to features of natural phenomena and experimental des...

Bibliographic Details
Main Authors:	Rogozhnikov, Alex; Ramkumar, Pavan; Bedi, Rishi; Kato, Saul; Escola, G. Sean
Format:	Online Article Text
Language:	English
Published:	Elsevier 2022
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9024009/ https://www.ncbi.nlm.nih.gov/pubmed/35465234 http://dx.doi.org/10.1016/j.patter.2022.100451

_version_	1784690470336593920
author	Rogozhnikov, Alex Ramkumar, Pavan Bedi, Rishi Kato, Saul Escola, G. Sean
author_sort	Rogozhnikov, Alex
collection	PubMed
description	The promise of machine learning (ML) to extract insights from high-dimensional datasets is tempered by confounding variables. It behooves scientists to determine if a model has extracted the desired information or instead fallen prey to bias. Due to features of natural phenomena and experimental design constraints, bioscience datasets are often organized in nested hierarchies that obfuscate the origins of confounding effects and render confounder amelioration methods ineffective. We propose a non-parametric statistical method called the rank-to-group (RTG) score that identifies hierarchical confounder effects in raw data and ML-derived embeddings. We show that RTG scores correctly assign the effects of hierarchical confounders when linear methods fail. In a public biomedical image dataset, we discover unreported effects of experimental design. We then use RTG scores to discover crossmodal correlated variability in a multi-phenotypic biological dataset. This approach should be generally useful in experiment-analysis cycles and to ensure confounder robustness in ML models.
format	Online Article Text
id	pubmed-9024009
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Elsevier
record_format	MEDLINE/PubMed
spelling	pubmed-9024009 2022-04-23 Hierarchical confounder discovery in the experiment-machine learning cycle Rogozhnikov, Alex Ramkumar, Pavan Bedi, Rishi Kato, Saul Escola, G. Sean Patterns (N Y) Article The promise of machine learning (ML) to extract insights from high-dimensional datasets is tempered by confounding variables. It behooves scientists to determine if a model has extracted the desired information or instead fallen prey to bias. Due to features of natural phenomena and experimental design constraints, bioscience datasets are often organized in nested hierarchies that obfuscate the origins of confounding effects and render confounder amelioration methods ineffective. We propose a non-parametric statistical method called the rank-to-group (RTG) score that identifies hierarchical confounder effects in raw data and ML-derived embeddings. We show that RTG scores correctly assign the effects of hierarchical confounders when linear methods fail. In a public biomedical image dataset, we discover unreported effects of experimental design. We then use RTG scores to discover crossmodal correlated variability in a multi-phenotypic biological dataset. This approach should be generally useful in experiment-analysis cycles and to ensure confounder robustness in ML models. Elsevier 2022-02-22 /pmc/articles/PMC9024009/ /pubmed/35465234 http://dx.doi.org/10.1016/j.patter.2022.100451 Text en © 2022 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title	Hierarchical confounder discovery in the experiment-machine learning cycle
topic	Article
