Quantifying representativeness in randomized clinical trials using machine learning fairness metrics

OBJECTIVE: We help identify subpopulations underrepresented in randomized clinical trial (RCT) cohorts with respect to national, community-based, or health system target populations by formulating population representativeness of RCTs as a machine learning (ML) fairness problem, deriving new repres...

Bibliographic Details
Main authors:	Qi, Miao, Cahan, Owen, Foreman, Morgan A, Gruen, Daniel M, Das, Amar K, Bennett, Kristin P
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2021
Subjects:	Research and Applications
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8460438/ https://www.ncbi.nlm.nih.gov/pubmed/34568771 http://dx.doi.org/10.1093/jamiaopen/ooab077

collection	PubMed
description	OBJECTIVE: We help identify subpopulations underrepresented in randomized clinical trial (RCT) cohorts with respect to national, community-based, or health system target populations by formulating population representativeness of RCTs as a machine learning (ML) fairness problem, deriving new representation metrics, and deploying them in easy-to-understand interactive visualization tools. MATERIALS AND METHODS: We represent RCT cohort enrollment as random binary classification fairness problems, and then show how ML fairness metrics based on enrollment fraction can be efficiently calculated using easily computed rates of subpopulations in RCT cohorts and target populations. We propose standardized versions of these metrics and deploy them in an interactive tool to analyze 3 RCTs with respect to type 2 diabetes and hypertension target populations in the National Health and Nutrition Examination Survey. RESULTS: We demonstrate how the proposed metrics and associated statistics enable users to rapidly examine the representativeness of all subpopulations in the RCT defined by a set of categorical traits (eg, gender, race, ethnicity, smoking status, and blood pressure) with respect to target populations. DISCUSSION: The normalized metrics provide an intuitive standardized scale for evaluating representation across subgroups, which may have vastly different enrollment fractions and rates in RCT study cohorts. The metrics are beneficial complements to other approaches (eg, enrollment fractions) used to assess the generalizability and health equity of RCTs. CONCLUSION: By quantifying the gaps between RCT and target populations, the proposed methods can support generalizability evaluation of existing RCT cohorts. The interactive visualization tool can be readily applied to identify underrepresented subgroups with respect to any desired source or target populations.
id	pubmed-8460438
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-8460438 2021-09-24 JAMIA Open Research and Applications Oxford University Press 2021-09-24 /pmc/articles/PMC8460438/ /pubmed/34568771 http://dx.doi.org/10.1093/jamiaopen/ooab077 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
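
The description field in this record says that representation metrics can be computed from the rates of subpopulations in an RCT cohort and in a target population. The following is a minimal Python sketch of that general idea, not the authors' implementation: the function names, the choice of a rate ratio and a log odds ratio as comparison measures, and the subgroup counts are all assumptions made for illustration (the counts are invented and do not come from the article or from the National Health and Nutrition Examination Survey).

```python
import math


def subgroup_rates(counts):
    """Convert raw subgroup counts into rates that sum to 1."""
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def log_odds_ratio(trial_rate, target_rate):
    """Log odds ratio of a subgroup's trial rate versus its target rate.

    Negative values suggest underrepresentation in the trial, positive
    values overrepresentation, and 0 an equal rate (illustrative measure).
    """
    if not 0.0 < target_rate < 1.0:
        raise ValueError("target_rate must be strictly between 0 and 1")
    if trial_rate <= 0.0:
        return -math.inf  # subgroup absent from the trial
    if trial_rate >= 1.0:
        return math.inf   # trial contains only this subgroup
    trial_odds = trial_rate / (1.0 - trial_rate)
    target_odds = target_rate / (1.0 - target_rate)
    return math.log(trial_odds) - math.log(target_odds)


def representation_report(trial_counts, target_counts):
    """Compare every target subgroup's rate in the trial and the target."""
    trial = subgroup_rates(trial_counts)
    target = subgroup_rates(target_counts)
    report = {}
    for group, target_rate in target.items():
        trial_rate = trial.get(group, 0.0)
        report[group] = {
            "trial_rate": trial_rate,
            "target_rate": target_rate,
            "rate_ratio": trial_rate / target_rate,
            "log_odds_ratio": log_odds_ratio(trial_rate, target_rate),
        }
    return report


# Hypothetical counts, invented for this example only.
trial_counts = {"female": 300, "male": 700}
target_counts = {"female": 520, "male": 480}

report = representation_report(trial_counts, target_counts)
for group, row in report.items():
    print(group, row)
```

With these invented counts, the female subgroup has a lower rate in the trial than in the target population, so its log odds ratio is negative. A log-scale measure is used here only because it treats under- and overrepresentation symmetrically; the article's own standardized metrics may be defined differently.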
