
Using supervised machine learning classifiers to estimate likelihood of participating in clinical trials of a de-identified version of ResearchMatch

INTRODUCTION: Lack of participation in clinical trials (CTs) is a major barrier for the evaluation of new pharmaceuticals and devices. Here we report the results of the analysis of a dataset from ResearchMatch, an online clinical registry, using supervised machine learning approaches and a deep lear...


Bibliographic Details
Main authors:	Vazquez, Janette; Abdelrahman, Samir; Byrne, Loretta M.; Russell, Michael; Harris, Paul; Facelli, Julio C.
Format:	Online Article Text
Language:	English
Published:	Cambridge University Press 2020
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8057403/ https://www.ncbi.nlm.nih.gov/pubmed/33948264 http://dx.doi.org/10.1017/cts.2020.535

author	Vazquez, Janette; Abdelrahman, Samir; Byrne, Loretta M.; Russell, Michael; Harris, Paul; Facelli, Julio C.
collection	PubMed
description	INTRODUCTION: Lack of participation in clinical trials (CTs) is a major barrier for the evaluation of new pharmaceuticals and devices. Here we report the results of the analysis of a dataset from ResearchMatch, an online clinical registry, using supervised machine learning approaches and a deep learning approach to discover characteristics of individuals more likely to show an interest in participating in CTs. METHODS: We trained six supervised machine learning classifiers (Logistic Regression (LR), Decision Tree (DT), Gaussian Naïve Bayes (GNB), K-Nearest Neighbor Classifier (KNC), Adaboost Classifier (ABC) and a Random Forest Classifier (RFC)), as well as a deep learning method, Convolutional Neural Network (CNN), using a dataset of 841,377 instances and 20 features, including demographic data, geographic constraints, medical conditions and ResearchMatch visit history. Our outcome variable consisted of responses showing specific participant interest when presented with specific clinical trial opportunity invitations (‘yes’ or ‘no’). Furthermore, we created four subsets from this dataset based on top self-reported medical conditions and gender, which were separately analysed. RESULTS: The deep learning model outperformed the machine learning classifiers, achieving an area under the curve (AUC) of 0.8105. CONCLUSIONS: The results show sufficient evidence that there are meaningful correlations amongst predictor variables and outcome variable in the datasets analysed using the supervised machine learning classifiers. These approaches show promise in identifying individuals who may be more likely to participate when offered an opportunity for a clinical trial.
format	Online Article Text
id	pubmed-8057403
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Cambridge University Press
record_format	MEDLINE/PubMed
spelling	pubmed-8057403 2021-05-03 J Clin Transl Sci Research Article Cambridge University Press 2020-09-04 /pmc/articles/PMC8057403/ /pubmed/33948264 http://dx.doi.org/10.1017/cts.2020.535 Text en © The Association for Clinical and Translational Science 2020 https://creativecommons.org/licenses/by/4.0/ This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Using supervised machine learning classifiers to estimate likelihood of participating in clinical trials of a de-identified version of ResearchMatch
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8057403/ https://www.ncbi.nlm.nih.gov/pubmed/33948264 http://dx.doi.org/10.1017/cts.2020.535
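
The description field of this record reports model quality as an area under the curve (AUC). As a minimal sketch of what that metric measures, and not the authors' code, the following computes AUC as the probability that a randomly chosen 'yes' response receives a higher score than a randomly chosen 'no' response, with ties counted as half. The function name `roc_auc` and the toy labels and scores are assumptions for illustration; none of the values come from the ResearchMatch dataset.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve for binary labels (1 = 'yes', 0 = 'no').

    Uses the rank-sum (Mann-Whitney U) identity with average ranks for ties.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    # Sort indices by score, then give each group of tied scores
    # the average of the 1-based positions it occupies.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic for the positive class, normalised by the number of
    # positive/negative pairs.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` evaluates to 0.75: of the four positive/negative pairs, three are ranked correctly. A value of 0.5 corresponds to scores that carry no ranking information, and 1.0 to perfect separation.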
