Using supervised machine learning classifiers to estimate likelihood of participating in clinical trials of a de-identified version of ResearchMatch

Bibliographic Details
Main authors: Vazquez, Janette; Abdelrahman, Samir; Byrne, Loretta M.; Russell, Michael; Harris, Paul; Facelli, Julio C.
Format: Online Article Text
Language: English
Published: Cambridge University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8057403/
https://www.ncbi.nlm.nih.gov/pubmed/33948264
http://dx.doi.org/10.1017/cts.2020.535
author Vazquez, Janette
Abdelrahman, Samir
Byrne, Loretta M.
Russell, Michael
Harris, Paul
Facelli, Julio C.
collection PubMed
description INTRODUCTION: Lack of participation in clinical trials (CTs) is a major barrier for the evaluation of new pharmaceuticals and devices. Here we report the results of the analysis of a dataset from ResearchMatch, an online clinical registry, using supervised machine learning approaches and a deep learning approach to discover characteristics of individuals more likely to show an interest in participating in CTs. METHODS: We trained six supervised machine learning classifiers (Logistic Regression (LR), Decision Tree (DT), Gaussian Naïve Bayes (GNB), K-Nearest Neighbor Classifier (KNC), Adaboost Classifier (ABC) and a Random Forest Classifier (RFC)), as well as a deep learning method, Convolutional Neural Network (CNN), using a dataset of 841,377 instances and 20 features, including demographic data, geographic constraints, medical conditions and ResearchMatch visit history. Our outcome variable consisted of responses showing specific participant interest when presented with specific clinical trial opportunity invitations (‘yes’ or ‘no’). Furthermore, we created four subsets from this dataset based on top self-reported medical conditions and gender, which were separately analysed. RESULTS: The deep learning model outperformed the machine learning classifiers, achieving an area under the curve (AUC) of 0.8105. CONCLUSIONS: The results show sufficient evidence that there are meaningful correlations amongst predictor variables and outcome variable in the datasets analysed using the supervised machine learning classifiers. These approaches show promise in identifying individuals who may be more likely to participate when offered an opportunity for a clinical trial.
format Online
Article
Text
id pubmed-8057403
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Cambridge University Press
record_format MEDLINE/PubMed
spelling pubmed-8057403 2021-05-03 J Clin Transl Sci Research Article Cambridge University Press 2020-09-04 /pmc/articles/PMC8057403/ /pubmed/33948264 http://dx.doi.org/10.1017/cts.2020.535 Text en © The Association for Clinical and Translational Science 2020 https://creativecommons.org/licenses/by/4.0/ This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Using supervised machine learning classifiers to estimate likelihood of participating in clinical trials of a de-identified version of ResearchMatch
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8057403/
https://www.ncbi.nlm.nih.gov/pubmed/33948264
http://dx.doi.org/10.1017/cts.2020.535
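
The abstract in this record reports model quality as an area under the curve (AUC), with the deep learning model reaching 0.8105 on 'yes'/'no' responses to trial invitations. As a reminder of what that number measures, the following is a minimal sketch, not the authors' code: it computes the ROC AUC as the probability that a randomly chosen 'yes' response receives a higher score than a randomly chosen 'no' response, counting ties as one half. The function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration; nothing here comes from the ResearchMatch dataset.

```python
from typing import Sequence


def roc_auc(labels: Sequence[str], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen 'yes' item is scored
    higher than a randomly chosen 'no' item; tied scores count as 0.5.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == "yes"]
    neg = [s for y, s in zip(labels, scores) if y == "no"]
    if not pos or not neg:
        raise ValueError("need at least one 'yes' and one 'no' label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # 'yes' item correctly ranked above 'no' item
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six invitations with made-up model scores.
toy_labels = ["yes", "yes", "no", "yes", "no", "no"]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.4f}")  # 8 of 9 yes/no pairs are ordered correctly: 0.8889
```

The pairwise loop is quadratic in the number of responses, so it suits only small illustrations; on a dataset of the size described in the abstract (841,377 instances) a rank-based or library implementation would be the practical choice. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of 'yes' from 'no' responses.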