Using supervised machine learning classifiers to estimate likelihood of participating in clinical trials of a de-identified version of ResearchMatch
INTRODUCTION: Lack of participation in clinical trials (CTs) is a major barrier for the evaluation of new pharmaceuticals and devices. Here we report the results of the analysis of a dataset from ResearchMatch, an online clinical registry, using supervised machine learning approaches and a deep lear...
Main authors: | Vazquez, Janette; Abdelrahman, Samir; Byrne, Loretta M.; Russell, Michael; Harris, Paul; Facelli, Julio C. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cambridge University Press 2020 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8057403/ https://www.ncbi.nlm.nih.gov/pubmed/33948264 http://dx.doi.org/10.1017/cts.2020.535 |
_version_ | 1783680828060991488 |
---|---|
author | Vazquez, Janette Abdelrahman, Samir Byrne, Loretta M. Russell, Michael Harris, Paul Facelli, Julio C. |
author_facet | Vazquez, Janette Abdelrahman, Samir Byrne, Loretta M. Russell, Michael Harris, Paul Facelli, Julio C. |
author_sort | Vazquez, Janette |
collection | PubMed |
description | INTRODUCTION: Lack of participation in clinical trials (CTs) is a major barrier for the evaluation of new pharmaceuticals and devices. Here we report the results of the analysis of a dataset from ResearchMatch, an online clinical registry, using supervised machine learning approaches and a deep learning approach to discover characteristics of individuals more likely to show an interest in participating in CTs. METHODS: We trained six supervised machine learning classifiers (Logistic Regression (LR), Decision Tree (DT), Gaussian Naïve Bayes (GNB), K-Nearest Neighbor Classifier (KNC), Adaboost Classifier (ABC) and a Random Forest Classifier (RFC)), as well as a deep learning method, Convolutional Neural Network (CNN), using a dataset of 841,377 instances and 20 features, including demographic data, geographic constraints, medical conditions and ResearchMatch visit history. Our outcome variable consisted of responses showing specific participant interest when presented with specific clinical trial opportunity invitations (‘yes’ or ‘no’). Furthermore, we created four subsets from this dataset based on top self-reported medical conditions and gender, which were separately analysed. RESULTS: The deep learning model outperformed the machine learning classifiers, achieving an area under the curve (AUC) of 0.8105. CONCLUSIONS: The results show sufficient evidence that there are meaningful correlations amongst predictor variables and outcome variable in the datasets analysed using the supervised machine learning classifiers. These approaches show promise in identifying individuals who may be more likely to participate when offered an opportunity for a clinical trial. |
format | Online Article Text |
id | pubmed-8057403 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Cambridge University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-80574032021-05-03 Using supervised machine learning classifiers to estimate likelihood of participating in clinical trials of a de-identified version of ResearchMatch Vazquez, Janette Abdelrahman, Samir Byrne, Loretta M. Russell, Michael Harris, Paul Facelli, Julio C. J Clin Transl Sci Research Article INTRODUCTION: Lack of participation in clinical trials (CTs) is a major barrier for the evaluation of new pharmaceuticals and devices. Here we report the results of the analysis of a dataset from ResearchMatch, an online clinical registry, using supervised machine learning approaches and a deep learning approach to discover characteristics of individuals more likely to show an interest in participating in CTs. METHODS: We trained six supervised machine learning classifiers (Logistic Regression (LR), Decision Tree (DT), Gaussian Naïve Bayes (GNB), K-Nearest Neighbor Classifier (KNC), Adaboost Classifier (ABC) and a Random Forest Classifier (RFC)), as well as a deep learning method, Convolutional Neural Network (CNN), using a dataset of 841,377 instances and 20 features, including demographic data, geographic constraints, medical conditions and ResearchMatch visit history. Our outcome variable consisted of responses showing specific participant interest when presented with specific clinical trial opportunity invitations (‘yes’ or ‘no’). Furthermore, we created four subsets from this dataset based on top self-reported medical conditions and gender, which were separately analysed. RESULTS: The deep learning model outperformed the machine learning classifiers, achieving an area under the curve (AUC) of 0.8105. CONCLUSIONS: The results show sufficient evidence that there are meaningful correlations amongst predictor variables and outcome variable in the datasets analysed using the supervised machine learning classifiers. 
These approaches show promise in identifying individuals who may be more likely to participate when offered an opportunity for a clinical trial. Cambridge University Press 2020-09-04 /pmc/articles/PMC8057403/ /pubmed/33948264 http://dx.doi.org/10.1017/cts.2020.535 Text en © The Association for Clinical and Translational Science 2020 https://creativecommons.org/licenses/by/4.0/ This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Vazquez, Janette Abdelrahman, Samir Byrne, Loretta M. Russell, Michael Harris, Paul Facelli, Julio C. Using supervised machine learning classifiers to estimate likelihood of participating in clinical trials of a de-identified version of ResearchMatch |
title | Using supervised machine learning classifiers to estimate likelihood of participating in clinical trials of a de-identified version of ResearchMatch |
title_full | Using supervised machine learning classifiers to estimate likelihood of participating in clinical trials of a de-identified version of ResearchMatch |
title_fullStr | Using supervised machine learning classifiers to estimate likelihood of participating in clinical trials of a de-identified version of ResearchMatch |
title_full_unstemmed | Using supervised machine learning classifiers to estimate likelihood of participating in clinical trials of a de-identified version of ResearchMatch |
title_short | Using supervised machine learning classifiers to estimate likelihood of participating in clinical trials of a de-identified version of ResearchMatch |
title_sort | using supervised machine learning classifiers to estimate likelihood of participating in clinical trials of a de-identified version of researchmatch |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8057403/ https://www.ncbi.nlm.nih.gov/pubmed/33948264 http://dx.doi.org/10.1017/cts.2020.535 |
work_keys_str_mv | AT vazquezjanette usingsupervisedmachinelearningclassifierstoestimatelikelihoodofparticipatinginclinicaltrialsofadeidentifiedversionofresearchmatch AT abdelrahmansamir usingsupervisedmachinelearningclassifierstoestimatelikelihoodofparticipatinginclinicaltrialsofadeidentifiedversionofresearchmatch AT byrnelorettam usingsupervisedmachinelearningclassifierstoestimatelikelihoodofparticipatinginclinicaltrialsofadeidentifiedversionofresearchmatch AT russellmichael usingsupervisedmachinelearningclassifierstoestimatelikelihoodofparticipatinginclinicaltrialsofadeidentifiedversionofresearchmatch AT harrispaul usingsupervisedmachinelearningclassifierstoestimatelikelihoodofparticipatinginclinicaltrialsofadeidentifiedversionofresearchmatch AT facellijulioc usingsupervisedmachinelearningclassifierstoestimatelikelihoodofparticipatinginclinicaltrialsofadeidentifiedversionofresearchmatch |
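
The description above reports model quality as an area under the curve (AUC), with the deep learning model reaching 0.8105. As a rough illustration of what that number measures, the sketch below computes ROC AUC for a binary 'yes'/'no' outcome using the rank-sum (Mann-Whitney U) formulation. It is not the authors' code: the function name `roc_auc` and the toy labels and scores are hypothetical, and the record does not say which software the study used.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation.

    labels: 1 for a 'yes' response, 0 for a 'no' response.
    scores: predicted likelihood of 'yes' (higher means more likely).
    Tied scores share the average of the ranks they span.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")

    # Indices sorted by ascending score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the current run of tied scores.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic for the positive class, normalised to [0, 1].
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the ResearchMatch dataset.
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(toy_labels, toy_scores))
```

Read this way, an AUC of 0.8105 means that a randomly chosen 'yes' responder receives a higher score than a randomly chosen 'no' responder about 81% of the time; 0.5 corresponds to chance-level ranking.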