Latent Dirichlet Allocation in predicting clinical trial terminations
Main authors: | Geletta, Simon; Follett, Lendie; Laugerman, Marcia |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6882341/ https://www.ncbi.nlm.nih.gov/pubmed/31775737 http://dx.doi.org/10.1186/s12911-019-0973-y |
author | Geletta, Simon; Follett, Lendie; Laugerman, Marcia |
---|---|
collection | PubMed |
description | BACKGROUND: This study used natural language processing (NLP) and machine learning (ML) techniques to identify reliable patterns within research narrative documents that distinguish studies that complete successfully from those that terminate. Recent research findings have reported that at least 10% of all studies funded by major research funding agencies terminate without yielding useful results. Studies that receive funding from major funding agencies are known to be carefully planned and rigorously vetted through peer review, so it was somewhat surprising to us that terminations are this prevalent. Moreover, our review of the literature on study terminations suggested that their causes are not well understood. We therefore aimed to address that knowledge gap by identifying the factors that contribute to study failures. METHOD: We used data from the ClinicalTrials.gov repository. From it we extracted both structured data (study characteristics) and unstructured data (the narrative descriptions of the studies). We applied natural language processing techniques to the unstructured data to quantify the risk of termination, identifying distinctive topics that are more frequently associated with either terminated or completed trials. We used Latent Dirichlet Allocation (LDA) to derive 25 “topics” with corresponding sets of probabilities, which we then used to predict study termination with random forest modeling. We fit two distinct models: one using only structured data as predictors, and another using both the structured data and the 25 text topics derived from the unstructured data. RESULTS: In this paper, we demonstrate the interpretive and predictive value of LDA for predicting clinical trial failure. The results also show that the combined modeling approach yields more robust predictive probabilities, in terms of both sensitivity and specificity, than a model that uses the structured data alone. CONCLUSIONS: Our study demonstrated that topic modeling with LDA significantly increases the utility of unstructured data for predicting the completion vs. termination of studies. This study sets the direction for future research evaluating the viability of health study designs. |
format | Online Article Text |
id | pubmed-6882341 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6882341 (2019-12-03); BMC Med Inform Decis Mak; Research Article; BioMed Central, 2019-11-27; /pmc/articles/PMC6882341/ /pubmed/31775737 http://dx.doi.org/10.1186/s12911-019-0973-y; Text; en; © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Latent Dirichlet Allocation in predicting clinical trial terminations |
topic | Research Article |
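The abstract describes a pipeline with three steps:

* derive LDA topic proportions for each study's narrative;
* append them to the structured study characteristics;
* compare a structured-only random forest against a combined one by sensitivity and specificity.

The stdlib-only Python below is a minimal sketch of the feature-construction and evaluation steps under stated assumptions. It is not the authors' code, and this record does not say which LDA implementation or hyperparameters they used.

Some choices here are assumptions, not details from the study:

* The sketch estimates LDA by collapsed Gibbs sampling.
* `alpha`, `beta`, `n_iter`, and `seed` are illustrative values.
* The function names and the toy data are hypothetical.
* The random forest fit itself is omitted.

```python
import random


def lda_topic_proportions(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Estimate per-document topic proportions (theta) by collapsed Gibbs sampling.

    docs is a list of token lists. alpha, beta and n_iter are illustrative
    defaults (assumptions), not values reported by the study.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]             # tokens in doc d assigned to topic k
    topic_word = [[0] * n_vocab for _ in range(n_topics)]  # times word w is assigned to topic k
    topic_total = [0] * n_topics                           # tokens assigned to topic k
    z = []                                                 # current topic of every token

    # Randomly assign each token to a topic to initialize the counts.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1
        z.append(z_d)

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], z[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized full conditional p(z = j | all other assignments).
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][v] + beta)
                    / (topic_total[j] + n_vocab * beta)
                    for j in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed topic proportions per document; each row sums to 1.
    return [
        [(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
        for d, doc in enumerate(docs)
    ]


def combined_features(structured_rows, theta):
    """Append each study's topic proportions to its structured predictors."""
    return [list(row) + list(t) for row, t in zip(structured_rows, theta)]


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical toy narratives and structured predictors, for illustration only.
    docs = [
        "slow enrollment sponsor withdrew funding".split(),
        "dose escalation safety endpoint completed".split(),
        "enrollment halted sponsor decision".split(),
    ]
    structured = [[1, 120], [0, 40], [1, 300]]  # hypothetical: phase flag, target enrollment
    theta = lda_topic_proportions(docs, n_topics=2, n_iter=50)
    X = combined_features(structured, theta)
    print(len(X[0]))  # 2 structured columns + 2 topic columns
```

The toy example uses 2 topics because the corpus is tiny. In the study's setting `n_topics` would be 25, and the rows from `combined_features` would be passed to a random forest, for example scikit-learn's `RandomForestClassifier`. The structured-only baseline would be fit on `structured_rows` alone. Both models could then be compared with `sensitivity_specificity` on evaluation data; the record does not describe the evaluation split, so that part is an assumption.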