Integrated Random Negative Sampling and Uncertainty Sampling in Active Learning Improve Clinical Drug Safety Drug–Drug Interaction Information Retrieval

Clinical drug–drug interactions (DDIs) are a major cause of both medical error and adverse drug events (ADEs). The published literature on DDI clinical toxicity continues to grow significantly, and high-performance DDI information retrieval (IR) text mining methods are in high demand...

Bibliographic Details
Main Authors:	Xie, Weixin, Wang, Limei, Cheng, Qi, Wang, Xueying, Wang, Ying, Bi, Hongyuan, He, Bo, Feng, Weixing
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2021
Subjects:	Pharmacology
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8130007/ https://www.ncbi.nlm.nih.gov/pubmed/34017245 http://dx.doi.org/10.3389/fphar.2020.582470

author	Xie, Weixin Wang, Limei Cheng, Qi Wang, Xueying Wang, Ying Bi, Hongyuan He, Bo Feng, Weixing
author_sort	Xie, Weixin
collection	PubMed
description	Clinical drug–drug interactions (DDIs) are a major cause of both medical error and adverse drug events (ADEs). The published literature on DDI clinical toxicity continues to grow significantly, and high-performance DDI information retrieval (IR) text mining methods are in high demand. The effectiveness of IR and its machine learning (ML) algorithms depends on the availability of a large amount of training and validation data that have been manually reviewed and annotated. In this study, we investigated how active learning (AL) might improve ML performance in clinical safety DDI IR analysis. We recognized that a direct application of AL would not address several primary challenges in DDI IR from the literature. For instance, the vast majority of abstracts in PubMed will be negative, existing positive and negative labeled samples do not represent the general sample distributions, and potentially biased samples may arise during uncertainty sampling in an AL algorithm. Therefore, we developed several novel sampling and ML schemes to improve AL performance in DDI IR analysis. In particular, random negative sampling was added as a part of AL because it incurs no expense in manual data labeling. We also used two ML algorithms in an AL process to differentiate random negative samples from manually labeled negative samples, and updated both the training and validation samples during the AL process to avoid or reduce biased sampling. Two supervised ML algorithms, support vector machine (SVM) and logistic regression (LR), were used to investigate the consistency of our proposed AL algorithm. Because the ultimate goal of clinical safety DDI IR is to retrieve all DDI toxicity–relevant abstracts, a recall rate of 0.99 was set in developing the AL methods. When we used our newly proposed AL method with SVM, the precision in differentiating the positive samples from manually labeled negative samples improved from 0.45 in the first round to 0.83 in the second round, and the precision in differentiating the positive samples from random negative samples improved from 0.70 in the first round to 0.82 in the second round. When our proposed AL method was used with LR, the improvements in precision followed a similar trend. However, the other AL algorithms tested did not show improved precision, largely because of biased samples caused by the uncertainty sampling or differences between training and validation data sets.
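The sampling and evaluation ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.5 decision boundary, and the toy scores are all assumptions made for illustration. It shows uncertainty sampling (picking the pool items the classifier is least sure about), random negative sampling (drawing unlabeled items and treating them as negatives without manual review), and precision measured at a fixed recall target such as 0.99.

```python
import random


def uncertainty_sample(scores, k):
    """Return the k ids whose predicted probability is closest to 0.5.

    `scores` maps a document id to a predicted probability of being positive.
    Using 0.5 as the point of maximum uncertainty is an assumption.
    """
    return sorted(scores, key=lambda doc_id: abs(scores[doc_id] - 0.5))[:k]


def random_negative_sample(pool_ids, k, seed=0):
    """Draw k ids at random from the unlabeled pool and treat them as negatives.

    Assumption: positives are rare in the pool, so few draws are mislabeled.
    """
    rng = random.Random(seed)
    return rng.sample(sorted(pool_ids), k)


def precision_at_recall(labels, scores, target_recall=0.99):
    """Precision at the shortest ranked prefix whose recall reaches the target.

    `labels` are 0/1 ground-truth values; `scores` are classifier outputs.
    Tied scores are not treated specially in this sketch.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    for retrieved, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        if true_pos / total_pos >= target_recall:
            return true_pos / retrieved
    return true_pos / len(ranked)


# Hypothetical toy data, for illustration only.
pool_scores = {"a": 0.95, "b": 0.52, "c": 0.10, "d": 0.45, "e": 0.70}
to_review = uncertainty_sample(pool_scores, 2)
assumed_negatives = random_negative_sample(pool_scores.keys(), 2)
precision = precision_at_recall([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.3, 0.1])
```

In this toy run, `to_review` holds the two ids a human annotator would label next, while `assumed_negatives` would be added to the training data with no annotation cost. Fixing recall first and then reading off precision mirrors the stated goal of retrieving nearly all relevant abstracts.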
format	Online Article Text
id	pubmed-8130007
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-8130007 2021-05-19 Front Pharmacol Pharmacology Frontiers Media S.A. 2021-04-23 /pmc/articles/PMC8130007/ /pubmed/34017245 http://dx.doi.org/10.3389/fphar.2020.582470 Text en Copyright © 2021 Xie, Wang, Cheng, Wang, Wang, Bi, He and Feng. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Integrated Random Negative Sampling and Uncertainty Sampling in Active Learning Improve Clinical Drug Safety Drug–Drug Interaction Information Retrieval
topic	Pharmacology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8130007/ https://www.ncbi.nlm.nih.gov/pubmed/34017245 http://dx.doi.org/10.3389/fphar.2020.582470