Multiple sampling schemes and deep learning improve active learning performance in drug-drug interaction information retrieval analysis from the literature

Bibliographic Details
Main Authors: Xie, Weixin; Fan, Kunjie; Zhang, Shijun; Li, Lang
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects: Research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10228061/
https://www.ncbi.nlm.nih.gov/pubmed/37248476
http://dx.doi.org/10.1186/s13326-023-00287-7
author Xie, Weixin
Fan, Kunjie
Zhang, Shijun
Li, Lang
collection PubMed
description BACKGROUND: Drug-drug interaction (DDI) information retrieval (IR) from the PubMed literature is an important natural language processing (NLP) task. This study is the first to apply active learning (AL) to DDI IR analysis. DDI IR analysis of PubMed abstracts is challenging because positive DDI samples are few relative to an overwhelmingly large number of negative samples. Random negative sampling and positive sampling are designed specifically to improve the efficiency of AL analysis, and the paper shows the consistency of both sampling schemes.
RESULTS: PubMed abstracts are divided into two pools. The screened pool contains all abstracts that pass the DDI keyword query in PubMed, while the unscreened pool contains all other abstracts. Precision of DDI IR analysis is evaluated and compared at a prespecified recall of 0.95. In screened-pool IR analysis using a support vector machine (SVM), similarity sampling plus uncertainty sampling improves precision over uncertainty sampling alone, from 0.89 to 0.92. In unscreened-pool IR analysis, integrating random negative sampling, positive sampling, and similarity sampling improves precision over uncertainty sampling alone, from 0.72 to 0.81. When the SVM is replaced with a deep learning method, all sampling schemes consistently improve DDI AL analysis in both pools. Deep learning improves precision significantly over SVM: 0.96 vs. 0.92 in the screened pool and 0.90 vs. 0.81 in the unscreened pool.
CONCLUSIONS: Integrating multiple sampling schemes and deep learning algorithms into AL significantly improves DDI IR analysis from the literature. Random negative sampling and positive sampling are highly effective at improving AL analysis when positive and negative samples are extremely imbalanced.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13326-023-00287-7.
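The abstract reports precision at a prespecified recall of 0.95 and uses uncertainty sampling as the baseline query strategy. The record does not give the authors' exact procedure, so the following is only a minimal sketch of how those two ideas are commonly computed. The function names `precision_at_recall` and `most_uncertain`, the toy labels and scores, and the assumption that scores are signed classifier decision values with the boundary at 0 (as with an SVM) are all illustrative and not taken from the paper.

```python
import numpy as np


def precision_at_recall(y_true, scores, target_recall=0.95):
    """Precision at the shortest ranked list whose recall reaches target_recall.

    Hypothetical helper: ties between scores are broken by input order.
    """
    y_true = np.asarray(y_true, dtype=int)
    scores = np.asarray(scores, dtype=float)
    n_pos = int(y_true.sum())
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank abstracts from most to least DDI-like.
    order = np.argsort(-scores, kind="stable")
    hits = np.cumsum(y_true[order])          # true positives within the top k
    k = np.arange(1, len(y_true) + 1)
    recall = hits / n_pos
    precision = hits / k
    # argmax on a boolean array returns the first True, i.e. the smallest k
    # at which recall reaches the target.
    first = int(np.argmax(recall >= target_recall))
    return float(precision[first])


def most_uncertain(scores, batch_size):
    """Indices of the unlabeled items whose score is closest to the boundary (0).

    Assumes signed decision values; for probabilities one would instead
    rank by distance from 0.5.
    """
    scores = np.asarray(scores, dtype=float)
    return np.argsort(np.abs(scores), kind="stable")[:batch_size]


# Toy data (made up for illustration): 4 positives among 10 abstracts.
y_true = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [2.1, 1.7, 1.2, 0.9, 0.4, 0.1, -0.2, -0.8, -1.5, -2.0]

p95 = precision_at_recall(y_true, scores, target_recall=0.95)
query = most_uncertain(scores, batch_size=3)
print(round(p95, 3), query.tolist())
```

In this toy ranking, reaching a recall of 0.95 requires retrieving all four positives, which happens only at the seventh abstract, so the reported precision is 4/7. Fixing recall and comparing precision, as the abstract does, makes different sampling schemes comparable at the same level of coverage.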
format Online
Article
Text
id pubmed-10228061
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10228061 2023-05-31
J Biomed Semantics, Research
BioMed Central 2023-05-30
/pmc/articles/PMC10228061/ /pubmed/37248476 http://dx.doi.org/10.1186/s13326-023-00287-7
Text en
© The Author(s) 2023. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Multiple sampling schemes and deep learning improve active learning performance in drug-drug interaction information retrieval analysis from the literature
topic Research