Multiple sampling schemes and deep learning improve active learning performance in drug-drug interaction information retrieval analysis from the literature
BACKGROUND: Drug-drug interaction (DDI) information retrieval (IR) from the PubMed literature is an important natural language processing (NLP) task. For the first time, active learning (AL) is studied in DDI IR analysis. DDI IR analysis from PubMed abstracts faces the challenges of relatively small po...
Main authors: | Xie, Weixin; Fan, Kunjie; Zhang, Shijun; Li, Lang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
BioMed Central
2023
|
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10228061/ https://www.ncbi.nlm.nih.gov/pubmed/37248476 http://dx.doi.org/10.1186/s13326-023-00287-7 |
_version_ | 1785050893185449984 |
---|---|
author | Xie, Weixin Fan, Kunjie Zhang, Shijun Li, Lang |
author_facet | Xie, Weixin Fan, Kunjie Zhang, Shijun Li, Lang |
author_sort | Xie, Weixin |
collection | PubMed |
description | BACKGROUND: Drug-drug interaction (DDI) information retrieval (IR) from the PubMed literature is an important natural language processing (NLP) task. For the first time, active learning (AL) is studied in DDI IR analysis. DDI IR analysis from PubMed abstracts faces the challenge of relatively few positive DDI samples among an overwhelmingly large number of negative samples. Random negative sampling and positive sampling are purposely designed to improve the efficiency of AL analysis. The consistency of random negative sampling and positive sampling is shown in the paper. RESULTS: PubMed abstracts are divided into two pools. The screened pool contains all abstracts that pass the DDI keywords query in PubMed, while the unscreened pool includes all the other abstracts. At a prespecified recall rate of 0.95, DDI IR analysis precision is evaluated and compared. In screened pool IR analysis using a support vector machine (SVM), similarity sampling plus uncertainty sampling improves the precision over uncertainty sampling alone, from 0.89 to 0.92. In the unscreened pool IR analysis, the integrated random negative sampling, positive sampling, and similarity sampling improve the precision over uncertainty sampling alone, from 0.72 to 0.81. When we change the SVM to a deep learning method, all sampling schemes consistently improve DDI AL analysis in both the screened pool and the unscreened pool. Deep learning significantly improves precision over SVM: 0.96 vs. 0.92 in the screened pool and 0.90 vs. 0.81 in the unscreened pool. CONCLUSIONS: By integrating various sampling schemes and deep learning algorithms into AL, the DDI IR analysis from literature is significantly improved. Random negative sampling and positive sampling are highly effective methods for improving AL analysis where the positive and negative samples are extremely imbalanced.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13326-023-00287-7. |
format | Online Article Text |
id | pubmed-10228061 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-102280612023-05-31 Multiple sampling schemes and deep learning improve active learning performance in drug-drug interaction information retrieval analysis from the literature Xie, Weixin Fan, Kunjie Zhang, Shijun Li, Lang J Biomed Semantics Research BACKGROUND: Drug-drug interaction (DDI) information retrieval (IR) from the PubMed literature is an important natural language processing (NLP) task. For the first time, active learning (AL) is studied in DDI IR analysis. DDI IR analysis from PubMed abstracts faces the challenge of relatively few positive DDI samples among an overwhelmingly large number of negative samples. Random negative sampling and positive sampling are purposely designed to improve the efficiency of AL analysis. The consistency of random negative sampling and positive sampling is shown in the paper. RESULTS: PubMed abstracts are divided into two pools. The screened pool contains all abstracts that pass the DDI keywords query in PubMed, while the unscreened pool includes all the other abstracts. At a prespecified recall rate of 0.95, DDI IR analysis precision is evaluated and compared. In screened pool IR analysis using a support vector machine (SVM), similarity sampling plus uncertainty sampling improves the precision over uncertainty sampling alone, from 0.89 to 0.92. In the unscreened pool IR analysis, the integrated random negative sampling, positive sampling, and similarity sampling improve the precision over uncertainty sampling alone, from 0.72 to 0.81. When we change the SVM to a deep learning method, all sampling schemes consistently improve DDI AL analysis in both the screened pool and the unscreened pool. Deep learning significantly improves precision over SVM: 0.96 vs. 0.92 in the screened pool and 0.90 vs. 0.81 in the unscreened pool. CONCLUSIONS: By integrating various sampling schemes and deep learning algorithms into AL, the DDI IR analysis from literature is significantly improved.
Random negative sampling and positive sampling are highly effective methods for improving AL analysis where the positive and negative samples are extremely imbalanced. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13326-023-00287-7. BioMed Central 2023-05-30 /pmc/articles/PMC10228061/ /pubmed/37248476 http://dx.doi.org/10.1186/s13326-023-00287-7 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Xie, Weixin Fan, Kunjie Zhang, Shijun Li, Lang Multiple sampling schemes and deep learning improve active learning performance in drug-drug interaction information retrieval analysis from the literature |
title | Multiple sampling schemes and deep learning improve active learning performance in drug-drug interaction information retrieval analysis from the literature |
title_full | Multiple sampling schemes and deep learning improve active learning performance in drug-drug interaction information retrieval analysis from the literature |
title_fullStr | Multiple sampling schemes and deep learning improve active learning performance in drug-drug interaction information retrieval analysis from the literature |
title_full_unstemmed | Multiple sampling schemes and deep learning improve active learning performance in drug-drug interaction information retrieval analysis from the literature |
title_short | Multiple sampling schemes and deep learning improve active learning performance in drug-drug interaction information retrieval analysis from the literature |
title_sort | multiple sampling schemes and deep learning improve active learning performance in drug-drug interaction information retrieval analysis from the literature |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10228061/ https://www.ncbi.nlm.nih.gov/pubmed/37248476 http://dx.doi.org/10.1186/s13326-023-00287-7 |
work_keys_str_mv | AT xieweixin multiplesamplingschemesanddeeplearningimproveactivelearningperformanceindrugdruginteractioninformationretrievalanalysisfromtheliterature AT fankunjie multiplesamplingschemesanddeeplearningimproveactivelearningperformanceindrugdruginteractioninformationretrievalanalysisfromtheliterature AT zhangshijun multiplesamplingschemesanddeeplearningimproveactivelearningperformanceindrugdruginteractioninformationretrievalanalysisfromtheliterature AT lilang multiplesamplingschemesanddeeplearningimproveactivelearningperformanceindrugdruginteractioninformationretrievalanalysisfromtheliterature |
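
The description field of this record reports precision at a prespecified recall rate of 0.95. As a rough illustration of that kind of evaluation, the Python sketch below ranks items by classifier score and reads off precision at the first cutoff where recall reaches the target. It is not the authors' code: the function name `precision_at_recall`, the toy scores, and the tie-handling rule are assumptions made only for this example.

```python
from typing import Sequence, Tuple


def precision_at_recall(
    scores: Sequence[float],
    labels: Sequence[int],
    target_recall: float = 0.95,
) -> Tuple[float, float]:
    """Return (precision, score threshold) at the highest threshold
    whose recall is at least target_recall.

    Hypothetical helper for illustration; labels are 1 (positive) or 0.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank items from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_pos = 0
    for rank, (score, label) in enumerate(ranked, start=1):
        true_pos += label
        # Items with equal scores cannot be separated by a threshold,
        # so only evaluate after the last item of a tied group.
        if rank < len(ranked) and ranked[rank][0] == score:
            continue
        if true_pos / total_pos >= target_recall:
            return true_pos / rank, score

    # Not reachable: recall is 1.0 once every item has been retrieved.
    raise RuntimeError("target recall was never reached")


# Toy data (made up): three positives among six scored abstracts.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 1, 0, 1, 0, 0]
precision, threshold = precision_at_recall(toy_scores, toy_labels, 0.95)
print(f"precision={precision:.2f} at threshold={threshold}")
```

Fixing recall first and then comparing precision, as the abstract does, keeps different sampling schemes comparable when positives are rare, because every scheme is held to retrieving the same share of true DDI abstracts.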