Multiple sampling schemes and deep learning improve active learning performance in drug-drug interaction information retrieval analysis from the literature

BACKGROUND: Drug-drug interaction (DDI) information retrieval (IR) is an important natural language processing (NLP) task for the PubMed literature. For the first time, active learning (AL) is studied in DDI IR analysis. DDI IR analysis from PubMed abstracts faces the challenges of relatively small po...

Bibliographic Details
Main Authors:	Xie, Weixin; Fan, Kunjie; Zhang, Shijun; Li, Lang
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2023
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10228061/ https://www.ncbi.nlm.nih.gov/pubmed/37248476 http://dx.doi.org/10.1186/s13326-023-00287-7

author	Xie, Weixin Fan, Kunjie Zhang, Shijun Li, Lang
collection	PubMed
description	BACKGROUND: Drug-drug interaction (DDI) information retrieval (IR) is an important natural language processing (NLP) task for the PubMed literature. For the first time, active learning (AL) is studied in DDI IR analysis. DDI IR analysis from PubMed abstracts faces the challenge of relatively few positive DDI samples among an overwhelmingly large number of negative samples. Random negative sampling and positive sampling are purposely designed to improve the efficiency of AL analysis. The consistency of random negative sampling and positive sampling is shown in the paper. RESULTS: PubMed abstracts are divided into two pools. The screened pool contains all abstracts that pass the DDI keyword query in PubMed, while the unscreened pool includes all other abstracts. DDI IR analysis precision is evaluated and compared at a prespecified recall rate of 0.95. In the screened pool IR analysis using a support vector machine (SVM), similarity sampling plus uncertainty sampling improves the precision over uncertainty sampling alone, from 0.89 to 0.92. In the unscreened pool IR analysis, integrating random negative sampling, positive sampling, and similarity sampling improves the precision over uncertainty sampling alone, from 0.72 to 0.81. When the SVM is replaced with a deep learning method, all sampling schemes consistently improve DDI AL analysis in both the screened pool and the unscreened pool. Deep learning significantly improves precision over SVM: 0.96 vs. 0.92 in the screened pool, and 0.90 vs. 0.81 in the unscreened pool. CONCLUSIONS: By integrating various sampling schemes and deep learning algorithms into AL, DDI IR analysis from the literature is significantly improved. Random negative sampling and positive sampling are highly effective methods for improving AL analysis when the positive and negative samples are extremely imbalanced.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13326-023-00287-7.
format	Online Article Text
id	pubmed-10228061
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-10228061 2023-05-31 J Biomed Semantics Research BioMed Central 2023-05-30 /pmc/articles/PMC10228061/ /pubmed/37248476 http://dx.doi.org/10.1186/s13326-023-00287-7 Text en © The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Multiple sampling schemes and deep learning improve active learning performance in drug-drug interaction information retrieval analysis from the literature
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10228061/ https://www.ncbi.nlm.nih.gov/pubmed/37248476 http://dx.doi.org/10.1186/s13326-023-00287-7
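
The abstract in the description field compares methods by their precision at a prespecified recall of 0.95, and uses uncertainty sampling as the baseline active learning scheme. The sketch below illustrates both ideas in plain Python. It is not the authors' implementation: the function names, the use of a probability-like score in [0, 1], and the "closest to 0.5" uncertainty measure are all assumptions made for illustration (the record does not say how uncertainty is measured for the SVM or the deep learning model).

```python
from typing import List, Sequence


def most_uncertain(probs: Sequence[float], k: int) -> List[int]:
    """Return indices of the k items whose score is closest to 0.5.

    Assumption: scores behave like P(positive); 0.5 means maximal uncertainty.
    """
    order = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return order[:k]


def precision_at_recall(labels: Sequence[int], scores: Sequence[float],
                        target_recall: float = 0.95) -> float:
    """Precision at the shortest ranked prefix whose recall reaches target_recall.

    labels are 1 (DDI-relevant) or 0 (not relevant); ties in scores are
    broken arbitrarily, which is a simplification.
    """
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank abstracts from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    for retrieved, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        if true_pos / total_pos >= target_recall:
            return true_pos / retrieved
    # Not reached for a valid target_recall: the full ranking has recall 1.0.
    return total_pos / len(ranked)
```

In an active learning loop, `most_uncertain` would pick the next abstracts to send for manual labeling, and `precision_at_recall` would be computed on a held-out labeled set after each retraining round. With heavily imbalanced pools, this is the kind of selection the abstract reports supplementing with random negative, positive, and similarity sampling.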
