
Large-Scale Biomedical Relation Extraction Across Diverse Relation Types: Model Development and Usability Study on COVID-19

BACKGROUND: Biomedical relation extraction (RE) is of great importance for researchers to conduct systematic biomedical studies. It not only helps knowledge mining, such as knowledge graphs and novel knowledge discovery, but also promotes translational applications, such as clinical diagnosis, decis...


Bibliographic Details
Main Authors: Zhang, Zeyu; Fang, Meng; Wu, Rebecca; Zong, Hui; Huang, Honglian; Tong, Yuantao; Xie, Yujia; Cheng, Shiyang; Wei, Ziyi; Crabbe, M James C; Zhang, Xiaoyan; Wang, Ying
Format: Online Article Text
Language: English
Published: JMIR Publications 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10551783/
https://www.ncbi.nlm.nih.gov/pubmed/37632414
http://dx.doi.org/10.2196/48115
author Zhang, Zeyu
Fang, Meng
Wu, Rebecca
Zong, Hui
Huang, Honglian
Tong, Yuantao
Xie, Yujia
Cheng, Shiyang
Wei, Ziyi
Crabbe, M James C
Zhang, Xiaoyan
Wang, Ying
author_sort Zhang, Zeyu
collection PubMed
description BACKGROUND: Biomedical relation extraction (RE) is of great importance for researchers to conduct systematic biomedical studies. It not only helps knowledge mining, such as knowledge graphs and novel knowledge discovery, but also promotes translational applications, such as clinical diagnosis, decision-making, and precision medicine. However, the relations between biomedical entities are complex and diverse, and comprehensive biomedical RE is not yet well established.
OBJECTIVE: We aimed to investigate and improve large-scale RE with diverse relation types and conduct usability studies with application scenarios to optimize biomedical text mining.
METHODS: Data sets containing 125 relation types with different entity semantic levels were constructed to evaluate the impact of entity semantic information on RE, and performance analysis was conducted on different model architectures and domain models. This study also proposed a continued pretraining strategy and integrated models with scripts into a tool. Furthermore, this study applied RE to the COVID-19 corpus with article topics and application scenarios of clinical interest to assess and demonstrate its biological interpretability and usability.
RESULTS: The performance analysis revealed that RE achieves the best performance when the detailed semantic type is provided. For a single model, PubMedBERT with continued pretraining performed the best, with an F1-score of 0.8998. Usability studies on COVID-19 demonstrated the interpretability and usability of RE, and a relation graph database was constructed, which was used to reveal existing and novel drug paths with edge explanations. The models (including pretrained and fine-tuned models), integrated tool (Docker), and generated data (including the COVID-19 relation graph database and drug paths) have been made publicly available to the biomedical text mining community and clinical researchers.
CONCLUSIONS: This study provided a comprehensive analysis of RE with diverse relation types. Optimized RE models and tools for diverse relation types were developed, which can be widely used in biomedical text mining. Our usability studies provided a proof-of-concept demonstration of how large-scale RE can be leveraged to facilitate novel research.
format Online
Article
Text
id pubmed-10551783
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-105517832023-10-06 Large-Scale Biomedical Relation Extraction Across Diverse Relation Types: Model Development and Usability Study on COVID-19 Zhang, Zeyu Fang, Meng Wu, Rebecca Zong, Hui Huang, Honglian Tong, Yuantao Xie, Yujia Cheng, Shiyang Wei, Ziyi Crabbe, M James C Zhang, Xiaoyan Wang, Ying J Med Internet Res Original Paper BACKGROUND: Biomedical relation extraction (RE) is of great importance for researchers to conduct systematic biomedical studies. It not only helps knowledge mining, such as knowledge graphs and novel knowledge discovery, but also promotes translational applications, such as clinical diagnosis, decision-making, and precision medicine. However, the relations between biomedical entities are complex and diverse, and comprehensive biomedical RE is not yet well established. OBJECTIVE: We aimed to investigate and improve large-scale RE with diverse relation types and conduct usability studies with application scenarios to optimize biomedical text mining. METHODS: Data sets containing 125 relation types with different entity semantic levels were constructed to evaluate the impact of entity semantic information on RE, and performance analysis was conducted on different model architectures and domain models. This study also proposed a continued pretraining strategy and integrated models with scripts into a tool. Furthermore, this study applied RE to the COVID-19 corpus with article topics and application scenarios of clinical interest to assess and demonstrate its biological interpretability and usability. RESULTS: The performance analysis revealed that RE achieves the best performance when the detailed semantic type is provided. For a single model, PubMedBERT with continued pretraining performed the best, with an F1-score of 0.8998. 
Usability studies on COVID-19 demonstrated the interpretability and usability of RE, and a relation graph database was constructed, which was used to reveal existing and novel drug paths with edge explanations. The models (including pretrained and fine-tuned models), integrated tool (Docker), and generated data (including the COVID-19 relation graph database and drug paths) have been made publicly available to the biomedical text mining community and clinical researchers. CONCLUSIONS: This study provided a comprehensive analysis of RE with diverse relation types. Optimized RE models and tools for diverse relation types were developed, which can be widely used in biomedical text mining. Our usability studies provided a proof-of-concept demonstration of how large-scale RE can be leveraged to facilitate novel research. JMIR Publications 2023-09-20 /pmc/articles/PMC10551783/ /pubmed/37632414 http://dx.doi.org/10.2196/48115 Text en © Zeyu Zhang, Meng Fang, Rebecca Wu, Hui Zong, Honglian Huang, Yuantao Tong, Yujia Xie, Shiyang Cheng, Ziyi Wei, M James C Crabbe, Xiaoyan Zhang, Ying Wang. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 20.09.2023. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
title Large-Scale Biomedical Relation Extraction Across Diverse Relation Types: Model Development and Usability Study on COVID-19
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10551783/
https://www.ncbi.nlm.nih.gov/pubmed/37632414
http://dx.doi.org/10.2196/48115