A BERT-based ensemble learning approach for the BioCreative VII challenges: full-text chemical identification and multi-label classification in PubMed articles
In this research, we explore various state-of-the-art biomedical-specific pre-trained Bidirectional Encoder Representations from Transformers (BERT) models for the National Library of Medicine - Chemistry (NLM-CHEM) and LitCovid tracks in the BioCreative VII Challenge, and propose a BERT-based ense...
Main authors: | Lin, Sheng-Jie; Yeh, Wen-Chao; Chiu, Yu-Wen; Chang, Yung-Chun; Hsu, Min-Huei; Chen, Yi-Shin; Hsu, Wen-Lian |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9290865/ https://www.ncbi.nlm.nih.gov/pubmed/35849027 http://dx.doi.org/10.1093/database/baac056 |
_version_ | 1784749009524490240 |
---|---|
author | Lin, Sheng-Jie Yeh, Wen-Chao Chiu, Yu-Wen Chang, Yung-Chun Hsu, Min-Huei Chen, Yi-Shin Hsu, Wen-Lian |
author_facet | Lin, Sheng-Jie Yeh, Wen-Chao Chiu, Yu-Wen Chang, Yung-Chun Hsu, Min-Huei Chen, Yi-Shin Hsu, Wen-Lian |
author_sort | Lin, Sheng-Jie |
collection | PubMed |
description | In this research, we explore various state-of-the-art biomedical-specific pre-trained Bidirectional Encoder Representations from Transformers (BERT) models for the National Library of Medicine - Chemistry (NLM-CHEM) and LitCovid tracks in the BioCreative VII Challenge, and propose a BERT-based ensemble learning approach that integrates the advantages of various models to improve the system’s performance. The experimental results on the NLM-CHEM track show that our method achieves F1-scores of 85% and 91.8% in strict and approximate evaluations, respectively. Moreover, the proposed Medical Subject Headings identifier (MeSH ID) normalization algorithm is effective in entity normalization, achieving an F1-score of about 80% in both strict and approximate evaluations. For the LitCovid track, the proposed method is also effective in detecting topics in the Coronavirus disease 2019 (COVID-19) literature: it outperforms the compared methods and achieves state-of-the-art performance on the LitCovid corpus. Database URL: https://www.ncbi.nlm.nih.gov/research/coronavirus/. |
format | Online Article Text |
id | pubmed-9290865 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9290865 2022-07-18 A BERT-based ensemble learning approach for the BioCreative VII challenges: full-text chemical identification and multi-label classification in PubMed articles Lin, Sheng-Jie Yeh, Wen-Chao Chiu, Yu-Wen Chang, Yung-Chun Hsu, Min-Huei Chen, Yi-Shin Hsu, Wen-Lian Database (Oxford) Original Article In this research, we explored various state-of-the-art biomedical-specific pre-trained Bidirectional Encoder Representations from Transformers (BERT) models for the National Library of Medicine - Chemistry (NLM CHEM) and LitCovid tracks in the BioCreative VII Challenge, and propose a BERT-based ensemble learning approach to integrate the advantages of various models to improve the system’s performance. The experimental results of the NLM-CHEM track demonstrate that our method can achieve remarkable performance, with F(1)-scores of 85% and 91.8% in strict and approximate evaluations, respectively. Moreover, the proposed Medical Subject Headings identifier (MeSH ID) normalization algorithm is effective in entity normalization, which achieved a F(1)-score of about 80% in both strict and approximate evaluations. For the LitCovid track, the proposed method is also effective in detecting topics in the Coronavirus disease 2019 (COVID-19) literature, which outperformed the compared methods and achieve state-of-the-art performance in the LitCovid corpus. Database URL: https://www.ncbi.nlm.nih.gov/research/coronavirus/. Oxford University Press 2022-07-15 /pmc/articles/PMC9290865/ /pubmed/35849027 http://dx.doi.org/10.1093/database/baac056 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Original Article Lin, Sheng-Jie Yeh, Wen-Chao Chiu, Yu-Wen Chang, Yung-Chun Hsu, Min-Huei Chen, Yi-Shin Hsu, Wen-Lian A BERT-based ensemble learning approach for the BioCreative VII challenges: full-text chemical identification and multi-label classification in PubMed articles |
title | A BERT-based ensemble learning approach for the BioCreative VII challenges: full-text chemical identification and multi-label classification in PubMed articles |
title_full | A BERT-based ensemble learning approach for the BioCreative VII challenges: full-text chemical identification and multi-label classification in PubMed articles |
title_fullStr | A BERT-based ensemble learning approach for the BioCreative VII challenges: full-text chemical identification and multi-label classification in PubMed articles |
title_full_unstemmed | A BERT-based ensemble learning approach for the BioCreative VII challenges: full-text chemical identification and multi-label classification in PubMed articles |
title_short | A BERT-based ensemble learning approach for the BioCreative VII challenges: full-text chemical identification and multi-label classification in PubMed articles |
title_sort | bert-based ensemble learning approach for the biocreative vii challenges: full-text chemical identification and multi-label classification in pubmed articles |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9290865/ https://www.ncbi.nlm.nih.gov/pubmed/35849027 http://dx.doi.org/10.1093/database/baac056 |
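
Note on the reported scores: the description gives F1-scores under "strict" and "approximate" evaluations. The sketch below illustrates how such a pair of scores can differ for the same predictions. It assumes that strict matching requires identical span offsets and that approximate matching accepts any character overlap; the challenge's official scoring rules may differ, and the names `Span`, `overlaps`, and `f1_score` are hypothetical, not taken from the article.

```python
from typing import Set, Tuple

# A mention is represented as (start, end) character offsets, end exclusive.
# This representation is an assumption made for illustration only.
Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """Return True if the two spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def f1_score(gold: Set[Span], pred: Set[Span], strict: bool = True) -> float:
    """Compute F1 over mention spans.

    strict=True  counts a prediction as correct only on an exact offset match.
    strict=False counts any overlap with a gold span as a match (assumed
    definition of "approximate"; not the official challenge scorer).
    """
    if strict:
        matched_pred = len(gold & pred)
        matched_gold = matched_pred
    else:
        matched_pred = sum(any(overlaps(p, g) for g in gold) for p in pred)
        matched_gold = sum(any(overlaps(g, p) for p in pred) for g in gold)

    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up example spans, not data from the article.
gold = {(0, 7), (15, 22), (30, 41)}
pred = {(0, 7), (15, 20), (50, 55)}

strict_f1 = f1_score(gold, pred, strict=True)
approx_f1 = f1_score(gold, pred, strict=False)
print(f"strict F1 = {strict_f1:.3f}, approximate F1 = {approx_f1:.3f}")
```

In this toy case the prediction `(15, 20)` only partially covers the gold span `(15, 22)`, so it counts under the approximate setting but not under the strict one, which is why an approximate score is typically at least as high as the strict score.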