Multi-label topic classification for COVID-19 literature with Bioformer
Main authors: | Fang, Li; Wang, Kai |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cornell University, 2022 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9016643/ https://www.ncbi.nlm.nih.gov/pubmed/35441084 |
author | Fang, Li; Wang, Kai |
---|---|
collection | PubMed |
description | We describe the Bioformer team’s participation in the multi-label topic classification task for COVID-19 literature (track 5 of BioCreative VII). Topic classification is performed using three BERT models (BioBERT, PubMedBERT, and Bioformer). We formulate the topic classification task as a sentence-pair classification problem, where the title is the first sentence and the abstract is the second sentence. Our results show that Bioformer outperforms BioBERT and PubMedBERT in this task. Compared to the baseline results, our best model increased the micro, macro, and instance-based F1 scores by 8.8%, 15.5%, and 7.4%, respectively. Bioformer achieved the highest micro F1 and macro F1 scores in this challenge. In post-challenge experiments, we found that pretraining Bioformer on COVID-19 articles further improves performance. |
format | Online Article Text |
id | pubmed-9016643 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Cornell University |
record_format | MEDLINE/PubMed |
spelling | pubmed-9016643 2022-04-20 Multi-label topic classification for COVID-19 literature with Bioformer Fang, Li Wang, Kai ArXiv Article Cornell University 2022-04-14 /pmc/articles/PMC9016643/ /pubmed/35441084 Text en. This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. |
title | Multi-label topic classification for COVID-19 literature with Bioformer |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9016643/ https://www.ncbi.nlm.nih.gov/pubmed/35441084 |
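The description frames topic assignment as sentence-pair classification with a multi-label output, evaluated by micro, macro, and instance-based F1. The Python sketch below illustrates those pieces under stated assumptions. Whitespace splitting stands in for the models' real subword tokenizers, the 0.5 sigmoid threshold and the topic names are illustrative choices rather than details from this record, and every helper name (build_sentence_pair, predict_labels, micro_f1, macro_f1, instance_f1) is hypothetical. It is not the Bioformer team's implementation.

```python
import math
from typing import Dict, Iterable, List, Sequence, Set, Tuple

# Hypothetical helpers illustrating the task setup; not the authors' code.


def build_sentence_pair(title: str, abstract: str) -> Tuple[List[str], List[int]]:
    """Pack the title (first sentence) and abstract (second sentence) BERT-style.

    Assumption: whitespace splitting stands in for a real subword tokenizer.
    Segment id 0 marks title tokens; segment id 1 marks abstract tokens.
    """
    first = ["[CLS]"] + title.split() + ["[SEP]"]
    second = abstract.split() + ["[SEP]"]
    return first + second, [0] * len(first) + [1] * len(second)


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_labels(logits: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Multi-label decision: an independent sigmoid per topic, kept if above the (assumed) threshold."""
    return {topic for topic, z in logits.items() if _sigmoid(z) > threshold}


def _f1(tp: int, fp: int, fn: int) -> float:
    # F1 = 2TP / (2TP + FP + FN); defined here as 0.0 when all counts are zero.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> float:
    """Pool TP/FP/FN over every (document, topic) decision, then compute one F1."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    return _f1(tp, fp, fn)


def macro_f1(gold: Sequence[Set[str]], pred: Sequence[Set[str]], labels: Iterable[str]) -> float:
    """Compute F1 separately for each topic, then take the unweighted mean."""
    scores = []
    for label in labels:
        tp = sum(label in g and label in p for g, p in zip(gold, pred))
        fp = sum(label not in g and label in p for g, p in zip(gold, pred))
        fn = sum(label in g and label not in p for g, p in zip(gold, pred))
        scores.append(_f1(tp, fp, fn))
    return sum(scores) / len(scores)


def instance_f1(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> float:
    """Compute F1 for each document (instance), then average over documents."""
    per_doc = [_f1(len(g & p), len(p - g), len(g - p)) for g, p in zip(gold, pred)]
    return sum(per_doc) / len(per_doc)


tokens, segments = build_sentence_pair("COVID-19 vaccine trial", "We report results.")
print(tokens)    # ['[CLS]', 'COVID-19', 'vaccine', 'trial', '[SEP]', 'We', 'report', 'results.', '[SEP]']
print(segments)  # [0, 0, 0, 0, 0, 1, 1, 1, 1]
print(sorted(predict_labels({"Treatment": 2.0, "Diagnosis": -1.0, "Prevention": 0.3})))
# ['Prevention', 'Treatment']

# Toy gold and predicted topic sets for three articles (illustrative data only).
gold = [{"Treatment", "Mechanism"}, {"Prevention"}, {"Diagnosis", "Case Report"}]
pred = [{"Treatment"}, {"Prevention", "Transmission"}, {"Diagnosis", "Case Report"}]
labels = sorted(set().union(*gold, *pred))

print(round(micro_f1(gold, pred), 4))          # 0.8
print(round(macro_f1(gold, pred, labels), 4))  # 0.6667
print(round(instance_f1(gold, pred), 4))       # 0.7778
```

Micro F1 pools counts across all topics, so frequent topics dominate it. Macro F1 gives every topic equal weight, so rare topics count as much as common ones. Instance-based F1 averages per-article scores. The same predictions can therefore yield noticeably different values under the three averages, as the toy data shows.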