
Multi-label topic classification for COVID-19 literature with Bioformer

We describe Bioformer team’s participation in the multi-label topic classification task for COVID-19 literature (track 5 of BioCreative VII). Topic classification is performed using different BERT models (BioBERT, PubMedBERT, and Bioformer). We formulate the topic classification task as a sentence p...


Bibliographic Details
Main authors: Fang, Li; Wang, Kai
Format: Online Article Text
Language: English
Published: Cornell University 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9016643/
https://www.ncbi.nlm.nih.gov/pubmed/35441084
_version_ 1784688570519257088
author Fang, Li
Wang, Kai
author_facet Fang, Li
Wang, Kai
author_sort Fang, Li
collection PubMed
description We describe Bioformer team’s participation in the multi-label topic classification task for COVID-19 literature (track 5 of BioCreative VII). Topic classification is performed using different BERT models (BioBERT, PubMedBERT, and Bioformer). We formulate the topic classification task as a sentence pair classification problem, where the title is the first sentence, and the abstract is the second sentence. Our results show that Bioformer outperforms BioBERT and PubMedBERT in this task. Compared to the baseline results, our best model increased micro, macro, and instance-based F1 score by 8.8%, 15.5%, 7.4%, respectively. Bioformer achieved the highest micro F1 and macro F1 scores in this challenge. In post-challenge experiments, we found that pretraining of Bioformer on COVID-19 articles further improves the performance.
format Online
Article
Text
id pubmed-9016643
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Cornell University
record_format MEDLINE/PubMed
spelling pubmed-90166432022-04-20 Multi-label topic classification for COVID-19 literature with Bioformer Fang, Li Wang, Kai ArXiv Article We describe Bioformer team’s participation in the multi-label topic classification task for COVID-19 literature (track 5 of BioCreative VII). Topic classification is performed using different BERT models (BioBERT, PubMedBERT, and Bioformer). We formulate the topic classification task as a sentence pair classification problem, where the title is the first sentence, and the abstract is the second sentence. Our results show that Bioformer outperforms BioBERT and PubMedBERT in this task. Compared to the baseline results, our best model increased micro, macro, and instance-based F1 score by 8.8%, 15.5%, 7.4%, respectively. Bioformer achieved the highest micro F1 and macro F1 scores in this challenge. In post-challenge experiments, we found that pretraining of Bioformer on COVID-19 articles further improves the performance. Cornell University 2022-04-14 /pmc/articles/PMC9016643/ /pubmed/35441084 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
spellingShingle Article
Fang, Li
Wang, Kai
Multi-label topic classification for COVID-19 literature with Bioformer
title Multi-label topic classification for COVID-19 literature with Bioformer
title_full Multi-label topic classification for COVID-19 literature with Bioformer
title_fullStr Multi-label topic classification for COVID-19 literature with Bioformer
title_full_unstemmed Multi-label topic classification for COVID-19 literature with Bioformer
title_short Multi-label topic classification for COVID-19 literature with Bioformer
title_sort multi-label topic classification for covid-19 literature with bioformer
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9016643/
https://www.ncbi.nlm.nih.gov/pubmed/35441084
work_keys_str_mv AT fangli multilabeltopicclassificationforcovid19literaturewithbioformer
AT wangkai multilabeltopicclassificationforcovid19literaturewithbioformer
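
Note on the metrics named in the description: the record reports micro, macro, and instance-based F1 scores but does not define them. The sketch below shows one common way these three averages are computed for multi-label predictions. It is an illustration under stated assumptions, not the challenge's official scorer: the function names, the example label names, and the convention of scoring an empty denominator as 0.0 are all hypothetical choices made here.

```python
from typing import Dict, List, Set, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is 0 (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(
    gold: List[Set[str]], pred: List[Set[str]], labels: List[str]
) -> Tuple[float, float, float]:
    """Return (micro, macro, instance-based) F1 for per-document label sets."""
    # Count true positives, false positives, and false negatives per label.
    counts: Dict[str, Tuple[int, int, int]] = {}
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        counts[label] = (tp, fp, fn)

    # Micro: pool the counts over all labels, then compute a single F1.
    micro = f1_from_counts(
        sum(c[0] for c in counts.values()),
        sum(c[1] for c in counts.values()),
        sum(c[2] for c in counts.values()),
    )

    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1_from_counts(*c) for c in counts.values()) / len(labels)

    # Instance-based: compute F1 per document, then average over documents.
    instance = sum(
        f1_from_counts(len(g & p), len(p - g), len(g - p))
        for g, p in zip(gold, pred)
    ) / len(gold)

    return micro, macro, instance


# Hypothetical toy data: three documents, three placeholder topic labels.
example_labels = ["Treatment", "Diagnosis", "Prevention"]
example_gold = [{"Treatment", "Diagnosis"}, {"Prevention"}, {"Treatment"}]
example_pred = [{"Treatment"}, {"Prevention", "Treatment"}, {"Diagnosis"}]
micro_f1, macro_f1, instance_f1 = multilabel_f1(
    example_gold, example_pred, example_labels
)
```

Micro F1 weights every label decision equally, so frequent topics dominate it. Macro F1 weights every topic equally, so rare topics count as much as common ones. Instance-based F1 weights every document equally.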