srBERT: automatic article classification model for systematic review using BERT
Main authors: | Aum, Sungmin; Choe, Seon |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2021 |
Subjects: | Methodology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8556883/ https://www.ncbi.nlm.nih.gov/pubmed/34717768 http://dx.doi.org/10.1186/s13643-021-01763-w |
_version_ | 1784592263208239104 |
---|---|
author | Aum, Sungmin; Choe, Seon |
author_facet | Aum, Sungmin; Choe, Seon |
author_sort | Aum, Sungmin |
collection | PubMed |
description | BACKGROUND: Systematic reviews (SRs) are recognized as reliable evidence, which enables evidence-based medicine to be applied to clinical practice. However, owing to the significant effort required for an SR, its creation is time-consuming, which often leads to out-of-date results. To support SR tasks, tools for automating them have been considered; however, applying a general natural language processing model to domain-specific articles, and the insufficient text data available for training, pose challenges. METHODS: The research objective is to automate the classification of included articles using the Bidirectional Encoder Representations from Transformers (BERT) algorithm. In particular, srBERT models based on the BERT algorithm are pre-trained using abstracts of articles from two types of datasets, and the resulting model is then fine-tuned using the article titles. The performance of our proposed models is compared with that of existing general machine-learning models. RESULTS: Our results indicate that the proposed srBERT(my) model, pre-trained with abstracts of articles and a generated vocabulary, achieved state-of-the-art performance in both classification and relation-extraction tasks; in the first task, it achieved an accuracy of 94.35% (89.38%), an F1 score of 66.12 (78.64), and an area under the receiver operating characteristic curve of 0.77 (0.9) on the original and (generated) datasets, respectively. In the second task, the model achieved an accuracy of 93.5% with a loss of 27%, thereby outperforming the other evaluated models, including the original BERT model. CONCLUSIONS: Our research shows the possibility of automatic article classification using machine-learning approaches to support SR tasks, and its broad applicability. However, because the performance of our model depends on the size and class ratio of the training dataset, it is important to secure a dataset of sufficient quality, which may pose challenges. 
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13643-021-01763-w. |
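The screening task the abstract describes — train on labeled article titles, then predict include/exclude for a systematic review — can be illustrated with one of the "general machine-learning models" the authors compare against. The sketch below is a hypothetical pure-Python bag-of-words logistic-regression baseline, not the authors' srBERT code; the function names, toy titles, and labels are all invented for illustration.

```python
# Hypothetical sketch of an include/exclude title classifier for SR screening,
# using a simple bag-of-words + logistic-regression baseline (not srBERT).
import math
import re
from collections import Counter

def tokenize(title):
    """Lowercase word tokenizer (a stand-in for a learned BERT vocabulary)."""
    return re.findall(r"[a-z0-9]+", title.lower())

def build_vocab(titles):
    """Map each token seen in the training titles to a feature index."""
    counts = Counter(tok for t in titles for tok in tokenize(t))
    return {tok: i for i, (tok, _) in enumerate(counts.most_common())}

def featurize(title, vocab):
    """Bag-of-words count vector; unknown tokens are ignored."""
    vec = [0.0] * len(vocab)
    for tok in tokenize(title):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

def train(titles, labels, epochs=200, lr=0.5):
    """Fit logistic regression by plain stochastic gradient descent."""
    vocab = build_vocab(titles)
    X = [featurize(t, vocab) for t in titles]
    w, b = [0.0] * len(vocab), 0.0
    for _ in range(epochs):
        for x, y in zip(X, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log-loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return vocab, w, b

def predict(title, vocab, w, b):
    """1 = include in the review, 0 = exclude."""
    z = sum(wi * xi for wi, xi in zip(w, featurize(title, vocab))) + b
    return 1 if z >= 0 else 0

# Toy, invented titles standing in for a labeled screening dataset.
titles = [
    "acupuncture for chronic low back pain a randomized trial",
    "randomized controlled trial of acupuncture for knee pain",
    "economics of hospital supply chains",
    "a survey of hospital parking policies",
]
labels = [1, 1, 0, 0]
vocab, w, b = train(titles, labels)
print(predict("acupuncture trial for shoulder pain", vocab, w, b))  # prints 1
```

srBERT replaces the bag-of-words features with a transformer encoder pre-trained on article abstracts (with a vocabulary generated from that corpus) and fine-tunes it on titles, which is what lets it outperform baselines of this kind.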
format | Online Article Text |
id | pubmed-8556883 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-85568832021-11-01 srBERT: automatic article classification model for systematic review using BERT Aum, Sungmin Choe, Seon Syst Rev Methodology BioMed Central 2021-10-30 /pmc/articles/PMC8556883/ /pubmed/34717768 http://dx.doi.org/10.1186/s13643-021-01763-w Text en © The Author(s) 2021. Open Access under the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Methodology Aum, Sungmin Choe, Seon srBERT: automatic article classification model for systematic review using BERT |
title | srBERT: automatic article classification model for systematic review using BERT |
title_full | srBERT: automatic article classification model for systematic review using BERT |
title_fullStr | srBERT: automatic article classification model for systematic review using BERT |
title_full_unstemmed | srBERT: automatic article classification model for systematic review using BERT |
title_short | srBERT: automatic article classification model for systematic review using BERT |
title_sort | srbert: automatic article classification model for systematic review using bert |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8556883/ https://www.ncbi.nlm.nih.gov/pubmed/34717768 http://dx.doi.org/10.1186/s13643-021-01763-w |
work_keys_str_mv | AT aumsungmin srbertautomaticarticleclassificationmodelforsystematicreviewusingbert AT choeseon srbertautomaticarticleclassificationmodelforsystematicreviewusingbert |