Improving classification of low-resource COVID-19 literature by using Named Entity Recognition

Automatic document classification for highly interrelated classes is a demanding task that becomes more challenging when there is little labeled data for training. Such is the case of the coronavirus disease 2019 (COVID-19) clinical repository—a repository of classified and translated academic artic...

Bibliographic Details
Main authors:	Lithgow-Serrano, Oscar; Cornelius, Joseph; Kanjirangat, Vani; Méndez-Cruz, Carlos-Francisco; Rinaldi, Fabio
Format:	Online Article Text
Language:	English
Published:	Korea Genome Organization 2021
Subjects:	Blah7
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510872/ https://www.ncbi.nlm.nih.gov/pubmed/34638169 http://dx.doi.org/10.5808/gi.21018

_version_	1784582665170583552
author	Lithgow-Serrano, Oscar; Cornelius, Joseph; Kanjirangat, Vani; Méndez-Cruz, Carlos-Francisco; Rinaldi, Fabio
author_facet	Lithgow-Serrano, Oscar; Cornelius, Joseph; Kanjirangat, Vani; Méndez-Cruz, Carlos-Francisco; Rinaldi, Fabio
author_sort	Lithgow-Serrano, Oscar
collection	PubMed
description	Automatic document classification for highly interrelated classes is a demanding task that becomes more challenging when there is little labeled data for training. Such is the case of the coronavirus disease 2019 (COVID-19) clinical repository (a repository of classified and translated academic articles related to COVID-19 and relevant to clinical practice), where a 3-way classification scheme is being applied to COVID-19 literature. During the 7th Biomedical Linked Annotation Hackathon (BLAH7), we performed experiments to explore the use of named-entity recognition (NER) to improve the classification. We processed the literature with OntoGene’s Biomedical Entity Recogniser (OGER) and used the identified Named Entities (NEs) and their links to major biological databases as extra input features for the classifier. We compared the results with a baseline model without the OGER-extracted features. In these proof-of-concept experiments, we observed a clear gain in COVID-19 literature classification. In particular, the NEs’ origin was useful for classifying document types, and the NEs’ type for classifying clinical specialties. Due to the limitations of the small dataset, we can only conclude that our results suggest that NER would benefit this classification task. To estimate this benefit accurately, further experiments with a larger dataset would be needed.
format	Online Article Text
id	pubmed-8510872
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Korea Genome Organization
record_format	MEDLINE/PubMed
spelling	pubmed-8510872 2021-10-22 Improving classification of low-resource COVID-19 literature by using Named Entity Recognition Lithgow-Serrano, Oscar; Cornelius, Joseph; Kanjirangat, Vani; Méndez-Cruz, Carlos-Francisco; Rinaldi, Fabio Genomics Inform Blah7
Korea Genome Organization 2021-09-30 /pmc/articles/PMC8510872/ /pubmed/34638169 http://dx.doi.org/10.5808/gi.21018 Text en (c) 2021, Korea Genome Organization https://creativecommons.org/licenses/by/4.0/ (CC) This is an open-access article distributed under the terms of the Creative Commons Attribution license (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Blah7 Lithgow-Serrano, Oscar; Cornelius, Joseph; Kanjirangat, Vani; Méndez-Cruz, Carlos-Francisco; Rinaldi, Fabio Improving classification of low-resource COVID-19 literature by using Named Entity Recognition
title	Improving classification of low-resource COVID-19 literature by using Named Entity Recognition
title_sort	improving classification of low-resource covid-19 literature by using named entity recognition
topic	Blah7
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8510872/ https://www.ncbi.nlm.nih.gov/pubmed/34638169 http://dx.doi.org/10.5808/gi.21018
