A Fine-Tuned Bidirectional Encoder Representations From Transformers Model for Food Named-Entity Recognition: Algorithm Development and Validation

BACKGROUND: Recently, food science has been garnering a lot of attention. There are many open research questions on food interactions, as one of the main environmental factors, with other health-related entities such as diseases, treatments, and drugs. In the last 2 decades, a large amount of work h...

Bibliographic Details
Main Authors:	Stojanov, Riste, Popovski, Gorjan, Cenikj, Gjorgjina, Koroušić Seljak, Barbara, Eftimov, Tome
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2021
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8415558/ https://www.ncbi.nlm.nih.gov/pubmed/34383671 http://dx.doi.org/10.2196/28229

author	Stojanov, Riste Popovski, Gorjan Cenikj, Gjorgjina Koroušić Seljak, Barbara Eftimov, Tome
collection	PubMed
description	BACKGROUND: Recently, food science has been garnering a lot of attention. There are many open research questions on food interactions, as one of the main environmental factors, with other health-related entities such as diseases, treatments, and drugs. In the last 2 decades, a large amount of work has been done in natural language processing and machine learning to enable biomedical information extraction. However, machine learning in food science domains remains inadequately resourced, which brings to attention the problem of developing methods for food information extraction. There are only a few food semantic resources and few rule-based methods for food information extraction, which often depend on some external resources. However, an annotated corpus with food entities along with their normalization was published in 2019 by using several food semantic resources.
OBJECTIVE: In this study, we investigated how the recently published bidirectional encoder representations from transformers (BERT) model, which provides state-of-the-art results in information extraction, can be fine-tuned for food information extraction.
METHODS: We introduce FoodNER, which is a collection of corpus-based food named-entity recognition methods. It consists of 15 different models obtained by fine-tuning 3 pretrained BERT models on 5 groups of semantic resources: food versus nonfood entity, 2 subsets of Hansard food semantic tags, FoodOn semantic tags, and Systematized Nomenclature of Medicine Clinical Terms food semantic tags.
RESULTS: All BERT models provided very promising results with 93.30% to 94.31% macro F1 scores in the task of distinguishing food versus nonfood entity, which represents the new state-of-the-art technology in food information extraction. Considering the tasks where semantic tags are predicted, all BERT models obtained very promising results once again, with their macro F1 scores ranging from 73.39% to 78.96%.
CONCLUSIONS: FoodNER can be used to extract and annotate food entities in 5 different tasks: food versus nonfood entities and distinguishing food entities on the level of food groups by using the closest Hansard semantic tags, the parent Hansard semantic tags, the FoodOn semantic tags, or the Systematized Nomenclature of Medicine Clinical Terms semantic tags.
format	Online Article Text
id	pubmed-8415558
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-8415558 2021-09-24 J Med Internet Res Original Paper JMIR Publications 2021-08-09 /pmc/articles/PMC8415558/ /pubmed/34383671 http://dx.doi.org/10.2196/28229 Text en ©Riste Stojanov, Gorjan Popovski, Gjorgjina Cenikj, Barbara Koroušić Seljak, Tome Eftimov. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 09.08.2021. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
title	A Fine-Tuned Bidirectional Encoder Representations From Transformers Model for Food Named-Entity Recognition: Algorithm Development and Validation
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8415558/ https://www.ncbi.nlm.nih.gov/pubmed/34383671 http://dx.doi.org/10.2196/28229
