
CafeteriaFCD Corpus: Food Consumption Data Annotated with Regard to Different Food Semantic Resources

Despite the numerous studies in the last decade involving food and nutrition data, this domain remains low resourced. Annotated corpora are very useful tools for researchers and domain experts, as well as for data scientists performing analysis. In this paper, we present the annotation...


Bibliographic Details
Main Authors: Ispirova, Gordana, Cenikj, Gjorgjina, Ogrinc, Matevž, Valenčič, Eva, Stojanov, Riste, Korošec, Peter, Cavalli, Ermanno, Koroušić Seljak, Barbara, Eftimov, Tome
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9455825/
https://www.ncbi.nlm.nih.gov/pubmed/36076868
http://dx.doi.org/10.3390/foods11172684
_version_ 1784785664459407360
author Ispirova, Gordana
Cenikj, Gjorgjina
Ogrinc, Matevž
Valenčič, Eva
Stojanov, Riste
Korošec, Peter
Cavalli, Ermanno
Koroušić Seljak, Barbara
Eftimov, Tome
author_sort Ispirova, Gordana
collection PubMed
description Despite the numerous studies in the last decade involving food and nutrition data, this domain remains low resourced. Annotated corpora are very useful tools for researchers and domain experts, as well as for data scientists performing analysis. In this paper, we present the annotation process of food consumption data (recipes) with semantic tags from different semantic resources: the Hansard taxonomy, the FoodOn ontology, the SNOMED CT terminology and the FoodEx2 classification system. FoodBase is an annotated corpus of food entities (recipes), which includes a curated version of 1000 instances, considered a gold standard. In this study, we use the curated version of FoodBase and two different approaches for annotating: the NCBO annotator (for the FoodOn and SNOMED CT annotations) and the semi-automatic StandFood method (for the FoodEx2 annotations). The end result is a new version of the gold standard of the FoodBase corpus, called the CafeteriaFCD (Cafeteria Food Consumption Data) corpus. This corpus contains food consumption data (recipes) annotated with semantic tags from the aforementioned four external semantic resources. With these annotations, data interoperability is achieved between five semantic resources from different domains. This resource can be further utilized for developing and training different information extraction pipelines using state-of-the-art NLP approaches for tracing knowledge about food safety applications.
format Online
Article
Text
id pubmed-9455825
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9455825 2022-09-09 CafeteriaFCD Corpus: Food Consumption Data Annotated with Regard to Different Food Semantic Resources. Foods, Article. MDPI 2022-09-02 /pmc/articles/PMC9455825/ /pubmed/36076868 http://dx.doi.org/10.3390/foods11172684 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title CafeteriaFCD Corpus: Food Consumption Data Annotated with Regard to Different Food Semantic Resources
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9455825/
https://www.ncbi.nlm.nih.gov/pubmed/36076868
http://dx.doi.org/10.3390/foods11172684