
MLNet: a multi-level multimodal named entity recognition architecture

In the field of human–computer interaction, accurate identification of talking objects can help robots to accomplish subsequent tasks such as decision-making or recommendation; therefore, object determination is of great interest as a pre-requisite task. Whether it is named entity recognition (NER)...


Bibliographic Details
Main Authors:	Zhai, Hanming; Lv, Xiaojun; Hou, Zhiwen; Tong, Xin; Bu, Fanliang
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2023
Subjects:	Neuroscience
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10319056/ https://www.ncbi.nlm.nih.gov/pubmed/37408584 http://dx.doi.org/10.3389/fnbot.2023.1181143

_version_	1785068163574005760
author	Zhai, Hanming Lv, Xiaojun Hou, Zhiwen Tong, Xin Bu, Fanliang
author_facet	Zhai, Hanming Lv, Xiaojun Hou, Zhiwen Tong, Xin Bu, Fanliang
author_sort	Zhai, Hanming
collection	PubMed
description	In the field of human–computer interaction, accurate identification of talking objects can help robots to accomplish subsequent tasks such as decision-making or recommendation; therefore, object determination is of great interest as a prerequisite task. Whether it is named entity recognition (NER) in natural language processing (NLP) work or the object detection (OD) task in the computer vision (CV) field, the essence is to achieve object recognition. Currently, multimodal approaches are widely used in basic image recognition and natural language processing tasks. This multimodal architecture can perform entity recognition tasks more accurately, but when faced with short texts and images containing more noise, we find that there is still room for optimization in the image-text-based multimodal named entity recognition (MNER) architecture. In this study, we propose a new multi-level multimodal named entity recognition architecture, which is a network capable of extracting useful visual information for boosting semantic understanding and subsequently improving entity identification efficacy. Specifically, we first performed image and text encoding separately and then built a symmetric neural network architecture based on the Transformer for multimodal feature fusion. We utilized a gating mechanism to filter visual information that is significantly related to the textual content, in order to enhance text understanding and achieve semantic disambiguation. Furthermore, we incorporated character-level vector encoding to reduce text noise. Finally, we employed Conditional Random Fields for the label classification task. Experiments on the Twitter dataset show that our model improves accuracy on the MNER task.
format	Online Article Text
id	pubmed-10319056
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-10319056 2023-07-05 MLNet: a multi-level multimodal named entity recognition architecture Zhai, Hanming Lv, Xiaojun Hou, Zhiwen Tong, Xin Bu, Fanliang Front Neurorobot Neuroscience Frontiers Media S.A. 2023-06-20 /pmc/articles/PMC10319056/ /pubmed/37408584 http://dx.doi.org/10.3389/fnbot.2023.1181143 Text en Copyright © 2023 Zhai, Lv, Hou, Tong and Bu. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle	Neuroscience Zhai, Hanming Lv, Xiaojun Hou, Zhiwen Tong, Xin Bu, Fanliang MLNet: a multi-level multimodal named entity recognition architecture
title	MLNet: a multi-level multimodal named entity recognition architecture
title_full	MLNet: a multi-level multimodal named entity recognition architecture
title_fullStr	MLNet: a multi-level multimodal named entity recognition architecture
title_full_unstemmed	MLNet: a multi-level multimodal named entity recognition architecture
title_short	MLNet: a multi-level multimodal named entity recognition architecture
title_sort	mlnet: a multi-level multimodal named entity recognition architecture
topic	Neuroscience
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10319056/ https://www.ncbi.nlm.nih.gov/pubmed/37408584 http://dx.doi.org/10.3389/fnbot.2023.1181143
work_keys_str_mv	AT zhaihanming mlnetamultilevelmultimodalnamedentityrecognitionarchitecture AT lvxiaojun mlnetamultilevelmultimodalnamedentityrecognitionarchitecture AT houzhiwen mlnetamultilevelmultimodalnamedentityrecognitionarchitecture AT tongxin mlnetamultilevelmultimodalnamedentityrecognitionarchitecture AT bufanliang mlnetamultilevelmultimodalnamedentityrecognitionarchitecture
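
The description field mentions a gating mechanism that filters visual information related to the textual content. This record gives no equations, so the following is only a minimal sketch of one common form of such a gate, not the authors' formulation. The function name gated_fusion, the weight layout, and the residual-style sum are all assumptions made for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(text_vec, visual_vec, weights, bias):
    """Fuse one token's text vector with a visual vector (hypothetical helper).

    Assumed form, not taken from the paper:
        gate  = sigmoid(W . [text; visual] + b)   (one gate value per dimension)
        fused = text + gate * visual              (element-wise)

    text_vec, visual_vec: lists of floats with the same length d
    weights: d rows, each of length 2 * d
    bias: list of d floats
    """
    concat = list(text_vec) + list(visual_vec)
    gate = [
        sigmoid(sum(w * x for w, x in zip(row, concat)) + b)
        for row, b in zip(weights, bias)
    ]
    # A gate near 0 suppresses the visual feature; a gate near 1 passes it through.
    fused = [t + g * v for t, g, v in zip(text_vec, gate, visual_vec)]
    return fused, gate


if __name__ == "__main__":
    # Toy values only: zero weights and zero bias give a gate of 0.5 everywhere.
    zero_weights = [[0.0] * 4, [0.0] * 4]
    fused, gate = gated_fusion([1.0, 2.0], [4.0, -2.0], zero_weights, [0.0, 0.0])
    print(gate, fused)
```

In a trained model the weights and bias would be learned, so that image features unrelated to a token receive a gate close to zero and leave the text representation almost unchanged.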

