
A Unified Model Using Distantly Supervised Data and Cross-Domain Data in NER



Bibliographic Details
Main authors: Hu, Yun; He, Hao; Chen, Zhengfei; Zhu, Qingmeng; Zheng, Changwen
Format: Online article, text
Language: English
Published: Hindawi, 2022
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9168158/
https://www.ncbi.nlm.nih.gov/pubmed/35676955
http://dx.doi.org/10.1155/2022/1987829
Abstract: Named entity recognition (NER) systems are often built with supervised methods that require large amounts of hand-annotated data. When hand-annotated data is limited, distantly supervised (DS) data and cross-domain (CD) data are usually used separately to improve performance. Distantly supervised data provides in-domain dictionary information, while cross-domain data provides hand-annotated information from another domain; the two types of information are complementary. However, two problems must be solved before these data can be used directly. First, distantly supervised data may contain a lot of noise. Second, using cross-domain data directly may degrade performance because of distribution mismatch. In this paper, we propose a unified model named PARE (PArtial learning and REinforcement learning), which can use distantly supervised data and cross-domain data simultaneously as external data. The model uses a partial learning method with a new labeling strategy to better handle the noise in distantly supervised data, and a reinforcement learning method to alleviate the distribution mismatch in cross-domain data. Experiments on three datasets show that our model outperforms the baseline models. In addition, the model can be used when no hand-annotated in-domain data is available.
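The abstract names partial learning as the way PARE copes with noisy distantly supervised labels, but it does not spell out the objective or the new labeling strategy. A common form of partial-annotation learning, assumed here and not taken from the paper, trains a linear-chain CRF to maximize the total probability of every label sequence consistent with the partial labels: tokens matched by the dictionary keep their label, and unmatched tokens may take any label. The minimal sketch below computes that loss, log Z minus the log-sum of scores over consistent paths, with a constrained forward algorithm. The function names, the toy label set (0 = O, 1 = B-PER, 2 = I-PER), and all scores are hypothetical, and PARE's own formulation may differ.

```python
import math
from typing import List, Sequence, Set


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))); returns -inf if every input is -inf."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def constrained_log_partition(
    emissions: List[List[float]],    # emissions[t][y]: score of label y at token t
    transitions: List[List[float]],  # transitions[p][y]: score of moving from label p to y
    allowed: List[Set[int]],         # allowed[t]: labels permitted at token t
) -> float:
    """Forward algorithm that sums only over label paths inside `allowed`."""
    num_labels = len(transitions)
    alpha = [emissions[0][y] if y in allowed[0] else -math.inf for y in range(num_labels)]
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[p] + transitions[p][y] for p in range(num_labels)]) + emissions[t][y]
            if y in allowed[t] else -math.inf
            for y in range(num_labels)
        ]
    return logsumexp(alpha)


def partial_crf_nll(
    emissions: List[List[float]],
    transitions: List[List[float]],
    allowed: List[Set[int]],
) -> float:
    """Hypothetical partial-annotation loss: -log P(all paths consistent with `allowed`)."""
    num_labels = len(transitions)
    unconstrained = [set(range(num_labels)) for _ in emissions]
    log_z = constrained_log_partition(emissions, transitions, unconstrained)
    log_consistent = constrained_log_partition(emissions, transitions, allowed)
    return log_z - log_consistent


# Toy example with made-up scores for a 3-token sentence.
emissions = [[0.1, 1.2, -0.5], [0.3, -0.2, 0.9], [1.0, -1.0, 0.0]]
transitions = [[0.5, 0.0, -2.0], [0.0, -1.0, 1.5], [0.2, 0.1, 0.4]]
# Token 0 matched by the dictionary as B-PER, token 1 unmatched (any label), token 2 matched as O.
partial = [{1}, {0, 1, 2}, {0}]
print(partial_crf_nll(emissions, transitions, partial))
```

Allowing every label at every token gives a loss of exactly zero, while restricting each token to a single label recovers the ordinary fully supervised CRF negative log-likelihood; a partial annotation sits between the two, so unmatched tokens are never forced to the O label.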
Journal: Comput Intell Neurosci (Computational Intelligence and Neuroscience), Research Article, published online 2022-05-29.
Copyright © 2022 Yun Hu et al. This is an open access article distributed under the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.