A Unified Model Using Distantly Supervised Data and Cross-Domain Data in NER
Named entity recognition (NER) systems are often realized by supervised methods that require large hand-annotated data. When the hand-annotated data is limited, distantly supervised (DS) data and cross-domain (CD) data are usually used separately to improve the performance. The distantly supervised...
Main Authors: | Hu, Yun; He, Hao; Chen, Zhengfei; Zhu, Qingmeng; Zheng, Changwen |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi 2022 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9168158/ https://www.ncbi.nlm.nih.gov/pubmed/35676955 http://dx.doi.org/10.1155/2022/1987829 |
_version_ | 1784720939643043840 |
---|---|
author | Hu, Yun He, Hao Chen, Zhengfei Zhu, Qingmeng Zheng, Changwen |
author_sort | Hu, Yun |
collection | PubMed |
description | Named entity recognition (NER) systems are often built with supervised methods that require large amounts of hand-annotated data. When hand-annotated data is limited, distantly supervised (DS) data and cross-domain (CD) data are usually used separately to improve performance. Distantly supervised data provides in-domain dictionary information, while cross-domain data provides hand-annotated information from another domain. These two types of information are complementary. However, two problems must be solved before the data can be used directly. First, distantly supervised data may contain a lot of noise. Second, directly using cross-domain data may degrade performance because of distribution mismatch. In this paper, we propose a unified model named PARE (PArtial learning and REinforcement learning). The PARE model can simultaneously use distantly supervised data and cross-domain data as external data. The model uses partial learning with a new label strategy to better handle the noise in distantly supervised data, and uses reinforcement learning to alleviate the distribution mismatch in cross-domain data. Experiments on three datasets show that our model outperforms the baseline models. In addition, our model can be used when no hand-annotated in-domain data is provided. |
format | Online Article Text |
id | pubmed-9168158 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-9168158 2022-06-07 A Unified Model Using Distantly Supervised Data and Cross-Domain Data in NER Hu, Yun He, Hao Chen, Zhengfei Zhu, Qingmeng Zheng, Changwen Comput Intell Neurosci Research Article Hindawi 2022-05-29 /pmc/articles/PMC9168158/ /pubmed/35676955 http://dx.doi.org/10.1155/2022/1987829 Text en Copyright © 2022 Yun Hu et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | A Unified Model Using Distantly Supervised Data and Cross-Domain Data in NER |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9168158/ https://www.ncbi.nlm.nih.gov/pubmed/35676955 http://dx.doi.org/10.1155/2022/1987829 |