Accelerating the annotation of sparse named entities by dynamic sentence selection

BACKGROUND: Previous studies of named entity recognition have shown that a reasonable level of recognition accuracy can be achieved by using machine learning models such as conditional random fields or support vector machines. However, the lack of training data (i.e. annotated corpora) makes it diff...

Bibliographic Details
Main Authors:	Tsuruoka, Yoshimasa; Tsujii, Jun'ichi; Ananiadou, Sophia
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2586757/ https://www.ncbi.nlm.nih.gov/pubmed/19025694 http://dx.doi.org/10.1186/1471-2105-9-S11-S8

_version_	1782160909575127040
author	Tsuruoka, Yoshimasa; Tsujii, Jun'ichi; Ananiadou, Sophia
author_facet	Tsuruoka, Yoshimasa; Tsujii, Jun'ichi; Ananiadou, Sophia
author_sort	Tsuruoka, Yoshimasa
collection	PubMed
description	BACKGROUND: Previous studies of named entity recognition have shown that a reasonable level of recognition accuracy can be achieved by using machine learning models such as conditional random fields or support vector machines. However, the lack of training data (i.e. annotated corpora) makes it difficult for machine learning-based named entity recognizers to be used in building practical information extraction systems. RESULTS: This paper presents an active learning-like framework for reducing the human effort required to create named entity annotations in a corpus. In this framework, the annotation work is performed as an iterative and interactive process between the human annotator and a probabilistic named entity tagger. Unlike active learning, our framework aims to annotate all occurrences of the target named entities in the given corpus, so that the resulting annotations are free from the sampling bias which is inevitable in active learning approaches. CONCLUSION: We evaluate our framework by simulating the annotation process using two named entity corpora and show that our approach can reduce the number of sentences which need to be examined by the human annotator. The cost reduction achieved by the framework could be drastic when the target named entities are sparse.
format	Text
id	pubmed-2586757
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2586757 2008-11-26 Accelerating the annotation of sparse named entities by dynamic sentence selection Tsuruoka, Yoshimasa; Tsujii, Jun'ichi; Ananiadou, Sophia BMC Bioinformatics Research BioMed Central 2008-11-19 /pmc/articles/PMC2586757/ /pubmed/19025694 http://dx.doi.org/10.1186/1471-2105-9-S11-S8 Text en Copyright © 2008 Tsuruoka et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Accelerating the annotation of sparse named entities by dynamic sentence selection
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2586757/ https://www.ncbi.nlm.nih.gov/pubmed/19025694 http://dx.doi.org/10.1186/1471-2105-9-S11-S8
