
Using Empirically Constructed Lexical Resources for Named Entity Recognition

Because of privacy concerns and the expense involved in creating an annotated corpus, the existing small-annotated corpora might not have sufficient examples for learning to statistically extract all the named-entities precisely. In this work, we evaluate what value may lie in automatically generate...


Bibliographic Details
Main Authors:	Jonnalagadda, Siddhartha; Cohen, Trevor; Wu, Stephen; Liu, Hongfang; Gonzalez, Graciela
Format:	Online Article Text
Language:	English
Published:	Libertas Academica 2013
Subjects:	Original Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3702195/ https://www.ncbi.nlm.nih.gov/pubmed/23847424 http://dx.doi.org/10.4137/BII.S11664

author	Jonnalagadda, Siddhartha; Cohen, Trevor; Wu, Stephen; Liu, Hongfang; Gonzalez, Graciela
author_sort	Jonnalagadda, Siddhartha
collection	PubMed
description	Because of privacy concerns and the expense involved in creating an annotated corpus, the existing small-annotated corpora might not have sufficient examples for learning to statistically extract all the named-entities precisely. In this work, we evaluate what value may lie in automatically generated features based on distributional semantics when using machine-learning named entity recognition (NER). The features we generated and experimented with include n-nearest words, support vector machine (SVM)-regions, and term clustering, all of which are considered distributional semantic features. The addition of the n-nearest words feature resulted in a greater increase in F-score than by using a manually constructed lexicon to a baseline system. Although the need for relatively small-annotated corpora for retraining is not obviated, lexicons empirically derived from unannotated text can not only supplement manually created lexicons, but also replace them. This phenomenon is observed in extracting concepts from both biomedical literature and clinical notes.
format	Online Article Text
id	pubmed-3702195
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	Libertas Academica
record_format	MEDLINE/PubMed
spelling	pubmed-3702195 2013-07-11 Using Empirically Constructed Lexical Resources for Named Entity Recognition. Jonnalagadda, Siddhartha; Cohen, Trevor; Wu, Stephen; Liu, Hongfang; Gonzalez, Graciela. Biomed Inform Insights. Original Research. Libertas Academica 2013-06-24 /pmc/articles/PMC3702195/ /pubmed/23847424 http://dx.doi.org/10.4137/BII.S11664 Text en © 2013 the author(s), publisher and licensee Libertas Academica Ltd. This is an open access article published under the Creative Commons CC-BY-NC 3.0 license.
title	Using Empirically Constructed Lexical Resources for Named Entity Recognition
topic	Original Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3702195/ https://www.ncbi.nlm.nih.gov/pubmed/23847424 http://dx.doi.org/10.4137/BII.S11664

Similar Items