Using Empirically Constructed Lexical Resources for Named Entity Recognition

Bibliographic Details
Main Authors: Jonnalagadda, Siddhartha; Cohen, Trevor; Wu, Stephen; Liu, Hongfang; Gonzalez, Graciela
Format: Online Article Text
Language: English
Published: Libertas Academica 2013
Subjects: Original Research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3702195/
https://www.ncbi.nlm.nih.gov/pubmed/23847424
http://dx.doi.org/10.4137/BII.S11664
author Jonnalagadda, Siddhartha
Cohen, Trevor
Wu, Stephen
Liu, Hongfang
Gonzalez, Graciela
collection PubMed
description Because of privacy concerns and the expense involved in creating an annotated corpus, existing small annotated corpora might not contain enough examples for a statistical learner to extract all named entities precisely. In this work, we evaluate the value of automatically generated features based on distributional semantics for machine-learning named entity recognition (NER). The features we generated and experimented with include n-nearest words, support vector machine (SVM)-regions, and term clustering, all of which are distributional semantic features. Adding the n-nearest words feature to a baseline system produced a greater increase in F-score than adding a manually constructed lexicon. Although the need for relatively small annotated corpora for retraining is not obviated, lexicons empirically derived from unannotated text can not only supplement manually created lexicons but also replace them. This effect is observed when extracting concepts from both biomedical literature and clinical notes.
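As a rough illustration of the n-nearest words idea named in the description, the sketch below derives nearest neighbours from unannotated text and turns them into token features. It is not the authors' implementation: the co-occurrence window, the cosine similarity measure, the toy corpus, and every function name (`cooccurrence_vectors`, `n_nearest_words`, `token_features`) are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Build sparse context-count vectors from tokenized, unannotated sentences."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[ctx] for ctx, count in u.items() if ctx in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def n_nearest_words(word, vectors, n=3):
    """Return the n words whose context vectors are most similar to `word`."""
    if word not in vectors:
        return []
    target = vectors[word]
    scored = [(cosine(target, vec), other)
              for other, vec in vectors.items() if other != word]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:n]]


def token_features(token, vectors, n=3):
    """Hypothetical feature map for one token: its own form plus its neighbours."""
    key = token.lower()
    features = {"word=" + key: 1.0}
    for neighbor in n_nearest_words(key, vectors, n):
        features["near=" + neighbor] = 1.0
    return features


# Toy unannotated corpus (invented for illustration only).
corpus = [
    ["patient", "given", "aspirin", "daily"],
    ["patient", "given", "ibuprofen", "daily"],
    ["patient", "denies", "fever", "today"],
]
vectors = cooccurrence_vectors(corpus)
print(token_features("aspirin", vectors, n=1))
```

In a sketch like this, features such as `near=ibuprofen` let a sequence labeller generalize from a word seen in the annotated data to distributionally similar words that were never annotated, which is the role a manually built lexicon would otherwise play.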
format Online
Article
Text
id pubmed-3702195
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-3702195 2013-07-11
Using Empirically Constructed Lexical Resources for Named Entity Recognition
Jonnalagadda, Siddhartha; Cohen, Trevor; Wu, Stephen; Liu, Hongfang; Gonzalez, Graciela
Biomed Inform Insights
Original Research
Libertas Academica 2013-06-24
/pmc/articles/PMC3702195/ /pubmed/23847424 http://dx.doi.org/10.4137/BII.S11664
Text en
© 2013 the author(s), publisher and licensee Libertas Academica Ltd. This is an open access article published under the Creative Commons CC-BY-NC 3.0 license.
title Using Empirically Constructed Lexical Resources for Named Entity Recognition
topic Original Research