Using Empirically Constructed Lexical Resources for Named Entity Recognition
Because of privacy concerns and the expense involved in creating an annotated corpus, existing small annotated corpora might not contain enough examples for a system to learn to statistically extract all named entities precisely. In this work, we evaluate what value may lie in automatically generate...
Main authors: | Jonnalagadda, Siddhartha; Cohen, Trevor; Wu, Stephen; Liu, Hongfang; Gonzalez, Graciela |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Libertas Academica 2013 |
Subjects: | Original Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3702195/ https://www.ncbi.nlm.nih.gov/pubmed/23847424 http://dx.doi.org/10.4137/BII.S11664 |
_version_ | 1782275764152958976 |
---|---|
author | Jonnalagadda, Siddhartha Cohen, Trevor Wu, Stephen Liu, Hongfang Gonzalez, Graciela |
author_sort | Jonnalagadda, Siddhartha |
collection | PubMed |
description | Because of privacy concerns and the expense involved in creating an annotated corpus, existing small annotated corpora might not contain enough examples for a system to learn to statistically extract all named entities precisely. In this work, we evaluate what value may lie in automatically generated features based on distributional semantics when using machine-learning named entity recognition (NER). The features we generated and experimented with include n-nearest words, support vector machine (SVM)-regions, and term clustering, all of which are considered distributional semantic features. Adding the n-nearest words feature to a baseline system resulted in a greater increase in F-score than adding a manually constructed lexicon. Although the need for relatively small annotated corpora for retraining is not obviated, lexicons empirically derived from unannotated text can not only supplement manually created lexicons, but also replace them. This phenomenon is observed in extracting concepts from both biomedical literature and clinical notes. |
format | Online Article Text |
id | pubmed-3702195 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Libertas Academica |
record_format | MEDLINE/PubMed |
spelling | pubmed-3702195 2013-07-11 Using Empirically Constructed Lexical Resources for Named Entity Recognition Jonnalagadda, Siddhartha Cohen, Trevor Wu, Stephen Liu, Hongfang Gonzalez, Graciela Biomed Inform Insights Original Research Because of privacy concerns and the expense involved in creating an annotated corpus, existing small annotated corpora might not contain enough examples for a system to learn to statistically extract all named entities precisely. In this work, we evaluate what value may lie in automatically generated features based on distributional semantics when using machine-learning named entity recognition (NER). The features we generated and experimented with include n-nearest words, support vector machine (SVM)-regions, and term clustering, all of which are considered distributional semantic features. Adding the n-nearest words feature to a baseline system resulted in a greater increase in F-score than adding a manually constructed lexicon. Although the need for relatively small annotated corpora for retraining is not obviated, lexicons empirically derived from unannotated text can not only supplement manually created lexicons, but also replace them. This phenomenon is observed in extracting concepts from both biomedical literature and clinical notes. Libertas Academica 2013-06-24 /pmc/articles/PMC3702195/ /pubmed/23847424 http://dx.doi.org/10.4137/BII.S11664 Text en © 2013 the author(s), publisher and licensee Libertas Academica Ltd. This is an open access article published under the Creative Commons CC-BY-NC 3.0 license. |
title | Using Empirically Constructed Lexical Resources for Named Entity Recognition |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3702195/ https://www.ncbi.nlm.nih.gov/pubmed/23847424 http://dx.doi.org/10.4137/BII.S11664 |