DeIDNER Corpus: Annotation of Clinical Discharge Summary Notes for Named Entity Recognition Using BRAT Tool

Bibliographic Details
Main Authors: SYED, Mahanazuddin; AL-SHUKRI, Shaymaa; SYED, Shorabuddin; SEXTON, Kevin; GREER, Melody L.; ZOZUS, Meredith; BHATTACHARYYA, Sudeepa; PRIOR, Fred
Format: Online, Article, Text
Language: English
Published: 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9019788/
https://www.ncbi.nlm.nih.gov/pubmed/34042780
http://dx.doi.org/10.3233/SHTI210195
collection PubMed
description Named Entity Recognition (NER), which aims to identify entities and classify them into predefined categories, is a critical pre-processing task in Natural Language Processing (NLP) pipelines. Readily available off-the-shelf NER algorithms or programs are trained on a general corpus and often need to be retrained when applied to a different domain. The performance of the end model depends on the quality of the named entities that these NER models generate for the NLP task. To improve NER model accuracy, researchers build domain-specific corpora for both model training and evaluation. In the clinical domain, however, privacy constraints leave a dearth of training data, forcing many studies to use NER models trained on non-clinical text to generate the NER feature set. This, in turn, affects the performance of downstream NLP tasks such as information extraction and de-identification. In this paper, our objective is to create a high-quality annotated clinical corpus for training NER models that generalize easily and can generate the named-entity feature set for a downstream de-identification task.
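The description frames the corpus as training data for NER models. As an illustration only, the sketch below assumes the annotations are stored in BRAT's standoff `.ann` format, where each text-bound line reads `T<n>`, TAB, `<type> <start> <end>`, TAB, covered text, and converts them into token-level BIO tags, a common input representation for NER training. The entity labels `PERSON` and `DATE`, the sample sentence, the class `Entity`, the functions `parse_brat_ann` and `to_bio`, and the regex tokenizer are all hypothetical choices, not the DeIDNER authors' pipeline or label set.

```python
import re
from dataclasses import dataclass

# Hypothetical tokenizer: runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


@dataclass
class Entity:
    """One BRAT text-bound annotation: an entity type over a character span."""
    ann_id: str
    label: str
    start: int  # offset of the first character
    end: int    # offset one past the last character
    text: str


def parse_brat_ann(ann_text: str) -> list[Entity]:
    """Collect the text-bound ("T") lines of a BRAT standoff .ann file.

    Expected line shape: T<n> TAB <type> <start> <end> TAB <covered text>.
    Relation (R), event (E), attribute (A), and note (#) lines are ignored,
    and discontinuous spans (fragments joined by ';') are skipped for simplicity.
    """
    entities = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue
        ann_id, type_and_span, covered = line.split("\t", 2)
        if ";" in type_and_span:
            continue  # discontinuous span: not handled in this sketch
        label, start, end = type_and_span.split(" ")
        entities.append(Entity(ann_id, label, int(start), int(end), covered))
    return entities


def to_bio(text: str, entities: list[Entity]) -> list[tuple[str, str]]:
    """Tag each token of `text` with B-<type>, I-<type>, or O."""
    tagged = []
    for match in TOKEN_RE.finditer(text):
        tag = "O"
        for ent in entities:
            # A token is inside an entity only if it lies wholly within the span.
            if ent.start <= match.start() and match.end() <= ent.end:
                prefix = "B" if match.start() == ent.start else "I"
                tag = f"{prefix}-{ent.label}"
                break
        tagged.append((match.group(), tag))
    return tagged


if __name__ == "__main__":
    # Made-up sentence and annotations, for illustration only.
    note = "John Smith was admitted on 05/27/2021."
    ann = "T1\tPERSON 0 10\tJohn Smith\nT2\tDATE 27 37\t05/27/2021"
    for token, tag in to_bio(note, parse_brat_ann(ann)):
        print(token, tag)
```

In this sketch, tokens that only partly overlap an annotated span are tagged `O`. A real converter would also need a policy for discontinuous and overlapping annotations, which are simply skipped or resolved by first match here.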
id pubmed-9019788
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Stud Health Technol Inform
published 2021-05-27
record date 2022-04-20
license This article is published online with Open Access by IOS Press and distributed under the terms of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
topic Article