
Annotating patient clinical records with syntactic chunks and named entities: the Harvey Corpus

Bibliographic details
Main authors: Savkov, Aleksandar; Carroll, John; Koeling, Rob; Cassell, Jackie
Format: Online article, text
Language: English
Published: Springer Netherlands, 2016
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4983282/
https://www.ncbi.nlm.nih.gov/pubmed/27570501
http://dx.doi.org/10.1007/s10579-015-9330-7
Collection: PubMed
Abstract: The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult to process by existing natural language analysis tools since they are highly telegraphic (omitting many words), and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.
Record ID: pubmed-4983282
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Lang Resour Eval (Language Resources and Evaluation), Original Paper
Published online: 2016-01-11 (record date: 2016-08-25)
© The Author(s) 2016. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.