Annotating patient clinical records with syntactic chunks and named entities: the Harvey Corpus
The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult to process by existing natural language analysis tools since they are highly telegraphic (omitting many words), and contain many spellin...
Main Authors: | Savkov, Aleksandar; Carroll, John; Koeling, Rob; Cassell, Jackie |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer Netherlands 2016 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4983282/ https://www.ncbi.nlm.nih.gov/pubmed/27570501 http://dx.doi.org/10.1007/s10579-015-9330-7 |
_version_ | 1782447879556694016 |
---|---|
author | Savkov, Aleksandar Carroll, John Koeling, Rob Cassell, Jackie |
author_sort | Savkov, Aleksandar |
collection | PubMed |
description | The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult to process by existing natural language analysis tools since they are highly telegraphic (omitting many words), and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning. |
format | Online Article Text |
id | pubmed-4983282 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Springer Netherlands |
record_format | MEDLINE/PubMed |
spelling | pubmed-4983282 2016-08-25 Annotating patient clinical records with syntactic chunks and named entities: the Harvey Corpus Savkov, Aleksandar Carroll, John Koeling, Rob Cassell, Jackie Lang Resour Eval Original Paper The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult to process by existing natural language analysis tools since they are highly telegraphic (omitting many words), and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning. Springer Netherlands 2016-01-11 2016 /pmc/articles/PMC4983282/ /pubmed/27570501 http://dx.doi.org/10.1007/s10579-015-9330-7 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. |
spellingShingle | Original Paper Savkov, Aleksandar Carroll, John Koeling, Rob Cassell, Jackie Annotating patient clinical records with syntactic chunks and named entities: the Harvey Corpus |
title | Annotating patient clinical records with syntactic chunks and named entities: the Harvey Corpus |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4983282/ https://www.ncbi.nlm.nih.gov/pubmed/27570501 http://dx.doi.org/10.1007/s10579-015-9330-7 |