Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records

Bibliographic Details
Main authors: Luo, Yuan; Szolovits, Peter
Format: Online, Article, Text
Language: English
Published: Libertas Academica 2016
Subjects: Perspective
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4954589/
https://www.ncbi.nlm.nih.gov/pubmed/27478379
http://dx.doi.org/10.4137/BII.S38916
Collection: PubMed
Abstract: In natural language processing, stand-off annotation anchors an annotation to the text by its starting and ending positions and stores the annotation content separately from the text. We address the fundamental problem of efficiently storing stand-off annotations when applying natural language processing to narrative clinical notes in electronic medical records (EMRs), and of efficiently retrieving the annotations that satisfy position constraints. Efficient storage and retrieval of stand-off annotations can facilitate tasks such as mapping unstructured text to EMR ontologies. We first formulate this problem as the interval query problem, for which the optimal query/update time is in general logarithmic. We next perform a tight time complexity analysis of the basic interval tree query algorithm and show that it is nonoptimal when applied to a collection of 13 query types from Allen’s interval algebra. We then study two closely related state-of-the-art interval query algorithms and propose query reformulations and augmentations to the second algorithm. Our proposed algorithm achieves logarithmic stabbing-max query time and solves the stabbing-interval query tasks on all of Allen’s relations in logarithmic time, attaining the theoretical lower bound. At the same time, update time remains logarithmic and space remains linear. We also discuss interval management in external memory models and higher dimensions.
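
The abstract frames annotation retrieval as an interval query problem over Allen's 13 relations and starts from the basic interval tree. Below is a minimal Python sketch of those two ingredients only. It is not the authors' proposed algorithm, which the abstract says also keeps updates logarithmic and answers stabbing-max queries. The sketch assumes stand-off annotations use half-open character offsets [start, end), and the names `Annotation`, `allen_relation`, `IntervalTree` and `stab` are hypothetical. The tree is a textbook static centered interval tree that returns all k annotations covering a given offset in O(log n + k) time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-off annotation: a label anchored to offsets [start, end)."""
    start: int
    end: int
    label: str


def allen_relation(a: Annotation, b: Annotation) -> str:
    """Return which of Allen's 13 relations holds from a to b (spans need start < end)."""
    if a.end < b.start:
        return "before"
    if b.end < a.start:
        return "after"
    if a.end == b.start:
        return "meets"
    if b.end == a.start:
        return "met-by"
    # Past this point the two spans share at least one character.
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finished-by"
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.end < a.end:
        return "contains"
    return "overlaps" if a.start < b.start else "overlapped-by"


class IntervalTree:
    """Static centered interval tree: stabbing queries in O(log n + k) time."""

    def __init__(self, spans: List[Annotation]):
        if not spans:
            raise ValueError("IntervalTree needs at least one span")
        starts = sorted(s.start for s in spans)
        # The median start is itself covered by one span, so every node stores
        # at least one span and each child holds at most half of them.
        self.center = starts[len(starts) // 2]
        here = [s for s in spans if s.start <= self.center < s.end]
        left = [s for s in spans if s.end <= self.center]
        right = [s for s in spans if s.start > self.center]
        self.by_start = sorted(here, key=lambda s: s.start)
        self.by_end = sorted(here, key=lambda s: s.end, reverse=True)
        self.left = IntervalTree(left) if left else None
        self.right = IntervalTree(right) if right else None

    def stab(self, q: int) -> List[Annotation]:
        """Return every span with start <= q < end."""
        hits: List[Annotation] = []
        node: Optional[IntervalTree] = self
        while node is not None:
            if q < node.center:
                # Spans stored here all end after the center, so only start matters.
                for s in node.by_start:
                    if s.start > q:
                        break
                    hits.append(s)
                node = node.left
            elif q > node.center:
                # Spans stored here all start at or before the center, so only end matters.
                for s in node.by_end:
                    if s.end <= q:
                        break
                    hits.append(s)
                node = node.right
            else:
                # q equals the center: exactly the spans stored here contain it.
                hits.extend(node.by_start)
                node = None
        return hits


if __name__ == "__main__":
    note = "Patient denies chest pain."
    spans = [
        Annotation(0, 26, "Sentence"),
        Annotation(8, 14, "Negation"),
        Annotation(15, 25, "Finding"),
        Annotation(15, 20, "Anatomy"),
        Annotation(21, 25, "Symptom"),
    ]
    tree = IntervalTree(spans)
    print(sorted(s.label for s in tree.stab(22)))  # ['Finding', 'Sentence', 'Symptom']
    print(allen_relation(spans[3], spans[2]))  # starts
    print(note[spans[2].start:spans[2].end])  # chest pain
```

With half-open offsets, a token ending at offset 20 and a token starting at offset 20 satisfy meets rather than overlaps, which matches how adjacent tokens are usually anchored. This tree is static: adding or deleting an annotation means rebuilding it, whereas the abstract states that the proposed algorithm keeps update time logarithmic.
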
Record ID: pubmed-4954589
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Biomed Inform Insights (Perspective), published 2016-07-19
License: © 2016 the author(s), publisher and licensee Libertas Academica Ltd. This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 license.