
Extracting laboratory test information from biomedical text

BACKGROUND: No previous study reported the efficacy of current natural language processing (NLP) methods for extracting laboratory test information from narrative documents. This study investigates the pathology informatics question of how accurately such information can be extracted from text with...


Bibliographic Details
Main authors: Kang, Yanna Shen; Kayaalp, Mehmet
Format: Online Article Text
Language: English
Published: Medknow Publications & Media Pvt Ltd 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3779392/
https://www.ncbi.nlm.nih.gov/pubmed/24083058
http://dx.doi.org/10.4103/2153-3539.117450
author Kang, Yanna Shen
Kayaalp, Mehmet
collection PubMed
description BACKGROUND: No previous study reported the efficacy of current natural language processing (NLP) methods for extracting laboratory test information from narrative documents. This study investigates the pathology informatics question of how accurately such information can be extracted from text with the current tools and techniques, especially machine learning and symbolic NLP methods. The study data came from a text corpus maintained by the U.S. Food and Drug Administration, containing a rich set of information on laboratory tests and test devices. METHODS: The authors developed a symbolic information extraction (SIE) system to extract device and test specific information about four types of laboratory test entities: Specimens, analytes, units of measures and detection limits. They compared the performance of SIE and three prominent machine learning based NLP systems, LingPipe, GATE and BANNER, each implementing a distinct supervised machine learning method, hidden Markov models, support vector machines and conditional random fields, respectively. RESULTS: Machine learning systems recognized laboratory test entities with moderately high recall, but low precision rates. Their recall rates were relatively higher when the number of distinct entity values (e.g., the spectrum of specimens) was very limited or when lexical morphology of the entity was distinctive (as in units of measures), yet SIE outperformed them with statistically significant margins on extracting specimen, analyte and detection limit information in both precision and F-measure. Its high recall performance was statistically significant on analyte information extraction. CONCLUSIONS: Despite its shortcomings against machine learning methods, a well-tailored symbolic system may better discern relevancy among a pile of information of the same type and may outperform a machine learning system by tapping into lexically non-local contextual information such as the document structure.
format Online
Article
Text
id pubmed-3779392
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Medknow Publications & Media Pvt Ltd
record_format MEDLINE/PubMed
spelling pubmed-3779392 2013-09-30 Extracting laboratory test information from biomedical text Kang, Yanna Shen Kayaalp, Mehmet J Pathol Inform Original Article Medknow Publications & Media Pvt Ltd 2013-08-31 /pmc/articles/PMC3779392/ /pubmed/24083058 http://dx.doi.org/10.4103/2153-3539.117450 Text en Copyright: © 2013 Kang YS http://creativecommons.org/licenses/by-nc-sa/3.0 This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Extracting laboratory test information from biomedical text
topic Original Article