
Extracting laboratory test information from biomedical text


Bibliographic details
Main authors:	Kang, Yanna Shen; Kayaalp, Mehmet
Format:	Online Article Text
Language:	English
Published:	Medknow Publications & Media Pvt Ltd 2013
Subjects:	Original Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3779392/ https://www.ncbi.nlm.nih.gov/pubmed/24083058 http://dx.doi.org/10.4103/2153-3539.117450

author	Kang, Yanna Shen; Kayaalp, Mehmet
collection	PubMed
description	BACKGROUND: No previous study has reported the efficacy of current natural language processing (NLP) methods for extracting laboratory test information from narrative documents. This study investigates the pathology informatics question of how accurately such information can be extracted from text with current tools and techniques, especially machine learning and symbolic NLP methods. The study data came from a text corpus maintained by the U.S. Food and Drug Administration, containing a rich set of information on laboratory tests and test devices. METHODS: The authors developed a symbolic information extraction (SIE) system to extract device- and test-specific information about four types of laboratory test entities: specimens, analytes, units of measure and detection limits. They compared the performance of SIE with that of three prominent machine learning-based NLP systems, LingPipe, GATE and BANNER, each implementing a distinct supervised machine learning method: hidden Markov models, support vector machines and conditional random fields, respectively. RESULTS: The machine learning systems recognized laboratory test entities with moderately high recall but low precision. Their recall was relatively higher when the number of distinct entity values (e.g., the spectrum of specimens) was very limited or when the lexical morphology of the entity was distinctive (as with units of measure). Nevertheless, SIE outperformed them by statistically significant margins in both precision and F-measure when extracting specimen, analyte and detection limit information. Its recall advantage was statistically significant for analyte information extraction. CONCLUSIONS: Despite its shortcomings relative to machine learning methods, a well-tailored symbolic system may better discern relevance among many pieces of information of the same type and may outperform a machine learning system by tapping into lexically non-local contextual information such as the document structure.
format	Online Article Text
id	pubmed-3779392
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	Medknow Publications & Media Pvt Ltd
record_format	MEDLINE/PubMed
spelling	pubmed-3779392 2013-09-30 Extracting laboratory test information from biomedical text Kang, Yanna Shen; Kayaalp, Mehmet J Pathol Inform Original Article Medknow Publications & Media Pvt Ltd 2013-08-31 /pmc/articles/PMC3779392/ /pubmed/24083058 http://dx.doi.org/10.4103/2153-3539.117450 Text en Copyright: © 2013 Kang YS http://creativecommons.org/licenses/by-nc-sa/3.0 This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Extracting laboratory test information from biomedical text
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3779392/ https://www.ncbi.nlm.nih.gov/pubmed/24083058 http://dx.doi.org/10.4103/2153-3539.117450
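The abstract's central claim is that document structure, a lexically non-local cue, can help a symbolic extractor separate the relevant value from many look-alike values, and its comparisons are reported as precision, recall and F-measure. The sketch below illustrates both ideas on a tiny, made-up label excerpt. It is not the authors' SIE system and uses no FDA data. The section names, unit list, regular expression and helper names (`extract_naive`, `extract_by_section`, `prf`) are all hypothetical, and the F-measure is assumed to be the balanced F1.

```python
import re
from typing import Optional, Set, Tuple

# Hypothetical, made-up excerpt in the style of a device label (not FDA data).
DOC = """Intended Use
Measures glucose in serum; reportable range 2 to 500 mg/dL.

Detection Limit
The limit of detection is 2.0 mg/dL.

Linearity
Linear from 2.0 to 500 mg/dL.
"""

# Assumed section headers and unit vocabulary, for this toy example only.
HEADERS = {"Intended Use", "Detection Limit", "Linearity"}
VALUE_UNIT = re.compile(r"(\d+(?:\.\d+)?)\s*(mg/dL|ng/mL|IU/L)")

Mention = Tuple[str, str]  # (value, unit)


def extract_naive(doc: str) -> Set[Mention]:
    """Treat every value-unit pair anywhere in the text as a detection limit."""
    return set(VALUE_UNIT.findall(doc))


def extract_by_section(doc: str, target: str = "Detection Limit") -> Set[Mention]:
    """Keep value-unit pairs only when they occur under the target section header."""
    section: Optional[str] = None
    found: Set[Mention] = set()
    for line in doc.splitlines():
        text = line.strip()
        if text in HEADERS:
            section = text  # the header is context for all lines below it
            continue
        if section == target:
            found.update(VALUE_UNIT.findall(text))
    return found


def prf(predicted: Set[Mention], gold: Set[Mention]) -> Tuple[float, float, float]:
    """Precision, recall and balanced F-measure (F1) over exact-match mentions."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold annotation for the made-up excerpt.
GOLD: Set[Mention] = {("2.0", "mg/dL")}

if __name__ == "__main__":
    for name, extractor in [("naive", extract_naive), ("by section", extract_by_section)]:
        p, r, f = prf(extractor(DOC), GOLD)
        print(f"{name}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

In this toy excerpt, the naive rule finds the true detection limit but also picks up the upper end of the range (P=0.50, R=1.00, F1=0.67). The section-aware rule keeps only the value under the "Detection Limit" header (P=1.00, R=1.00, F1=1.00). This only loosely echoes the high-recall, low-precision pattern described in the abstract and says nothing about the actual systems or data in the study.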
