Extracting laboratory test information from biomedical text
BACKGROUND: No previous study has reported the efficacy of current natural language processing (NLP) methods for extracting laboratory test information from narrative documents. This study investigates the pathology informatics question of how accurately such information can be extracted from text with...
Main Authors: | Kang, Yanna Shen; Kayaalp, Mehmet |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Medknow Publications & Media Pvt Ltd 2013 |
Subjects: | Original Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3779392/ https://www.ncbi.nlm.nih.gov/pubmed/24083058 http://dx.doi.org/10.4103/2153-3539.117450 |
_version_ | 1782285240115396608 |
---|---|
author | Kang, Yanna Shen Kayaalp, Mehmet |
author_facet | Kang, Yanna Shen Kayaalp, Mehmet |
author_sort | Kang, Yanna Shen |
collection | PubMed |
description | BACKGROUND: No previous study has reported the efficacy of current natural language processing (NLP) methods for extracting laboratory test information from narrative documents. This study investigates the pathology informatics question of how accurately such information can be extracted from text with current tools and techniques, especially machine learning and symbolic NLP methods. The study data came from a text corpus maintained by the U.S. Food and Drug Administration, containing a rich set of information on laboratory tests and test devices. METHODS: The authors developed a symbolic information extraction (SIE) system to extract device- and test-specific information about four types of laboratory test entities: specimens, analytes, units of measure, and detection limits. They compared the performance of SIE with that of three prominent machine-learning-based NLP systems, LingPipe, GATE, and BANNER, which implement distinct supervised machine learning methods: hidden Markov models, support vector machines, and conditional random fields, respectively. RESULTS: The machine learning systems recognized laboratory test entities with moderately high recall but low precision. Their recall was relatively higher when the number of distinct entity values (e.g., the spectrum of specimens) was very limited or when the lexical morphology of the entity was distinctive (as in units of measure), yet SIE outperformed them by statistically significant margins in both precision and F-measure when extracting specimen, analyte, and detection limit information. Its high recall was statistically significant for analyte information extraction. CONCLUSIONS: Despite its shortcomings relative to machine learning methods, a well-tailored symbolic system may better discern relevance among a pile of information of the same type and may outperform a machine learning system by tapping into lexically non-local contextual information such as the document structure. |
format | Online Article Text |
id | pubmed-3779392 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Medknow Publications & Media Pvt Ltd |
record_format | MEDLINE/PubMed |
spelling | pubmed-3779392 2013-09-30 Extracting laboratory test information from biomedical text Kang, Yanna Shen Kayaalp, Mehmet J Pathol Inform Original Article Medknow Publications & Media Pvt Ltd 2013-08-31 /pmc/articles/PMC3779392/ /pubmed/24083058 http://dx.doi.org/10.4103/2153-3539.117450 Text en Copyright: © 2013 Kang YS http://creativecommons.org/licenses/by-nc-sa/3.0 This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Original Article Kang, Yanna Shen Kayaalp, Mehmet Extracting laboratory test information from biomedical text |
title | Extracting laboratory test information from biomedical text |
title_full | Extracting laboratory test information from biomedical text |
title_fullStr | Extracting laboratory test information from biomedical text |
title_full_unstemmed | Extracting laboratory test information from biomedical text |
title_short | Extracting laboratory test information from biomedical text |
title_sort | extracting laboratory test information from biomedical text |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3779392/ https://www.ncbi.nlm.nih.gov/pubmed/24083058 http://dx.doi.org/10.4103/2153-3539.117450 |
work_keys_str_mv | AT kangyannashen extractinglaboratorytestinformationfrombiomedicaltext AT kayaalpmehmet extractinglaboratorytestinformationfrombiomedicaltext |
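
The abstract compares systems by precision, recall, and F-measure. As a minimal sketch of how those three scores relate, the Python below scores a set of extracted entities against a gold-standard set using exact matching. The function name `precision_recall_f1`, the (entity type, value) tuple representation, and the example data are all hypothetical illustrations; none of them come from the article, and the authors' actual matching criteria and evaluation procedure may differ.

```python
def precision_recall_f1(predicted, gold):
    """Score extracted entities against a gold standard (exact match).

    Both arguments are iterables of hashable items, here assumed to be
    (entity_type, value) tuples. This is an illustrative assumption,
    not the evaluation scheme used in the article.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)

    # Precision: share of extracted items that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold items that were extracted.
    recall = true_positives / len(gold) if gold else 0.0
    # Balanced F-measure (F1): harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: the extractor finds every gold entity (high recall)
# but also emits two spurious ones (lower precision).
gold = {("specimen", "serum"), ("analyte", "glucose"), ("unit", "mg/dL")}
predicted = {
    ("specimen", "serum"),
    ("specimen", "plasma"),
    ("analyte", "glucose"),
    ("unit", "mg/dL"),
    ("unit", "mL"),
}

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

The example data is built to show the pattern the abstract reports for the machine learning systems: over-extraction leaves recall high while pulling precision, and with it the F-measure, down.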