
Automatically Detecting Failures in Natural Language Processing Tools for Online Community Text

BACKGROUND: The prevalence and value of patient-generated health text are increasing, but processing such text remains problematic. Although existing biomedical natural language processing (NLP) tools are appealing, most were developed to process clinician- or researcher-generated text, such as clinical notes or journal articles. …


Bibliographic Details
Main Authors: Park, Albert, Hartzler, Andrea L, Huh, Jina, McDonald, David W, Pratt, Wanda
Format: Online Article Text
Language: English
Published: JMIR Publications Inc. 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4642409/
https://www.ncbi.nlm.nih.gov/pubmed/26323337
http://dx.doi.org/10.2196/jmir.4612
_version_ 1782400361483468800
author Park, Albert
Hartzler, Andrea L
Huh, Jina
McDonald, David W
Pratt, Wanda
author_facet Park, Albert
Hartzler, Andrea L
Huh, Jina
McDonald, David W
Pratt, Wanda
author_sort Park, Albert
collection PubMed
description BACKGROUND: The prevalence and value of patient-generated health text are increasing, but processing such text remains problematic. Although existing biomedical natural language processing (NLP) tools are appealing, most were developed to process clinician- or researcher-generated text, such as clinical notes or journal articles. Beyond this mismatch in text type, other challenges of using existing NLP tools include constantly changing technologies, source vocabularies, and text characteristics. These continuously evolving challenges warrant low-cost, systematic assessment. However, the most widely accepted evaluation method in NLP, manual annotation, requires tremendous time and effort. OBJECTIVE: The primary objective of this study is to explore an alternative approach: using low-cost, automated methods to detect failures (eg, incorrect boundaries, missed terms, mismapped concepts) when processing patient-generated text with existing biomedical NLP tools. We first characterize common failures that NLP tools make in processing online community text. We then demonstrate the feasibility of our automated approach in detecting these common failures with one of the most popular biomedical NLP tools, MetaMap. METHODS: Using 9657 posts from an online cancer community, we explored our automated failure detection approach in two steps: (1) to characterize the failure types, we manually reviewed MetaMap’s commonly occurring failures, grouped the inaccurate mappings into failure types, and identified causes of the failures through iterative rounds of manual review using open coding; and (2) to automatically detect these failure types, we explored combinations of existing NLP techniques and dictionary-based matching for each failure cause. Finally, we manually evaluated the automatically detected failures. RESULTS: From our manual review, we characterized three types of failure: (1) boundary failures, (2) missed term failures, and (3) word ambiguity failures. Within these three failure types, we discovered 12 causes of inaccurate concept mappings. Our automated methods detected almost half of MetaMap’s 383,572 mappings as problematic. Word ambiguity failures were the most common, comprising 82.22% of failures; boundary failures were the second most frequent at 15.90%; and missed term failures were the least common at 1.88%. The automated failure detection achieved precision, recall, accuracy, and F1 score of 83.00%, 92.57%, 88.17%, and 87.52%, respectively. CONCLUSIONS: We illustrate the challenges of processing patient-generated text from online health communities, characterize the failures that NLP tools make on such text, and demonstrate the feasibility of our low-cost approach to detecting those failures automatically. Our approach shows potential as a scalable and effective way to assess continually evolving NLP tools and source vocabularies for processing patient-generated text.
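The METHODS step of the abstract pairs existing NLP techniques with dictionary-based matching to flag suspect mappings. A minimal, hypothetical sketch of the dictionary-matching idea follows; the ambiguous-word list, the mapping format, and the example mapping of "was" to the Wiskott-Aldrich Syndrome concept are illustrative assumptions, not details taken from the paper:

    # Hypothetical sketch of dictionary-based failure flagging (not the
    # paper's implementation). Everyday words that MetaMap-style tools can
    # mistake for biomedical concepts sit in a small dictionary, and any
    # mapping whose source phrase hits that dictionary is flagged.
    AMBIGUOUS_WORDS = {"was", "it", "may", "will"}  # illustrative list

    def flag_word_ambiguity_failures(mappings):
        """mappings: iterable of (source_phrase, mapped_concept) pairs."""
        return [(phrase, concept)
                for phrase, concept in mappings
                if phrase.lower() in AMBIGUOUS_WORDS]

    sample = [("was", "Wiskott-Aldrich Syndrome"), ("chemo", "Chemotherapy")]
    print(flag_word_ambiguity_failures(sample))
    # -> [('was', 'Wiskott-Aldrich Syndrome')]

The RESULTS figures are internally consistent: the F1 score is the harmonic mean of the reported precision and recall, and the three failure-type shares sum to 100%. A quick check in the same vein:

    # Consistency check of the evaluation metrics reported in the abstract.
    precision = 0.8300
    recall = 0.9257
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    print(f"F1 = {f1:.4f}")  # 0.8752, matching the reported 87.52%

    shares = {"word ambiguity": 82.22, "boundary": 15.90, "missed term": 1.88}
    print(f"total = {sum(shares.values()):.2f}%")  # 100.00%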
format Online
Article
Text
id pubmed-4642409
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher JMIR Publications Inc.
record_format MEDLINE/PubMed
spelling pubmed-4642409 2016-01-12 Automatically Detecting Failures in Natural Language Processing Tools for Online Community Text Park, Albert Hartzler, Andrea L Huh, Jina McDonald, David W Pratt, Wanda J Med Internet Res Original Paper BACKGROUND: The prevalence and value of patient-generated health text are increasing, but processing such text remains problematic. Although existing biomedical natural language processing (NLP) tools are appealing, most were developed to process clinician- or researcher-generated text, such as clinical notes or journal articles. Beyond this mismatch in text type, other challenges of using existing NLP tools include constantly changing technologies, source vocabularies, and text characteristics. These continuously evolving challenges warrant low-cost, systematic assessment. However, the most widely accepted evaluation method in NLP, manual annotation, requires tremendous time and effort. OBJECTIVE: The primary objective of this study is to explore an alternative approach: using low-cost, automated methods to detect failures (eg, incorrect boundaries, missed terms, mismapped concepts) when processing patient-generated text with existing biomedical NLP tools. We first characterize common failures that NLP tools make in processing online community text. We then demonstrate the feasibility of our automated approach in detecting these common failures with one of the most popular biomedical NLP tools, MetaMap. METHODS: Using 9657 posts from an online cancer community, we explored our automated failure detection approach in two steps: (1) to characterize the failure types, we manually reviewed MetaMap’s commonly occurring failures, grouped the inaccurate mappings into failure types, and identified causes of the failures through iterative rounds of manual review using open coding; and (2) to automatically detect these failure types, we explored combinations of existing NLP techniques and dictionary-based matching for each failure cause. Finally, we manually evaluated the automatically detected failures. RESULTS: From our manual review, we characterized three types of failure: (1) boundary failures, (2) missed term failures, and (3) word ambiguity failures. Within these three failure types, we discovered 12 causes of inaccurate concept mappings. Our automated methods detected almost half of MetaMap’s 383,572 mappings as problematic. Word ambiguity failures were the most common, comprising 82.22% of failures; boundary failures were the second most frequent at 15.90%; and missed term failures were the least common at 1.88%. The automated failure detection achieved precision, recall, accuracy, and F1 score of 83.00%, 92.57%, 88.17%, and 87.52%, respectively. CONCLUSIONS: We illustrate the challenges of processing patient-generated text from online health communities, characterize the failures that NLP tools make on such text, and demonstrate the feasibility of our low-cost approach to detecting those failures automatically. Our approach shows potential as a scalable and effective way to assess continually evolving NLP tools and source vocabularies for processing patient-generated text. JMIR Publications Inc. 2015-08-31 /pmc/articles/PMC4642409/ /pubmed/26323337 http://dx.doi.org/10.2196/jmir.4612 Text en ©Albert Park, Andrea L Hartzler, Jina Huh, David W McDonald, Wanda Pratt.
Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 31.08.2015. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
spellingShingle Original Paper
Park, Albert
Hartzler, Andrea L
Huh, Jina
McDonald, David W
Pratt, Wanda
Automatically Detecting Failures in Natural Language Processing Tools for Online Community Text
title Automatically Detecting Failures in Natural Language Processing Tools for Online Community Text
title_full Automatically Detecting Failures in Natural Language Processing Tools for Online Community Text
title_fullStr Automatically Detecting Failures in Natural Language Processing Tools for Online Community Text
title_full_unstemmed Automatically Detecting Failures in Natural Language Processing Tools for Online Community Text
title_short Automatically Detecting Failures in Natural Language Processing Tools for Online Community Text
title_sort automatically detecting failures in natural language processing tools for online community text
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4642409/
https://www.ncbi.nlm.nih.gov/pubmed/26323337
http://dx.doi.org/10.2196/jmir.4612
work_keys_str_mv AT parkalbert automaticallydetectingfailuresinnaturallanguageprocessingtoolsforonlinecommunitytext
AT hartzlerandreal automaticallydetectingfailuresinnaturallanguageprocessingtoolsforonlinecommunitytext
AT huhjina automaticallydetectingfailuresinnaturallanguageprocessingtoolsforonlinecommunitytext
AT mcdonalddavidw automaticallydetectingfailuresinnaturallanguageprocessingtoolsforonlinecommunitytext
AT prattwanda automaticallydetectingfailuresinnaturallanguageprocessingtoolsforonlinecommunitytext