
Negation’s Not Solved: Generalizability Versus Optimizability in Clinical Natural Language Processing

A review of published work in clinical natural language processing (NLP) may suggest that the negation detection task has been “solved.” This work proposes that an optimizable solution does not equal a generalizable solution. We introduce a new machine learning-based Polarity Module for detecting ne...

Bibliographic Details
Main authors: Wu, Stephen; Miller, Timothy; Masanz, James; Coarr, Matt; Halgrim, Scott; Carrell, David; Clark, Cheryl
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4231086/
https://www.ncbi.nlm.nih.gov/pubmed/25393544
http://dx.doi.org/10.1371/journal.pone.0112774
author Wu, Stephen
Miller, Timothy
Masanz, James
Coarr, Matt
Halgrim, Scott
Carrell, David
Clark, Cheryl
author_sort Wu, Stephen
collection PubMed
description A review of published work in clinical natural language processing (NLP) may suggest that the negation detection task has been “solved.” This work proposes that an optimizable solution does not equal a generalizable solution. We introduce a new machine learning-based Polarity Module for detecting negation in clinical text, and extensively compare its performance across domains. Using four manually annotated corpora of clinical text, we show that negation detection performance suffers when there is no in-domain development (for manual methods) or training data (for machine learning-based methods). Various factors (e.g., annotation guidelines, named entity characteristics, the amount of data, and lexical and syntactic context) play a role in making generalizability difficult, but none completely explains the phenomenon. Furthermore, generalizability remains challenging because it is unclear whether to use a single source for accurate data, combine all sources into a single model, or apply domain adaptation methods. The most reliable means to improve negation detection is to manually annotate in-domain training data (or, perhaps, manually modify rules); this is a strategy for optimizing performance, rather than generalizing it. These results suggest a direction for future work in domain-adaptive and task-adaptive methods for clinical NLP.
format Online
Article
Text
id pubmed-4231086
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4231086 2014-11-18 Negation’s Not Solved: Generalizability Versus Optimizability in Clinical Natural Language Processing Wu, Stephen Miller, Timothy Masanz, James Coarr, Matt Halgrim, Scott Carrell, David Clark, Cheryl PLoS One Research Article
Public Library of Science 2014-11-13 /pmc/articles/PMC4231086/ /pubmed/25393544 http://dx.doi.org/10.1371/journal.pone.0112774 Text en © 2014 Wu et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Negation’s Not Solved: Generalizability Versus Optimizability in Clinical Natural Language Processing
topic Research Article