
Validation of natural language processing to extract breast cancer pathology procedures and results

BACKGROUND: Pathology reports typically require manual review to abstract research data. We developed a natural language processing (NLP) system to automatically interpret free-text breast pathology reports with limited assistance from manual abstraction. METHODS: We used an iterative approach of ma...


Bibliographic Details
Main authors: Wieneke, Arika E., Bowles, Erin J. A., Cronkite, David, Wernli, Karen J., Gao, Hongyuan, Carrell, David, Buist, Diana S. M.
Format: Online Article Text
Language: English
Published: Medknow Publications & Media Pvt Ltd 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4485196/
https://www.ncbi.nlm.nih.gov/pubmed/26167382
http://dx.doi.org/10.4103/2153-3539.159215
author Wieneke, Arika E.
Bowles, Erin J. A.
Cronkite, David
Wernli, Karen J.
Gao, Hongyuan
Carrell, David
Buist, Diana S. M.
collection PubMed
description BACKGROUND: Pathology reports typically require manual review to abstract research data. We developed a natural language processing (NLP) system to automatically interpret free-text breast pathology reports with limited assistance from manual abstraction. METHODS: We used an iterative approach of machine learning algorithms and constructed groups of related findings to identify breast-related procedures and results from free-text pathology reports. We evaluated the NLP system using an all-or-nothing approach to determine which reports could be processed entirely using NLP and which reports needed manual review beyond NLP. We divided 3234 reports into development (2910, 90%) and evaluation (324, 10%) sets, using manually reviewed pathology data as our gold standard. RESULTS: NLP correctly coded 12.7% of the evaluation set, flagged 49.1% of reports for manual review, incorrectly coded 30.8%, and correctly omitted 7.4% from the evaluation set due to irrelevancy (i.e., not breast-related). Common procedures and results were identified correctly (e.g., invasive ductal with 95.5% precision and 94.0% sensitivity), but entire reports were flagged for manual review because of rare findings and substantial variation in pathology report text. CONCLUSIONS: The NLP system we developed did not perform sufficiently well for abstracting entire breast pathology reports. The all-or-nothing approach resulted in too broad a scope of work and limited our flexibility to identify breast pathology procedures and results. Our NLP system was also limited by the lack of gold standard data on rare findings and by wide variation in pathology text. Focusing on individual, common elements and improving pathology text report standardization may improve performance.
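The evaluation figures in the abstract can be sanity-checked with a short sketch. The code below is only an illustration: the abstract does not report raw true-positive or false-positive counts, so the `Counts` class, the helper functions, and the `example` values are hypothetical. Only the report totals and the four outcome percentages are taken from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Per-finding tallies against the manually reviewed gold standard."""
    true_positive: int
    false_positive: int
    false_negative: int


def precision(c: Counts) -> float:
    """Share of NLP-identified findings that the gold standard confirms."""
    denom = c.true_positive + c.false_positive
    return c.true_positive / denom if denom else 0.0


def sensitivity(c: Counts) -> float:
    """Share of gold-standard findings that NLP identified (recall)."""
    denom = c.true_positive + c.false_negative
    return c.true_positive / denom if denom else 0.0


# Hypothetical counts, NOT from the paper; chosen only to show the arithmetic.
example = Counts(true_positive=90, false_positive=10, false_negative=30)

# Figures stated in the abstract.
TOTAL_REPORTS = 3234
DEVELOPMENT_REPORTS = 2910
EVALUATION_REPORTS = 324

# All-or-nothing outcomes for the evaluation set, in percent.
outcome_pct = {
    "correctly_coded": 12.7,
    "flagged_for_manual_review": 49.1,
    "incorrectly_coded": 30.8,
    "correctly_omitted_as_irrelevant": 7.4,
}

development_share = DEVELOPMENT_REPORTS / TOTAL_REPORTS
evaluation_share = EVALUATION_REPORTS / TOTAL_REPORTS

if __name__ == "__main__":
    print(f"precision   = {precision(example):.3f}")
    print(f"sensitivity = {sensitivity(example):.3f}")
    print(f"development share = {development_share:.1%}")
    print(f"evaluation share  = {evaluation_share:.1%}")
    print(f"outcome total     = {sum(outcome_pct.values()):.1f}%")
```

Under the all-or-nothing scheme each evaluation report falls into exactly one of the four outcomes, which is why the percentages should total 100; per-finding precision and sensitivity are computed separately and can be high even when few whole reports are coded correctly.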
format Online
Article
Text
id pubmed-4485196
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Medknow Publications & Media Pvt Ltd
record_format MEDLINE/PubMed
spelling pubmed-4485196 2015-07-12 J Pathol Inform Research Article Medknow Publications & Media Pvt Ltd 2015-06-23 /pmc/articles/PMC4485196/ /pubmed/26167382 http://dx.doi.org/10.4103/2153-3539.159215 Text en Copyright: © 2015 Wieneke AE. http://creativecommons.org/licenses/by-nc-sa/3.0 This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Validation of natural language processing to extract breast cancer pathology procedures and results
topic Research Article