Validation of natural language processing to extract breast cancer pathology procedures and results
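Main authors: | Wieneke, Arika E.; Bowles, Erin J. A.; Cronkite, David; Wernli, Karen J.; Gao, Hongyuan; Carrell, David; Buist, Diana S. M. |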
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Medknow Publications & Media Pvt Ltd, 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4485196/ https://www.ncbi.nlm.nih.gov/pubmed/26167382 http://dx.doi.org/10.4103/2153-3539.159215 |
author | Wieneke, Arika E. Bowles, Erin J. A. Cronkite, David Wernli, Karen J. Gao, Hongyuan Carrell, David Buist, Diana S. M. |
---|---|
collection | PubMed |
description | BACKGROUND: Pathology reports typically require manual review to abstract research data. We developed a natural language processing (NLP) system to automatically interpret free-text breast pathology reports with limited assistance from manual abstraction. METHODS: We used an iterative approach combining machine learning algorithms with constructed groups of related findings to identify breast-related procedures and results from free-text pathology reports. We evaluated the NLP system using an all-or-nothing approach to determine which reports could be processed entirely by NLP and which needed manual review beyond NLP. We divided 3234 reports into development (2910, 90%) and evaluation (324, 10%) sets, using manually reviewed pathology data as our gold standard. RESULTS: NLP correctly coded 12.7% of the evaluation set, flagged 49.1% of reports for manual review, incorrectly coded 30.8%, and correctly omitted 7.4% as irrelevant (i.e., not breast-related). Common procedures and results were identified correctly (e.g., invasive ductal with 95.5% precision and 94.0% sensitivity), but entire reports were flagged for manual review because of rare findings and substantial variation in pathology report text. CONCLUSIONS: The NLP system we developed did not perform well enough to abstract entire breast pathology reports. The all-or-nothing approach resulted in too broad a scope of work and limited our flexibility to identify breast pathology procedures and results. Our NLP system was also limited by the lack of gold standard data on rare findings and by wide variation in pathology text. Focusing on individual, common elements and improving the standardization of pathology report text may improve performance. |
format | Online Article Text |
id | pubmed-4485196 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Medknow Publications & Media Pvt Ltd |
record_format | MEDLINE/PubMed |
spelling | pubmed-4485196 2015-07-12 J Pathol Inform Research Article Medknow Publications & Media Pvt Ltd 2015-06-23 /pmc/articles/PMC4485196/ /pubmed/26167382 http://dx.doi.org/10.4103/2153-3539.159215 Text en Copyright: © 2015 Wieneke AE. http://creativecommons.org/licenses/by-nc-sa/3.0 This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Validation of natural language processing to extract breast cancer pathology procedures and results |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4485196/ https://www.ncbi.nlm.nih.gov/pubmed/26167382 http://dx.doi.org/10.4103/2153-3539.159215 |
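A minimal Python sketch can illustrate the all-or-nothing evaluation summarized in the description field. It is not the authors' system. Every part of it is a hypothetical assumption: the `ReportResult` record, code labels such as `"IDC"`, the rule order in `score_report`, and the choice to compute element-level metrics over all reports, including flagged ones. In `score_report`, a review flag takes priority, and a report counts as correctly coded only when every NLP code matches the gold standard. Precision and sensitivity use their standard definitions, TP/(TP+FP) and TP/(TP+FN).

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    # Report-level outcomes mirroring the four categories reported in the abstract.
    CORRECTLY_CODED = "correctly coded"
    FLAGGED = "flagged for manual review"
    INCORRECTLY_CODED = "incorrectly coded"
    CORRECTLY_OMITTED = "correctly omitted"


@dataclass(frozen=True)
class ReportResult:
    # Hypothetical per-report record; field names are illustrative only.
    breast_related: bool   # gold-standard relevance of the report
    nlp_codes: frozenset   # procedure/result codes emitted by the NLP system
    gold_codes: frozenset  # codes from manual abstraction (gold standard)
    needs_review: bool     # True if NLP flagged the report for manual review


def score_report(r: ReportResult) -> Outcome:
    """All-or-nothing scoring (assumed rules): any code mismatch fails the whole report."""
    if r.needs_review:
        return Outcome.FLAGGED
    if not r.breast_related and not r.nlp_codes:
        return Outcome.CORRECTLY_OMITTED
    if r.nlp_codes == r.gold_codes:
        return Outcome.CORRECTLY_CODED
    return Outcome.INCORRECTLY_CODED


def precision_sensitivity(results, code):
    """Element-level precision TP/(TP+FP) and sensitivity TP/(TP+FN) for one code."""
    tp = sum(code in r.nlp_codes and code in r.gold_codes for r in results)
    fp = sum(code in r.nlp_codes and code not in r.gold_codes for r in results)
    fn = sum(code not in r.nlp_codes and code in r.gold_codes for r in results)
    # Return NaN when a ratio is undefined (no predictions or no gold positives).
    precision = tp / (tp + fp) if tp + fp else float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    return precision, sensitivity


if __name__ == "__main__":
    # Toy reports invented for illustration; "IDC" and "DCIS" are hypothetical code labels.
    toy = [
        ReportResult(True, frozenset({"IDC"}), frozenset({"IDC"}), False),
        ReportResult(True, frozenset({"IDC"}), frozenset({"DCIS"}), False),
        ReportResult(True, frozenset(), frozenset({"IDC"}), True),
        ReportResult(False, frozenset(), frozenset(), False),
    ]
    counts = Counter(score_report(r) for r in toy)
    for outcome in Outcome:
        print(f"{outcome.value}: {counts[outcome] / len(toy):.1%}")
    print(precision_sensitivity(toy, "IDC"))
```

The toy reports are invented and are not study data. Run as a script, the example assigns each of the four outcomes to one report (25.0% each). It also gives the hypothetical `"IDC"` code a precision of 0.5 and a sensitivity of 0.5. These toy results show the pattern the abstract describes: element-level metrics can look reasonable even while the strict report-level rule marks most reports as not fully correct.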