Comparison of machine-learning algorithms for the prediction of Current Procedural Terminology (CPT) codes from pathology reports
BACKGROUND: Pathology reports serve as an auditable trail of a patient’s clinical narrative, containing text pertaining to diagnosis, prognosis, and specimen processing. Recent works have utilized natural language processing (NLP) pipelines, which include rule-based or machine-learning analytics, to...
Main Authors: | Levy, Joshua; Vattikonda, Nishitha; Haudenschild, Christian; Christensen, Brock; Vaickus, Louis |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8802304/ https://www.ncbi.nlm.nih.gov/pubmed/35127232 http://dx.doi.org/10.4103/jpi.jpi_52_21 |
_version_ | 1784642654154260480 |
---|---|
author | Levy, Joshua Vattikonda, Nishitha Haudenschild, Christian Christensen, Brock Vaickus, Louis |
author_sort | Levy, Joshua |
collection | PubMed |
description | BACKGROUND: Pathology reports serve as an auditable trail of a patient’s clinical narrative, containing text pertaining to diagnosis, prognosis, and specimen processing. Recent works have utilized natural language processing (NLP) pipelines, which include rule-based or machine-learning analytics, to uncover textual patterns that inform clinical endpoints and biomarker information. Although deep learning methods have come to the forefront of NLP, there have been limited comparisons with the performance of other machine-learning methods in extracting key insights for the prediction of medical procedure information, which is used to inform reimbursement for pathology departments. In addition, the utility of combining and ranking information from multiple report subfields as compared with exclusively using the diagnostic field for the prediction of Current Procedural Terminology (CPT) codes and signing pathologists remains unclear. METHODS: After preprocessing pathology reports, we utilized advanced topic modeling to identify topics that characterize a cohort of 93,039 pathology reports at the Dartmouth-Hitchcock Department of Pathology and Laboratory Medicine (DPLM). We separately compared XGBoost, SVM, and BERT (Bidirectional Encoder Representations from Transformers) methodologies for the prediction of primary CPT codes (CPT 88302, 88304, 88305, 88307, 88309) as well as 38 ancillary CPT codes, using both the diagnostic text alone and text from all subfields. We performed similar analyses for characterizing text from a group of the 20 pathologists with the most pathology report sign-outs. Finally, we uncovered important report subcomponents by using model explanation techniques. RESULTS: We identified 20 topics that pertained to diagnostic and procedural information. Operating on diagnostic text alone, BERT outperformed XGBoost for the prediction of primary CPT codes. When utilizing all report subfields, XGBoost outperformed BERT for the prediction of primary CPT codes. Utilizing additional subfields of the pathology report increased prediction accuracy across ancillary CPT codes, and performance gains for using additional report subfields were high for the XGBoost model for primary CPT codes. Misclassifications of CPT codes were between codes of a similar complexity, and misclassifications between pathologists were subspecialty related. CONCLUSIONS: Our approach generated CPT code predictions with an accuracy that was higher than previously reported. Although diagnostic text is an important source of information, additional insights may be extracted from other report subfields. Although BERT approaches performed comparably to the XGBoost approaches, they may lend valuable information to pipelines that combine image, text, and -omics information. Future resource-saving opportunities exist to help hospitals detect mis-billing, standardize report text, and estimate productivity metrics that pertain to pathologist compensation (RVUs). |
format | Online Article Text |
id | pubmed-8802304 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-8802304 2022-02-03 Comparison of machine-learning algorithms for the prediction of Current Procedural Terminology (CPT) codes from pathology reports Levy, Joshua Vattikonda, Nishitha Haudenschild, Christian Christensen, Brock Vaickus, Louis J Pathol Inform Original Research Article BACKGROUND: Pathology reports serve as an auditable trail of a patient’s clinical narrative, containing text pertaining to diagnosis, prognosis, and specimen processing. Recent works have utilized natural language processing (NLP) pipelines, which include rule-based or machine-learning analytics, to uncover textual patterns that inform clinical endpoints and biomarker information. Although deep learning methods have come to the forefront of NLP, there have been limited comparisons with the performance of other machine-learning methods in extracting key insights for the prediction of medical procedure information, which is used to inform reimbursement for pathology departments. In addition, the utility of combining and ranking information from multiple report subfields as compared with exclusively using the diagnostic field for the prediction of Current Procedural Terminology (CPT) codes and signing pathologists remains unclear. METHODS: After preprocessing pathology reports, we utilized advanced topic modeling to identify topics that characterize a cohort of 93,039 pathology reports at the Dartmouth-Hitchcock Department of Pathology and Laboratory Medicine (DPLM). We separately compared XGBoost, SVM, and BERT (Bidirectional Encoder Representations from Transformers) methodologies for the prediction of primary CPT codes (CPT 88302, 88304, 88305, 88307, 88309) as well as 38 ancillary CPT codes, using both the diagnostic text alone and text from all subfields. We performed similar analyses for characterizing text from a group of the 20 pathologists with the most pathology report sign-outs. Finally, we uncovered important report subcomponents by using model explanation techniques. RESULTS: We identified 20 topics that pertained to diagnostic and procedural information. Operating on diagnostic text alone, BERT outperformed XGBoost for the prediction of primary CPT codes. When utilizing all report subfields, XGBoost outperformed BERT for the prediction of primary CPT codes. Utilizing additional subfields of the pathology report increased prediction accuracy across ancillary CPT codes, and performance gains for using additional report subfields were high for the XGBoost model for primary CPT codes. Misclassifications of CPT codes were between codes of a similar complexity, and misclassifications between pathologists were subspecialty related. CONCLUSIONS: Our approach generated CPT code predictions with an accuracy that was higher than previously reported. Although diagnostic text is an important source of information, additional insights may be extracted from other report subfields. Although BERT approaches performed comparably to the XGBoost approaches, they may lend valuable information to pipelines that combine image, text, and -omics information. Future resource-saving opportunities exist to help hospitals detect mis-billing, standardize report text, and estimate productivity metrics that pertain to pathologist compensation (RVUs). Elsevier 2022-12-20 /pmc/articles/PMC8802304/ /pubmed/35127232 http://dx.doi.org/10.4103/jpi.jpi_52_21 Text en © 2022 Published by Elsevier Inc. on behalf of Association for Pathology Informatics. https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Original Research Article Levy, Joshua Vattikonda, Nishitha Haudenschild, Christian Christensen, Brock Vaickus, Louis Comparison of machine-learning algorithms for the prediction of Current Procedural Terminology (CPT) codes from pathology reports |
title | Comparison of machine-learning algorithms for the prediction of Current Procedural Terminology (CPT) codes from pathology reports |
topic | Original Research Article |