Comparison of machine-learning algorithms for the prediction of Current Procedural Terminology (CPT) codes from pathology reports

Bibliographic Details
Main Authors: Levy, Joshua; Vattikonda, Nishitha; Haudenschild, Christian; Christensen, Brock; Vaickus, Louis
Format: Online, Article, Text
Language: English
Published: Elsevier, 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8802304/
https://www.ncbi.nlm.nih.gov/pubmed/35127232
http://dx.doi.org/10.4103/jpi.jpi_52_21
collection PubMed
description BACKGROUND: Pathology reports serve as an auditable trail of a patient’s clinical narrative, containing text pertaining to diagnosis, prognosis, and specimen processing. Recent works have used natural language processing (NLP) pipelines, which include rule-based or machine-learning analytics, to uncover textual patterns that inform clinical endpoints and biomarker information. Although deep learning methods have come to the forefront of NLP, there have been limited comparisons with the performance of other machine-learning methods in extracting key insights for the prediction of medical procedure information, which is used to inform reimbursement for pathology departments. In addition, it remains unclear whether combining and ranking information from multiple report subfields, rather than using the diagnostic field alone, improves the prediction of Current Procedural Terminology (CPT) codes and signing pathologists.
METHODS: After preprocessing pathology reports, we used advanced topic modeling to identify topics that characterize a cohort of 93,039 pathology reports from the Dartmouth-Hitchcock Department of Pathology and Laboratory Medicine (DPLM). We separately compared XGBoost, SVM, and BERT (Bidirectional Encoder Representations from Transformers) methodologies for the prediction of primary CPT codes (CPT 88302, 88304, 88305, 88307, 88309) as well as 38 ancillary CPT codes, using both the diagnostic text alone and text from all subfields. We performed similar analyses to characterize text from the 20 pathologists with the most pathology report sign-outs. Finally, we identified important report subcomponents using model explanation techniques.
RESULTS: We identified 20 topics that pertained to diagnostic and procedural information. Operating on diagnostic text alone, BERT outperformed XGBoost for the prediction of primary CPT codes. When using all report subfields, XGBoost outperformed BERT for the prediction of primary CPT codes. Using additional subfields of the pathology report increased prediction accuracy across ancillary CPT codes, and the XGBoost model showed large performance gains from the additional report subfields for primary CPT codes. Misclassifications of CPT codes occurred between codes of similar complexity, and misclassifications between pathologists were subspecialty-related.
CONCLUSIONS: Our approach generated CPT code predictions with higher accuracy than previously reported. Although diagnostic text is an important source of information, additional insights may be extracted from other report subfields. While BERT approaches performed comparably to the XGBoost approaches, they may lend valuable information to pipelines that combine image, text, and -omics information. Future resource-saving opportunities exist to help hospitals detect mis-billing, standardize report text, and estimate productivity metrics that pertain to pathologist compensation (relative value units, RVUs).
id pubmed-8802304
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
source J Pathol Inform, Original Research Article, Elsevier, 2022-12-20
rights © 2022 Published by Elsevier Inc. on behalf of Association for Pathology Informatics. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
topic Original Research Article