Comparison of machine-learning algorithms for the prediction of Current Procedural Terminology (CPT) codes from pathology reports

BACKGROUND: Pathology reports serve as an auditable trail of a patient’s clinical narrative, containing text pertaining to diagnosis, prognosis, and specimen processing. Recent works have utilized natural language processing (NLP) pipelines, which include rule-based or machine-learning analytics, to...

Bibliographic Details
Main Authors:	Levy, Joshua, Vattikonda, Nishitha, Haudenschild, Christian, Christensen, Brock, Vaickus, Louis
Format:	Online Article Text
Language:	English
Published:	Elsevier 2022
Subjects:	Original Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8802304/ https://www.ncbi.nlm.nih.gov/pubmed/35127232 http://dx.doi.org/10.4103/jpi.jpi_52_21

_version_	1784642654154260480
author	Levy, Joshua Vattikonda, Nishitha Haudenschild, Christian Christensen, Brock Vaickus, Louis
author_sort	Levy, Joshua
collection	PubMed
description	BACKGROUND: Pathology reports serve as an auditable trail of a patient’s clinical narrative, containing text pertaining to diagnosis, prognosis, and specimen processing. Recent works have utilized natural language processing (NLP) pipelines, which include rule-based or machine-learning analytics, to uncover textual patterns that inform clinical endpoints and biomarker information. Although deep learning methods have come to the forefront of NLP, there have been limited comparisons with the performance of other machine-learning methods in extracting key insights for the prediction of medical procedure information, which is used to inform reimbursement for pathology departments. In addition, the utility of combining and ranking information from multiple report subfields as compared with exclusively using the diagnostic field for the prediction of Current Procedural Terminology (CPT) codes and signing pathologists remains unclear. METHODS: After preprocessing pathology reports, we utilized advanced topic modeling to identify topics that characterize a cohort of 93,039 pathology reports at the Dartmouth-Hitchcock Department of Pathology and Laboratory Medicine (DPLM). We separately compared XGBoost, SVM, and BERT (Bidirectional Encoder Representations from Transformers) methodologies for the prediction of primary CPT codes (CPT 88302, 88304, 88305, 88307, 88309) as well as 38 ancillary CPT codes, using both the diagnostic text alone and text from all subfields. We performed similar analyses for characterizing text from a group of the 20 pathologists with the most pathology report sign-outs. Finally, we uncovered important report subcomponents by using model explanation techniques. RESULTS: We identified 20 topics that pertained to diagnostic and procedural information. Operating on diagnostic text alone, BERT outperformed XGBoost for the prediction of primary CPT codes. When utilizing all report subfields, XGBoost outperformed BERT for the prediction of primary CPT codes. Utilizing additional subfields of the pathology report increased prediction accuracy across ancillary CPT codes, and performance gains for using additional report subfields were high for the XGBoost model for primary CPT codes. Misclassifications of CPT codes were between codes of a similar complexity, and misclassifications between pathologists were subspecialty related. CONCLUSIONS: Our approach generated CPT code predictions with an accuracy that was higher than previously reported. Although diagnostic text is an important source of information, additional insights may be extracted from other report subfields. Although BERT approaches performed comparably to the XGBoost approaches, they may lend valuable information to pipelines that combine image, text, and -omics information. Future resource-saving opportunities exist to help hospitals detect mis-billing, standardize report text, and estimate productivity metrics that pertain to pathologist compensation (RVUs).
format	Online Article Text
id	pubmed-8802304
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Elsevier
record_format	MEDLINE/PubMed
spelling	pubmed-8802304 2022-02-03 Comparison of machine-learning algorithms for the prediction of Current Procedural Terminology (CPT) codes from pathology reports Levy, Joshua Vattikonda, Nishitha Haudenschild, Christian Christensen, Brock Vaickus, Louis J Pathol Inform Original Research Article Elsevier 2022-12-20 /pmc/articles/PMC8802304/ /pubmed/35127232 http://dx.doi.org/10.4103/jpi.jpi_52_21 Text en © 2022 Published by Elsevier Inc. on behalf of Association for Pathology Informatics. https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
spellingShingle	Original Research Article Levy, Joshua Vattikonda, Nishitha Haudenschild, Christian Christensen, Brock Vaickus, Louis Comparison of machine-learning algorithms for the prediction of Current Procedural Terminology (CPT) codes from pathology reports
title	Comparison of machine-learning algorithms for the prediction of Current Procedural Terminology (CPT) codes from pathology reports
title_sort	comparison of machine-learning algorithms for the prediction of current procedural terminology (cpt) codes from pathology reports
topic	Original Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8802304/ https://www.ncbi.nlm.nih.gov/pubmed/35127232 http://dx.doi.org/10.4103/jpi.jpi_52_21
work_keys_str_mv	AT levyjoshua comparisonofmachinelearningalgorithmsforthepredictionofcurrentproceduralterminologycptcodesfrompathologyreports AT vattikondanishitha comparisonofmachinelearningalgorithmsforthepredictionofcurrentproceduralterminologycptcodesfrompathologyreports AT haudenschildchristian comparisonofmachinelearningalgorithmsforthepredictionofcurrentproceduralterminologycptcodesfrompathologyreports AT christensenbrock comparisonofmachinelearningalgorithmsforthepredictionofcurrentproceduralterminologycptcodesfrompathologyreports AT vaickuslouis comparisonofmachinelearningalgorithmsforthepredictionofcurrentproceduralterminologycptcodesfrompathologyreports
