Data-Centric AI for Healthcare Fraud Detection

Automated methods for detecting fraudulent healthcare providers have the potential to save billions of dollars in healthcare costs and improve the overall quality of patient care. This study presents a data-centric approach to improve healthcare fraud classification performance and reliability using...

Bibliographic Details
Main Authors:	Johnson, Justin M., Khoshgoftaar, Taghi M.
Format:	Online Article Text
Language:	English
Published:	Springer Nature Singapore 2023
Subjects:	Original Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10173919/ https://www.ncbi.nlm.nih.gov/pubmed/37200563 http://dx.doi.org/10.1007/s42979-023-01809-x

author	Johnson, Justin M.; Khoshgoftaar, Taghi M.
collection	PubMed
description	Automated methods for detecting fraudulent healthcare providers have the potential to save billions of dollars in healthcare costs and improve the overall quality of patient care. This study presents a data-centric approach to improve healthcare fraud classification performance and reliability using Medicare claims data. Publicly available data from the Centers for Medicare & Medicaid Services (CMS) are used to construct nine large-scale labeled data sets for supervised learning. First, we leverage CMS data to curate the 2013–2019 Part B, Part D, and Durable Medical Equipment, Prosthetics, Orthotics, and Supplies (DMEPOS) Medicare fraud classification data sets. We provide a review of each data set and data preparation techniques to create Medicare data sets for supervised learning and we propose an improved data labeling process. Next, we enrich the original Medicare fraud data sets with up to 58 new provider summary features. Finally, we address a common model evaluation pitfall and propose an adjusted cross-validation technique that mitigates target leakage to provide reliable evaluation results. Each data set is evaluated on the Medicare fraud classification task using extreme gradient boosting and random forest learners, multiple complementary performance metrics, and 95% confidence intervals. Results show that the new enriched data sets consistently outperform the original Medicare data sets that are currently used in related works. Our results encourage the data-centric machine learning workflow and provide a strong foundation for data understanding and preparation techniques for machine learning applications in healthcare fraud.
format	Online Article Text
id	pubmed-10173919
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Springer Nature Singapore
record_format	MEDLINE/PubMed
spelling	pubmed-10173919 2023-05-14 Data-Centric AI for Healthcare Fraud Detection Johnson, Justin M. Khoshgoftaar, Taghi M. SN Comput Sci Original Research
Springer Nature Singapore 2023-05-11 2023 /pmc/articles/PMC10173919/ /pubmed/37200563 http://dx.doi.org/10.1007/s42979-023-01809-x Text en © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd 2023, Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title	Data-Centric AI for Healthcare Fraud Detection
topic	Original Research