Data mining application to healthcare fraud detection: a two-step unsupervised clustering method for outlier detection with administrative databases

BACKGROUND: The healthcare sector is an interesting target for fraudsters. The availability of a great amount of data makes it possible to tackle this issue with the adoption of data mining techniques, making the auditing process more efficient and effective. This research has the objective of devel...


Bibliographic Details
Main authors:	Massi, Michela Carlotta; Ieva, Francesca; Lettieri, Emanuele
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2020
Subjects:	Technical Advance
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7362640/ https://www.ncbi.nlm.nih.gov/pubmed/32664923 http://dx.doi.org/10.1186/s12911-020-01143-9

_version_	1783559531126587392
author	Massi, Michela Carlotta; Ieva, Francesca; Lettieri, Emanuele
author_facet	Massi, Michela Carlotta; Ieva, Francesca; Lettieri, Emanuele
author_sort	Massi, Michela Carlotta
collection	PubMed
description	BACKGROUND: The healthcare sector is an interesting target for fraudsters. The availability of a great amount of data makes it possible to tackle this issue with the adoption of data mining techniques, making the auditing process more efficient and effective. This research has the objective of developing a novel data mining model devoted to fraud detection among hospitals using Hospital Discharge Charts (HDC) in Administrative Databases. In particular, it is focused on the DRG upcoding practice, i.e., the tendency to register codes for provided services and inpatients' health status so as to make the hospitalization fall within a more remunerative DRG class. METHODS: We propose a two-step algorithm: the first step entails k-means clustering of providers to identify locally consistent and locally similar groups of hospitals, according to their characteristics and behavior in treating a specific disease, in order to spot outliers within these groups of peers. An initial grid search for the best number of features to be selected (through Principal Feature Analysis) and the best number of local groups makes the algorithm extremely flexible. In the second step, we propose a human-decision support system that helps auditors cross-validate the identified outliers, analyzing them with respect to fraud-related variables and the complexity of the patient case mix they treated. The proposed algorithm was tested on a database of HDC collected by Regione Lombardia (Italy) over a three-year period (2013-2015), focusing on the treatment of Heart Failure. RESULTS: The model identified 6 clusters of hospitals and 10 outliers among the 183 units. Out of those providers, we report in depth the application of Step Two on three hospitals (two private and one public). After cross-validation against the patient population and the hospitals' characteristics, the public hospital seemed justified in its outlierness, while the two private providers were deemed interesting for further investigation by auditors. CONCLUSIONS: The proposed model is promising in identifying anomalous DRG coding behavior and it is easily transferable to all diseases and contexts of interest. Our proposal contributes to the limited literature regarding behavioral models for fraud detection, identifying the most 'cautious' fraudsters. The results of the first and the second Steps together represent a valuable set of information for auditors in their preliminary investigation.
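The first step described under METHODS (cluster providers into peer groups, then look for outliers inside each group) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the outlier rule, so the cutoff used here (distance to the cluster centroid above `factor` times the cluster's median distance) is an assumption, as are the function names, the toy two-feature data, and the fixed number of clusters. The grid search and Principal Feature Analysis are omitted.

```python
import math
import random
from statistics import median


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on tuples of floats; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster as is
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return labels, centroids


def flag_outliers(points, labels, centroids, factor=3.0):
    """Indices of points far from their own centroid (assumed median-based rule)."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    flagged = []
    for c in range(len(centroids)):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if len(idx) < 3:
            continue  # too few peers to compare against
        cutoff = factor * median(dists[i] for i in idx)
        flagged.extend(i for i in idx if dists[i] > cutoff)
    return sorted(flagged)


# Hypothetical toy data: two tight peer groups plus one provider (index 5)
# that sits apart from the group it is closest to.
hospitals = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (-0.1, 0.0),
    (2.0, 2.0),
    (10.0, 10.0), (10.1, 10.0), (10.0, 10.1), (10.1, 10.1), (9.9, 10.0),
]
labels, centroids = kmeans(hospitals, k=2)
flagged = flag_outliers(hospitals, labels, centroids)
print(flagged)
```

Comparing each provider only with its own peer group is what lets a unit be flagged even when it would look unremarkable against the whole population; the flagged indices would then be handed to the auditor-facing second step.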
format	Online Article Text
id	pubmed-7362640
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-7362640 2020-07-20 Data mining application to healthcare fraud detection: a two-step unsupervised clustering method for outlier detection with administrative databases Massi, Michela Carlotta; Ieva, Francesca; Lettieri, Emanuele BMC Med Inform Decis Mak Technical Advance BioMed Central 2020-07-14 /pmc/articles/PMC7362640/ /pubmed/32664923 http://dx.doi.org/10.1186/s12911-020-01143-9 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Data mining application to healthcare fraud detection: a two-step unsupervised clustering method for outlier detection with administrative databases
topic	Technical Advance
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7362640/ https://www.ncbi.nlm.nih.gov/pubmed/32664923 http://dx.doi.org/10.1186/s12911-020-01143-9
