Data mining application to healthcare fraud detection: a two-step unsupervised clustering method for outlier detection with administrative databases

BACKGROUND: The healthcare sector is an interesting target for fraudsters. The availability of a great amount of data makes it possible to tackle this issue with the adoption of data mining techniques, making the auditing process more efficient and effective. This research has the objective of devel...

Bibliographic Details
Main Authors: Massi, Michela Carlotta; Ieva, Francesca; Lettieri, Emanuele
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7362640/
https://www.ncbi.nlm.nih.gov/pubmed/32664923
http://dx.doi.org/10.1186/s12911-020-01143-9
author Massi, Michela Carlotta
Ieva, Francesca
Lettieri, Emanuele
collection PubMed
description BACKGROUND: The healthcare sector is an interesting target for fraudsters. The availability of a great amount of data makes it possible to tackle this issue with the adoption of data mining techniques, making the auditing process more efficient and effective. This research has the objective of developing a novel data mining model devoted to fraud detection among hospitals using Hospital Discharge Charts (HDC) in Administrative Databases. In particular, it is focused on the DRG upcoding practice, i.e., the tendency to register codes for provided services and inpatients’ health status so as to make the hospitalization fall within a more remunerative DRG class. METHODS: We propose a two-step algorithm: the first step entails k-means clustering of providers to identify locally consistent and locally similar groups of hospitals, according to their characteristics and behavior in treating a specific disease, in order to spot outliers within these groups of peers. An initial grid search for the best number of features to be selected (through Principal Feature Analysis) and the best number of local groups makes the algorithm extremely flexible. In the second step, we propose a human-decision support system that helps auditors cross-validate the identified outliers, analyzing them w.r.t. fraud-related variables and the complexity of the patients’ casemix they treated. The proposed algorithm was tested on a database of HDC collected by Regione Lombardia (Italy) over three years (2013-2015), focusing on the treatment of Heart Failure. RESULTS: The model identified 6 clusters of hospitals and 10 outliers among the 183 units. Out of those providers, we report in depth the application of Step Two on three hospitals (two private and one public). Cross-validating with the patients’ population and the hospitals’ characteristics, the public hospital seemed justified in its outlierness, while the two private providers were deemed interesting for a further investigation by auditors. CONCLUSIONS: The proposed model is promising in identifying anomalous DRG coding behavior and it is easily transferable to all diseases and contexts of interest. Our proposal contributes to the limited literature regarding behavioral models for fraud detection, identifying the most ’cautious’ fraudsters. The results of the first and the second Steps together represent a valuable set of information for auditors in their preliminary investigation.
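The first step described in the abstract (k-means clustering of providers, then spotting outliers within each group of peers) could be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy feature vectors, the function names `kmeans` and `flag_outliers`, the farthest-point initialisation, and the cutoff of three times the median within-cluster distance are all hypothetical choices. The Principal Feature Analysis selection and the grid search over the number of features and clusters are omitted.

```python
import math
from statistics import median


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's k-means. Returns (centroids, labels)."""
    # Assumption: deterministic farthest-point initialisation, chosen only
    # to keep this example reproducible without a random seed.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(n_iter):
        # Assignment step: each provider joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def flag_outliers(points, centroids, labels, factor=3.0):
    """Flag providers that lie unusually far from their own cluster centroid.

    Assumption: "unusually far" means more than `factor` times the median
    distance within the same cluster. The abstract does not state the
    criterion actually used.
    """
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    flagged = []
    for c in range(len(centroids)):
        in_cluster = [d for d, lab in zip(dists, labels) if lab == c]
        if not in_cluster:
            continue
        cutoff = factor * median(in_cluster)
        flagged += [
            i for i, (d, lab) in enumerate(zip(dists, labels)) if lab == c and d > cutoff
        ]
    return sorted(flagged)


# Hypothetical, already-standardized two-feature vectors for 11 providers.
hospitals = [
    (0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2), (0.1, 0.1),
    (2.0, 2.0),  # behaves differently from its nearest peer group
    (10.0, 10.0), (10.2, 10.0), (10.0, 10.2), (10.2, 10.2), (10.1, 10.1),
]

centroids, labels = kmeans(hospitals, k=2)
outliers = flag_outliers(hospitals, centroids, labels)
print(outliers)  # prints [5]
```

Comparing each provider only with its own cluster is what makes the check local: the provider at index 5 is flagged relative to its five peers, not relative to all eleven units. In the workflow the abstract describes, such a list would then go to auditors for the second, human-supported step.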
format Online
Article
Text
id pubmed-7362640
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7362640 2020-07-20 Data mining application to healthcare fraud detection: a two-step unsupervised clustering method for outlier detection with administrative databases Massi, Michela Carlotta Ieva, Francesca Lettieri, Emanuele BMC Med Inform Decis Mak Technical Advance BioMed Central 2020-07-14 /pmc/articles/PMC7362640/ /pubmed/32664923 http://dx.doi.org/10.1186/s12911-020-01143-9 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Data mining application to healthcare fraud detection: a two-step unsupervised clustering method for outlier detection with administrative databases
topic Technical Advance
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7362640/
https://www.ncbi.nlm.nih.gov/pubmed/32664923
http://dx.doi.org/10.1186/s12911-020-01143-9