Data mining application to healthcare fraud detection: a two-step unsupervised clustering method for outlier detection with administrative databases
Main authors: | Massi, Michela Carlotta; Ieva, Francesca; Lettieri, Emanuele |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7362640/ https://www.ncbi.nlm.nih.gov/pubmed/32664923 http://dx.doi.org/10.1186/s12911-020-01143-9 |
author | Massi, Michela Carlotta; Ieva, Francesca; Lettieri, Emanuele |
collection | PubMed |
description | BACKGROUND: The healthcare sector is an attractive target for fraudsters. The availability of large amounts of data makes it possible to tackle this issue with data mining techniques, making the auditing process more efficient and effective. This research aims to develop a novel data mining model for fraud detection among hospitals using Hospital Discharge Charts (HDC) in administrative databases. In particular, it focuses on DRG upcoding, i.e., the practice of registering codes for the services provided and for the inpatients' health status so as to make the hospitalization fall within a more remunerative DRG class. METHODS: We propose a two-step algorithm. The first step applies k-means clustering to providers to identify locally consistent and locally similar groups of hospitals, according to their characteristics and their behavior in treating a specific disease, in order to spot outliers within these groups of peers. An initial grid search for the best number of features to be selected (through Principal Feature Analysis) and the best number of local groups makes the algorithm highly flexible. In the second step, we propose a human decision-support system that helps auditors cross-validate the identified outliers by analyzing them with respect to fraud-related variables and to the complexity of the patient casemix they treated. The proposed algorithm was tested on a database of HDC collected by Regione Lombardia (Italy) over three years (2013-2015), focusing on the treatment of Heart Failure. RESULTS: The model identified 6 clusters of hospitals and 10 outliers among the 183 units. 
Of these providers, we report in depth the application of Step Two to three hospitals (two private and one public). Cross-validation against the patient population and the hospitals' characteristics suggested that the public hospital's outlying status was justified, while the two private providers were deemed worth further investigation by auditors. CONCLUSIONS: The proposed model is promising for identifying anomalous DRG coding behavior and is easily transferable to any disease and context of interest. Our proposal contributes to the limited literature on behavioral models for fraud detection, identifying the most 'cautious' fraudsters. Together, the results of the first and second steps provide auditors with a valuable set of information for their preliminary investigation. |
format | Online Article Text |
id | pubmed-7362640 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7362640, 2020-07-20. BMC Med Inform Decis Mak, Technical Advance. BioMed Central, 2020-07-14. © The Author(s) 2020. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Data mining application to healthcare fraud detection: a two-step unsupervised clustering method for outlier detection with administrative databases |
topic | Technical Advance |
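The METHODS summary in the description outlines Step One: Principal Feature Analysis (PFA) to select features, a grid search over the number of selected features and the number of k-means groups, and outlier detection within each group of peer hospitals. The NumPy sketch below illustrates that pipeline under stated assumptions; it is not the authors' implementation. All function names are hypothetical, and three choices are assumptions not taken from the record: the variance threshold `var_kept` used to decide how many principal components PFA keeps, the mean silhouette width used to pick the best grid point, and the rule in `flag_outliers` that marks a hospital whose distance to its cluster centroid exceeds the peer-group mean plus `z` standard deviations. Step Two (auditor review of flagged hospitals against fraud-related variables and casemix) is a human process and is not modelled.

```python
# Illustrative sketch of Step One only, NOT the authors' code. Assumed: rows of X are hospitals,
# columns are per-hospital indicators for one disease; grid point chosen by mean silhouette;
# outlier = distance to own centroid above mean + z*sd of the peer group.
import numpy as np

def standardize(X):
    """Z-score each column (assumes no constant column)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def _assign(X, C):
    """Index of the nearest centroid (squared Euclidean distance) for each row of X."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

def _kmeanspp_init(X, k, rng):
    """k-means++ seeding: each new centre is drawn with probability proportional to D^2."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        p = d2 / d2.sum() if d2.sum() > 0 else np.full(len(X), 1.0 / len(X))
        centers.append(X[rng.choice(len(X), p=p)])
    return np.array(centers, dtype=float)

def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ restarts; keeps the lowest-inertia run."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        C = _kmeanspp_init(X, k, rng)
        for _ in range(n_iter):
            labels = _assign(X, C)
            new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                              for j in range(k)])  # an empty cluster keeps its old centre
            if np.allclose(new_C, C):
                break
            C = new_C
        labels = _assign(X, C)
        inertia = ((X - C[labels]) ** 2).sum()
        if best is None or inertia < best[0]:
            best = (inertia, labels, C)
    return best[1], best[2]

def principal_feature_analysis(X, n_features, var_kept=0.9, seed=0):
    """PFA: cluster the rows of the PCA loading matrix, keep one feature per cluster."""
    cov = np.atleast_2d(np.cov(standardize(X), rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # eigh returns eigenvalues in ascending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    q = min(int(np.searchsorted(ratio, var_kept)) + 1, X.shape[1])  # components kept
    A = eigvecs[:, :q]  # row i = feature i expressed in the top-q components
    labels, C = kmeans(A, n_features, seed=seed)
    selected = []
    for j in range(n_features):
        members = np.flatnonzero(labels == j)
        if members.size:  # representative = member closest to the cluster centre
            dist = np.linalg.norm(A[members] - C[j], axis=1)
            selected.append(int(members[dist.argmin()]))
    return sorted(selected)

def silhouette(X, labels):
    """Mean silhouette width; singleton clusters score 0, fewer than 2 clusters gives -1."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return -1.0
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() > 1:
            a = D[i, same].sum() / (same.sum() - 1)  # D[i, i] = 0 is excluded by the -1
            b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
            s[i] = 0.0 if max(a, b) == 0 else (b - a) / max(a, b)
    return float(s.mean())

def flag_outliers(X, labels, C, z=2.0):
    """Flag units farther from their centroid than mean + z*sd of their peer group."""
    dist = np.linalg.norm(X - C[labels], axis=1)
    flags = np.zeros(len(X), dtype=bool)
    for j in np.unique(labels):
        m = labels == j
        if m.sum() >= 3:  # with fewer peers the rule is not meaningful
            flags[m] = dist[m] > dist[m].mean() + z * dist[m].std()
    return flags

def grid_search(X, feature_grid, k_grid, seed=0):
    """Choose (number of PFA features, k) by silhouette; flag outliers for the winner."""
    Z, best = standardize(X), None
    for n_feat in feature_grid:
        cols = principal_feature_analysis(X, n_feat, seed=seed)
        Zs = Z[:, cols]
        for k in k_grid:
            labels, C = kmeans(Zs, k, seed=seed)
            score = silhouette(Zs, labels)
            if best is None or score > best["silhouette"]:
                best = {"silhouette": score, "n_features": n_feat, "k": k, "features": cols,
                        "labels": labels, "outliers": flag_outliers(Zs, labels, C)}
    return best
```

With a hospital-by-indicator matrix `X`, a call such as `grid_search(X, feature_grid=[2, 3, 4], k_grid=range(2, 10))` would return the chosen configuration together with a boolean `outliers` mask, i.e. the candidate list that Step Two hands to auditors. PFA is used here because it keeps original, interpretable indicators rather than principal-component mixtures, which matters when auditors must reason about why a hospital was flagged.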