
A Two-Stage Hybrid Default Discriminant Model Based on Deep Forest

Background: the credit scoring model is an effective tool for banks and other financial institutions to distinguish potential default borrowers. The credit scoring model represented by machine learning methods such as deep learning performs well in terms of the accuracy of default discrimination, bu...


Bibliographic Details
Main authors: Li, Gang; Ma, Hong-Dong; Liu, Rong-Yue; Shen, Meng-Di; Zhang, Ke-Xin
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8150340/
https://www.ncbi.nlm.nih.gov/pubmed/34066807
http://dx.doi.org/10.3390/e23050582
author Li, Gang
Ma, Hong-Dong
Liu, Rong-Yue
Shen, Meng-Di
Zhang, Ke-Xin
author_sort Li, Gang
collection PubMed
description Background: the credit scoring model is an effective tool that banks and other financial institutions use to identify borrowers who are likely to default. Credit scoring models built on machine learning methods such as deep learning discriminate defaults accurately, but they have notable shortcomings, including many hyperparameters and a heavy dependence on large data sets. There is still considerable room to improve their interpretability and robustness. Methods: the deep forest, or multi-Grained Cascade Forest (gcForest), is a deep model built from decision trees and based on the random forest algorithm. Using multidimensional scanning and cascade processing, gcForest can effectively identify and process high-dimensional feature information. It also has fewer hyperparameters and is robust. This paper therefore constructs a two-stage hybrid default discrimination model based on multiple feature selection methods and the gcForest algorithm, and tunes its parameters with the lowest type II error as the first criterion and the highest AUC and accuracy as the second and third criteria. gcForest thus combines the interpretability and robustness of traditional statistical models with the accuracy of deep learning models. Results: the validity of the hybrid default discrimination model is verified on three real, openly available credit data sets (Australian, Japanese, and German) from the UCI database. Conclusions: gcForest outperforms currently popular single classifiers such as ANN, common ensemble classifiers such as LightGBM, and CNNs in terms of type II error, AUC, and accuracy. In addition, comparison with other similar research results further verifies the robustness and effectiveness of this model.
format Online
Article
Text
id pubmed-8150340
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8150340 2021-05-27 A Two-Stage Hybrid Default Discriminant Model Based on Deep Forest Li, Gang Ma, Hong-Dong Liu, Rong-Yue Shen, Meng-Di Zhang, Ke-Xin Entropy (Basel) Article MDPI 2021-05-08 /pmc/articles/PMC8150340/ /pubmed/34066807 http://dx.doi.org/10.3390/e23050582 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title A Two-Stage Hybrid Default Discriminant Model Based on Deep Forest
topic Article