Cargando…

A two-route CNN model for bank account classification with heterogeneous data

Classifying bank accounts by using transaction data is encouraging in cracking down on illegal financial activities. However, few research simultaneously use heterogenous features, which are embedded in the time series data. In this paper, a two route convolution neural network TRHD-CNN model, fed w...

Descripción completa

Detalles Bibliográficos
Autores principales: Lv, Fang, Huang, Junheng, Wang, Wei, Wei, Yuliang, Sun, Yunxiao, Wang, Bailing
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Public Library of Science 2019
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6699796/
https://www.ncbi.nlm.nih.gov/pubmed/31425545
http://dx.doi.org/10.1371/journal.pone.0220631
Descripción
Sumario:Classifying bank accounts by using transaction data is encouraging in cracking down on illegal financial activities. However, few research simultaneously use heterogenous features, which are embedded in the time series data. In this paper, a two route convolution neural network TRHD-CNN model, fed with two types of heterogeneous feature matrices, is proposed for classifying the bank accounts. TRHD-CNN adopts divide and conquer strategy to extract characteristics from two types of data source independently. The strategy is proved able in mining complementary classification characteristics. We firstly transfer the original log data into a directed and dynamic transaction network. On the basis of that, two feature generation methods are devised for extracting information from local topological structure and time series transaction respectively. A DirectedWalk method is developed in this paper to learning the network vector of vertices used for embedding the neighbor relationship of bank account. The extensive experimental results, conducted on a real bank transaction dataset that contains illegal pyramid selling accounts, show the significant advantage of TRHD-CNN over the existing methods. TRHD-CNN can provide recall scores up to 5.15% higher than competing methods. In addition, the two-route architecture of TRHD-CNN is easy to extend to multi-route scenarios and other fields.