Rule Extraction from Privacy Preserving Neural Network: Application to Banking

Abstract:

Over the last two decades, privacy policies in areas such as banking, finance, and medical research have restricted data owners from sharing their data for data mining purposes. This restriction has given rise to a new area of research, namely privacy-preserving data mining. In this paper, we propose a privacy preservation method that employs a Particle Swarm Optimization (PSO) trained Auto-Associative Neural Network (PSOAANN). The modified (privacy-preserved) input values are fed to a decision tree (DT) and to a rule induction algorithm, Ripper, for rule extraction. The performance of the hybrid is tested on four benchmark datasets and four bankruptcy datasets using 10-fold cross-validation. The results are compared with those obtained on the original datasets, where privacy is not preserved. The proposed hybrid approach achieved good results on all datasets.
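The preview gives no implementation details, so the following is only a minimal sketch of the general idea of a PSO-trained auto-associative network, not the authors' code. It assumes a single sigmoid hidden layer, a global-best PSO with illustrative settings (inertia 0.7, acceleration coefficients 1.5), and a made-up toy dataset scaled to [0, 1]. It also assumes that the network's reconstructed outputs are the values released in place of the originals. All names (`aann_forward`, `pso_train`, and so on) are hypothetical.

```python
import math
import random


def sigmoid(s):
    # Clamp the argument so math.exp cannot overflow for extreme weights.
    return 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, s))))


def aann_forward(weights, x, n_hidden):
    """Auto-associative net: n inputs -> n_hidden units -> n outputs."""
    n, idx = len(x), 0
    hidden = []
    for _ in range(n_hidden):
        s = weights[idx + n] + sum(weights[idx + j] * x[j] for j in range(n))
        hidden.append(sigmoid(s))
        idx += n + 1
    out = []
    for _ in range(n):
        s = weights[idx + n_hidden] + sum(
            weights[idx + j] * hidden[j] for j in range(n_hidden))
        out.append(sigmoid(s))
        idx += n_hidden + 1
    return out


def reconstruction_error(weights, data, n_hidden):
    """Mean squared error between each record and its reconstruction."""
    total = 0.0
    for x in data:
        y = aann_forward(weights, x, n_hidden)
        total += sum((a - b) ** 2 for a, b in zip(x, y))
    return total / (len(data) * len(data[0]))


def pso_train(data, n_hidden=2, n_particles=20, n_iters=100, seed=0,
              inertia=0.7, c1=1.5, c2=1.5):
    """Global-best PSO over the flattened weight vector (assumed settings)."""
    rng = random.Random(seed)
    n = len(data[0])
    dim = n_hidden * (n + 1) + n * (n_hidden + 1)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_err = [reconstruction_error(p, data, n_hidden) for p in pos]
    g = min(range(n_particles), key=pbest_err.__getitem__)
    gbest, gbest_err = pbest[g][:], pbest_err[g]
    initial_err = gbest_err
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            err = reconstruction_error(pos[i], data, n_hidden)
            if err < pbest_err[i]:
                pbest[i], pbest_err[i] = pos[i][:], err
                if err < gbest_err:
                    gbest, gbest_err = pos[i][:], err
    return gbest, initial_err, gbest_err


# Made-up records, already scaled to [0, 1]; not taken from the paper.
data = [[0.10, 0.20, 0.15], [0.80, 0.90, 0.85], [0.50, 0.40, 0.45],
        [0.30, 0.35, 0.25], [0.70, 0.60, 0.65], [0.20, 0.10, 0.15]]
N_HIDDEN = 2
weights, err_before, err_after = pso_train(data, n_hidden=N_HIDDEN)
# The reconstructed values stand in for the originals (assumption).
perturbed = [aann_forward(weights, x, N_HIDDEN) for x in data]
```

In this sketch the `perturbed` records, rather than `data`, would be handed to the decision tree or rule learner. Using fewer hidden units than inputs forces the network to compress the records, so the reconstruction is only approximate.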

Appendix:

Rules generated by Decision Tree (C4.5)

IRIS DATASET
Rule 1: If PW <= 0.505359 and SL <= 0.443342 then IRIS-VERSICOLOR (coverage = 100%)
Rule 2: If PW <= 0.505359 and SL > 0.443342 then IRIS-VIRGINICA (coverage = 90.90%)
Rule 3: If PW > 0.505359 then IRIS-SETOSA (coverage = 90.90%)

WBC DATASET
Rule 1: If Clumpthickness <= 0.350595 then BENIGN (coverage = 100%)
Rule 2: If Clumpthickness > 0.350595 then MALIGNANT (coverage = 94.00%)

NEW THYROID DATASET
Rule 1: If SThyroxin <= 0.307997 and TSH <= 0.160963 then NORMAL (coverage = 100%)
Rule 2: If SThyroxin <= 0.307997 and TSH > 0.160963 then HypoThyroid (coverage = 85.71%)
Rule 3: If SThyroxin > 0.307997 then HyperThyroid (coverage = 87.50%)

WINE DATASET
Rule 1: If Ash <= 0.538132 and Alcalinity of ash <= 0.455098 and Nonflavanoid phenols <= 0.402591 then CLASS B (coverage = 90.90%)
Rule 2: If Ash <= 0.538132 and Alcalinity of ash > 0.455098 and Ash <= 0.528514 and Alcalinity of ash <= 0.47219 and Hue <= 0.369992 then CLASS C (coverage = 100%)
Rule 3: If Ash <= 0.538132 and Alcalinity of ash > 0.455098 and Ash <= 0.528514 and Alcalinity of ash <= 0.47219 and Hue > 0.369992 then CLASS C (coverage = 0%)
Rule 4: If Ash <= 0.538132 and Alcalinity of ash > 0.455098 and Ash <= 0.528514 and Alcalinity of ash > 0.47219 then CLASS C (coverage = 80.00%)
Rule 5: If Ash <= 0.538132 and Alcalinity of ash > 0.455098 and Ash > 0.528514 then CLASS B (coverage = 0%)
Rule 6: If Ash > 0.538132 then CLASS A (coverage = 4.34%)

SPANISH DATASET
Rule 1: If (Current assets-cash/total assets) <= 0.431644 then NonBankrupt (coverage = 75.00%)
Rule 2: If (Current assets-cash/total assets) > 0.431644 then Bankrupt (coverage = 75.00%)

TURKISH DATASET
Rule 1: If (Share holders' equity + total income)/(total assets + contingencies and commitments) <= 0.973129 then Bankrupt (coverage = 100%)
Rule 2: If (Share holders' equity + total income)/(total assets + contingencies and commitments) > 0.973129 then NonBankrupt (coverage = 66.66%)

US DATASET
Rule 1: If (Earnings before interest and taxes/total assets) <= 0.794781 then Bankrupt (coverage = 78.57%)
Rule 2: If (Earnings before interest and taxes/total assets) > 0.794781 then NonBankrupt (coverage = 81.81%)

UK DATASET
Rule 1: If (Current assets/current liabilities) <= 0.204983 then NonBankrupt (coverage = 66.66%)
Rule 2: If (Current assets/current liabilities) > 0.204983 and (Current assets/current liabilities) <= 0.207137 then Bankrupt (coverage = 100%)
Rule 3: If (Current assets/current liabilities) > 0.204983 and (Current assets/current liabilities) > 0.207137 and (Funds flow/total liabilities) <= 0.326856 then NonBankrupt (coverage = 6.66%)
Rule 4: If (Current assets/current liabilities) > 0.204983 and (Current assets/current liabilities) > 0.207137 and (Funds flow/total liabilities) > 0.326856 then Bankrupt (coverage = 100%)

Rules generated by Ripper

IRIS DATASET
Rule 1: If PL <= 0.364484 then Iris-setosa (coverage = 100%)
Rule 2: If PL <= 0.422152 then Iris-versicolor (coverage = 90.00%)
Rule 3: else Iris-virginica (coverage = 90.90%)

WBC DATASET
Rule 1: If Clumpthickness >= 0.376957 then Malignant (coverage = 95.65%)
Rule 2: If Clumpthickness >= 0.351184 and Clumpthickness <= 0.368407 then Malignant (coverage = 100%)
Rule 3: else BENIGN (coverage = 34.32%)

WINE DATASET
Rule 1: If Alcalinity of ash >= 0.46734 and Proanthocyanins <= 0.328308 then Class C (coverage = 100%)
Rule 2: If Proanthocyanins <= 0.318861 then Class C (coverage = 100%)
Rule 3: If Ash >= 0.539347 and Hue >= 0.365639 then Class A (coverage = 100%)
Rule 4: If Proanthocyanins >= 359059 and Alcalinity of ash >= 0.446665 then Class A (coverage = 100%)
Rule 5: else Class B (coverage = 41.17%)

NEW THYROID DATASET
Rule 1: If TD >= 0.175793 and SThyroxin <= 0.296417 then HypoThyroid (coverage = 71.42%)
Rule 2: If SThyroxin >= 0.310244 then HyperThyroid (coverage = 100%)
Rule 3: else NORMAL (coverage = 81.08%)

SPANISH DATASET
Rule 1: If (Current assets-cash/total assets) <= 0.431934 then NonBankrupt (coverage = 60.00%)
Rule 2: else Bankrupt (coverage = 71.42%)

TURKISH DATASET
Rule 1: If (Interest income/interest expenses) >= 0.415229 then Bankrupt (coverage = 100%)
Rule 2: else NonBankrupt (coverage = 80%)

US DATASET
Rule 1: If (Earnings before interest and taxes/total assets) >= 0.794884 then NonBankrupt (coverage = 81.81%)
Rule 2: else Bankrupt (coverage = 78.57%)

UK DATASET
Rule 1: If (Current liabilities/total assets) <= 0.515514 then NonBankrupt (coverage = 83.33%)
Rule 2: else Bankrupt (coverage = 83.33%)
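The Ripper rule sets in the appendix read as ordered lists: the first rule whose conditions all hold assigns the class, and the final "else" acts as a default. As a minimal sketch of how such a list could be applied, the snippet below transcribes the Iris and New Thyroid Ripper rules into data and evaluates them in order. The function name `classify`, the rule encoding, and the example records are illustrative assumptions. Attribute values are expected on the same privacy-preserved scale as the thresholds.

```python
import operator

# Comparison operators that appear in the extracted rules.
OPS = {"<=": operator.le, ">=": operator.ge,
       "<": operator.lt, ">": operator.gt}

# Each rule is (list of (attribute, operator, threshold), class label).
# Thresholds are transcribed from the Ripper rules in the appendix.
IRIS_RIPPER = [
    ([("PL", "<=", 0.364484)], "Iris-setosa"),
    ([("PL", "<=", 0.422152)], "Iris-versicolor"),
]
IRIS_DEFAULT = "Iris-virginica"

THYROID_RIPPER = [
    ([("TD", ">=", 0.175793), ("SThyroxin", "<=", 0.296417)], "HypoThyroid"),
    ([("SThyroxin", ">=", 0.310244)], "HyperThyroid"),
]
THYROID_DEFAULT = "NORMAL"


def classify(record, rules, default):
    """Return the label of the first rule whose conditions all hold."""
    for conditions, label in rules:
        if all(OPS[op](record[attr], thr) for attr, op, thr in conditions):
            return label
    return default  # the "else" rule


# Made-up records for illustration only.
print(classify({"PL": 0.30}, IRIS_RIPPER, IRIS_DEFAULT))
print(classify({"TD": 0.20, "SThyroxin": 0.29},
               THYROID_RIPPER, THYROID_DEFAULT))
```

Rule order matters in this encoding: a record with PL = 0.30 also satisfies the second Iris rule, but the first rule fires before it is reached.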
