Outlier Analysis in Large Sample and High Dimensional Data Based on Feature Weighting

Abstract:

Outlier analysis usually examines outliers according to the Anomaly Index and the Variable Contribution Measurement. For large samples of high-dimensional data, however, this approach is difficult to apply. This paper therefore presents a method that introduces weights for outliers. The features of the outliers are weighted with the Analytic Hierarchy Process (AHP), which quantifies the importance of each attribute of an outlier to the data mining target; that is, a weight is calculated for each attribute. A correlation value, which represents the degree of relevance between an outlier and the data mining target, is then calculated by multiplying each attribute value by its weight. Once the correlation values have been computed, the outliers are sorted by correlation value from high to low, which makes outlier analysis more efficient. Finally, an example is presented to demonstrate the operability and feasibility of the method.
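
The weighting and ranking steps described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say how the AHP weights are derived from judgments or how the weighted attribute values are combined, so the sketch assumes the common row geometric-mean approximation of AHP priorities and a weighted sum as the correlation value. The pairwise matrix, the outlier values, and all function names are hypothetical.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method.

    `pairwise` is a reciprocal comparison matrix: pairwise[i][j] states how
    much more important attribute i is than attribute j.
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def correlation_values(outliers, weights):
    """Weighted sum of attribute values per outlier (assumed aggregation)."""
    return [sum(w * x for w, x in zip(weights, row)) for row in outliers]


def rank_outliers(outliers, weights):
    """Return (index, score) pairs sorted from highest to lowest score."""
    scores = correlation_values(outliers, weights)
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical judgments for three attributes on Saaty's 1-9 scale.
PAIRWISE = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
]

# Hypothetical outliers; attribute values are assumed to be scaled to [0, 1].
OUTLIERS = [
    [0.2, 0.9, 0.4],
    [0.8, 0.1, 0.3],
    [0.5, 0.5, 0.9],
]

weights = ahp_weights(PAIRWISE)
ranking = rank_outliers(OUTLIERS, weights)

for index, score in ranking:
    print(f"outlier {index}: correlation value {score:.3f}")
```

Scaling the attribute values to a common range before weighting is also an assumption of this sketch; without it, attributes with large numeric ranges would dominate the score regardless of their weights.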

Pages: 650-657

Online since: June 2014

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved
