INOD: A Graph-Based Outlier Detection Algorithm

Abstract:

Outlier detection selects uncommon data from a data set, which can significantly improve the quality of results for data mining algorithms. A typical feature of outliers is that they lie far away from the majority of the data in the data set. In this paper, we present a graph-based outlier detection algorithm named INOD, which makes use of this feature. The DistMean-neighborhood is used to calculate the cumulative in-degree of each data point. A data point whose cumulative in-degree is smaller than a threshold is judged to be an outlier candidate. A KNN-based selection algorithm then determines the final outliers. Experimental results show that, compared with the classical LOF algorithm, INOD improves precision by 80% and reduces the error rate by 75%.
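The pipeline described in the abstract (neighborhood graph, in-degree threshold, KNN-based selection) can be illustrated with a minimal Python sketch. The abstract does not define the DistMean-neighborhood, the cumulative in-degree, or the selection rule, so everything below is an assumption made for illustration: a point's neighborhood is taken to be all points closer than its mean distance to the other points, a plain in-degree stands in for the cumulative in-degree, and candidates are ranked by mean distance to their k nearest neighbors. The function names and the parameters `degree_threshold`, `k`, and `top_n` are hypothetical and do not come from the paper.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length tuples."""
    return [[math.dist(p, q) for q in points] for p in points]


def in_degrees(dist):
    """Count how many other points include each point in their neighborhood.

    Assumption (not from the paper): point i's neighborhood holds every j
    whose distance to i is below the mean distance from i to all others.
    """
    n = len(dist)
    degree = [0] * n
    for i in range(n):
        mean_d = sum(dist[i]) / (n - 1)  # dist[i][i] is 0, so it adds nothing
        for j in range(n):
            if j != i and dist[i][j] < mean_d:
                degree[j] += 1  # directed edge i -> j
    return degree


def knn_score(dist, i, k):
    """Mean distance from point i to its k nearest neighbors."""
    nearest = sorted(d for j, d in enumerate(dist[i]) if j != i)[:k]
    return sum(nearest) / len(nearest)


def detect_outliers(points, degree_threshold, k, top_n):
    """Return up to top_n low-in-degree candidates, ranked by k-NN distance."""
    dist = pairwise_distances(points)
    degree = in_degrees(dist)
    # Candidate step: keep points that few neighborhoods point to.
    candidates = [i for i, d in enumerate(degree) if d < degree_threshold]
    # Selection step (assumed): the most isolated candidates come first.
    candidates.sort(key=lambda i: knn_score(dist, i, k), reverse=True)
    return candidates[:top_n]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(detect_outliers(points, degree_threshold=1, k=2, top_n=1))  # prints [4]
```

In this toy data set the four corner points all fall inside each other's neighborhoods, while no point's neighborhood reaches (10, 10), so its in-degree is 0 and it is the only candidate.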

Info:

Pages:

1008-1012

Online since:

December 2013

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved

[1] Ramaswamy S, Rastogi R, Shim K. Efficient algorithms for mining outliers from large data sets [C]. ACM SIGMOD Record: ACM, 2000. 427-438.

DOI: 10.1145/335191.335437

[2] Hawkins DM. Identification of outliers [M]. London: Chapman and Hall, 1980.

[3] Borne KD, Vedachalam A. Surprise Detection in Multivariate Astronomical Data [M]. In: Statistical Challenges in Modern Astronomy V: Springer, 2012: 275-289.

DOI: 10.1007/978-1-4614-3520-4_26

[4] Bay SD, Schwabacher M. Mining distance-based outliers in near linear time with randomization and a simple pruning rule [C]. Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining: ACM, 2003. 29-38.

DOI: 10.1145/956750.956758

[5] Hautamaki V, Karkkainen I, Franti P. Outlier detection using k-nearest neighbour graph [C]. Pattern Recognition, 2004 ICPR 2004 Proceedings of the 17th International Conference on: IEEE, 2004. 430-433.

DOI: 10.1109/icpr.2004.1334558

[6] Yanyan H, Zhongnan Z, Minghong L, Yize T, Shaobin Z. A hybrid distance-based outlier detection approach [C]. Systems and Informatics (ICSAI), 2012 International Conference on, 2012. 2212-2216.

[8] Abdi H, Williams LJ. Principal component analysis [J]. Wiley Interdisciplinary Reviews: Computational Statistics, 2010, 2(4): 433-459.

DOI: 10.1002/wics.101

[9] Breunig MM, Kriegel H-P, Ng RT, Sander J. LOF: identifying density-based local outliers [C]. ACM SIGMOD Record: ACM, 2000. 93-104.

DOI: 10.1145/335191.335388

[10] Hawkins S, He H, Williams G, et al. Outlier detection using replicator neural networks [M]. In: Data Warehousing and Knowledge Discovery: Springer Berlin Heidelberg, 2002: 170-180.

DOI: 10.1007/3-540-46145-0_17
