An Error Minimizing Partitioning Method for the Nearest Neighbor Search

Article Preview

Abstract:

The nearest neighbor search in high-dimensional space is an important operation in many applications, such as data mining and multimedia databases. Evaluating similarity in high-dimensional space requires high computational cost; index-structures are frequently used for reducing computational cost. Most of these index-structures are built by partitioning the data set. However, the partitioning approaches potentially have the problem of failing to find the nearest neighbor that is caused by partitions. In this paper, we propose the Error Minimizing Partitioning (EMP) method with a novel tree structure that minimizes the failures of finding the nearest neighbors. EMP divides the data into subsets with considering the distribution of data sets. For partitioning a data set, the proposed method finds the line that minimizes the summation of distance to data points. The method then finds the median of the data set. Finally, our proposed method determines the partitioning hyper-plane that passes the median and is perpendicular to the line. We also make a comparative study between existing methods and the proposed method to verify the effectiveness of our method.

You might also be interested in these eBooks

Info:

Periodical:

Pages:

2165-2170

Citation:

Online since:

June 2013

Export:

Price:

Permissions CCC:

Permissions PLS:

Сopyright:

© 2013 Trans Tech Publications Ltd. All Rights Reserved

Share:

Citation:

[1] Bernstein, A., Klein, M., "Toward high-precision service retrieval," IEEE Internet Computing, vo.8, no.1, pp.30-36, 2004.

DOI: 10.1109/mic.2004.1260701

Google Scholar

[2] Bozkaya, T., Ozsoyoglu, M., "Distance-based indexing for high-dimensional metric spaces," In Proceedings of the ACM SINGMOD international conference on Management of data (SIGMOD '97), pp.357-368, 1997.

DOI: 10.1145/253260.253345

Google Scholar

[3] Ciaccia, P., Patella, M., Zezula, P., "M-tree: An efficient access method for similarity search in metric spaces," The 23rd International Conference on Very Large Data Bases, pp.426-435, 1997.

Google Scholar

[4] Fu, A.W., Chan, P.M., Cheung, Y.L., Moon, Y.S., "Dynamic vp-tree indexing for n-nearest neighbor search given pair-wise distances," The VLDB Journal, vo.9, no.2, pp.154-173, 1998.

DOI: 10.1007/pl00010672

Google Scholar

[5] Guttman, A., "R-trees a dynamic index structure for spatial searching," The ACM SIGMOD international conference on Management of data (SIGMOD '84), pp.47-57, 1984.

DOI: 10.1145/602259.602266

Google Scholar

[6] Hsu, C.W., Lin, C.J., "A comparison of methods for multi-class support vector machines," IEEE Transactions on Neural Networks, vo.13, no.2, pp.415-425, 2002.

DOI: 10.1109/72.991427

Google Scholar

[7] Katayama, N., Satoh, S., "The SR-tree: An index structure for high-dimensional nearest neighbor queries," The ACM SIGMOD international conference on Management of data (SIGMOD '97), pp.369-380, 1997.

DOI: 10.1145/253260.253347

Google Scholar

[8] Weber, R., Schek, H.J., Blott, S., "A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces," The International Conference on Very Large Data Bases (VLDB'98), pp.194-205, 1998.

Google Scholar

[9] Xiong, T., Ye, J., Li, Q., Cherkassky, V., Janardan, R., "Efficient kernel discriminant analysis via QR decomposition," NIPS, p.1529–1536. 2005.

Google Scholar

[10] Yianilos, P., "Data structures and algorithms for nearest neighbor search in general metric spaces," The Fourth ACM–SIAM Symposium on Discrete Algorithms (SODA'93), p.311–321, 1993.

Google Scholar

[11] Zhang, M., Alhajj, R., "Effectiveness of NAQ-tree as index structure for similarity search in high-dimensional metric space," The journal of Knowledge and Information Systems, DOI== http://doi.acm.org/10.1007/s10115-008-0190, 2009.

DOI: 10.1007/s10115-008-0190-y

Google Scholar

[12] UC Irvine Machine Learning Repository, http://archive.ics.uci.edu/ml/

Google Scholar