Investigating the Performance of Cosine Value and Jensen-Shannon Divergence in the kNN Algorithm

Abstract:

K Nearest Neighbor (kNN) is a commonly used text categorization algorithm. Previous studies have mainly focused on improving the algorithm by modifying feature selection and the selection of the k value. This research investigates the possibility of using Jensen-Shannon Divergence as the similarity measure in the kNN classifier, and compares its performance with that of the Cosine value in terms of classification accuracy. The experiment indicates that the kNN algorithm based on Jensen-Shannon Divergence outperforms the one based on the Cosine value, although performance also depends largely on the number of categories and the number of documents in each category.
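The two measures compared in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give their feature weighting, smoothing, or voting scheme, so the sketch assumes raw term-frequency vectors, normalizes them to probability distributions for the Jensen-Shannon Divergence, uses base-2 logarithms, and decides the category by simple majority vote. The function names `cosine_similarity`, `js_divergence`, and `knn_predict` are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(p, q):
    """Cosine of the angle between two term-frequency vectors (higher = more similar)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def js_divergence(p, q):
    """Jensen-Shannon Divergence between two count vectors (lower = more similar).

    Assumption: the vectors are normalized to probability distributions first,
    and base-2 logarithms are used, so the result lies in [0, 1].
    """
    sp, sq = sum(p), sum(q)
    if sp == 0 or sq == 0:
        raise ValueError("vectors must contain at least one non-zero count")
    p = [a / sp for a in p]
    q = [b / sq for b in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture distribution

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x_i == 0 contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def knn_predict(query, train, k, measure="jsd"):
    """Label `query` by majority vote among its k nearest training documents.

    `train` is a list of (vector, label) pairs. Majority voting is an
    assumption of this sketch, not something stated in the abstract.
    """
    if measure == "jsd":
        # Divergence: smaller values mean closer neighbours.
        ranked = sorted(train, key=lambda item: js_divergence(query, item[0]))
    else:
        # Cosine: larger values mean closer neighbours.
        ranked = sorted(train, key=lambda item: cosine_similarity(query, item[0]),
                        reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy corpus with made-up term counts over a three-word vocabulary.
train = [
    ([3, 1, 0], "sports"),
    ([2, 2, 0], "sports"),
    ([0, 1, 3], "politics"),
    ([0, 2, 2], "politics"),
]
print(knn_predict([4, 1, 0], train, k=3, measure="jsd"))
print(knn_predict([4, 1, 0], train, k=3, measure="cosine"))
```

Note the opposite orientation of the two measures: cosine is a similarity, so neighbours are ranked in descending order, whereas Jensen-Shannon Divergence is a dissimilarity, so neighbours are ranked in ascending order.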

Info:

Periodical:

Advanced Materials Research (Volumes 532-533)

Pages:

1455-1459

Online since:

June 2012

Copyright:

© 2012 Trans Tech Publications Ltd. All Rights Reserved
