Research on Improved Clustering Algorithm on Web Usage Mining Based on Scientific Analysis of Web Materials

Abstract:

Cluster analysis is an important method in Web usage mining for studying Web users' browsing behavior and identifying potential customers. Traditional user clustering algorithms are not very accurate. In this paper, we present two improved user clustering algorithms, both based on the associated matrix of users' hits recorded while they browse a website. From this matrix, an improved Hamming distance matrix is generated by defining either the minimum norm or the generalized relative Hamming distance between any two vectors. Clusters of similar users are then obtained by setting a threshold value. In the last step of our algorithm, the clustering results are confirmed by defining a Similar Index for each cluster and applying a sub-algorithm. Finally, test examples show that the new algorithms are more accurate than the traditional one, and experiments on real log data show that the improved algorithms are practical.
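The pipeline outlined in the abstract (hit matrix, pairwise distance matrix, threshold-based grouping) can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not define the minimum norm, the generalized relative Hamming distance, or the Similar Index, so the sketch assumes a plain normalized Hamming distance on visited/not-visited flags and groups users by connected components under the threshold. The function names `hamming_distance`, `distance_matrix`, and `threshold_clusters` are hypothetical, and the confirmation step is omitted.

```python
from itertools import combinations


def hamming_distance(u, v):
    """Fraction of pages on which two users' visited/not-visited flags differ.

    Assumption: plain normalized Hamming distance on binarized hit counts,
    standing in for the paper's (undefined here) improved distances.
    """
    if len(u) != len(v):
        raise ValueError("hit vectors must have the same length")
    differing = sum((a > 0) != (b > 0) for a, b in zip(u, v))
    return differing / len(u)


def distance_matrix(hits):
    """Symmetric matrix of pairwise distances between users (rows of hits)."""
    n = len(hits)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = hamming_distance(hits[i], hits[j])
    return d


def threshold_clusters(hits, threshold):
    """Group users linked by a chain of pairwise distances <= threshold."""
    d = distance_matrix(hits)
    n = len(hits)
    parent = list(range(n))  # union-find forest over user indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if d[i][j] <= threshold:
            parent[find(j)] = find(i)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Illustrative data only: rows are users, columns are pages, values are hit counts.
hits = [
    [3, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 4, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
]
clusters = threshold_clusters(hits, threshold=0.25)
```

A larger threshold merges more users into the same cluster, and a threshold of 0 groups only users who visited exactly the same set of pages.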

Pages: 863-867

Online since: June 2011

Copyright: © 2011 Trans Tech Publications Ltd. All Rights Reserved
