Research on Improved Clustering Algorithm on Web Usage Mining Based on Scientific Analysis of Web Materials
Clustering analysis is an important method for studying Web users' browsing behavior and identifying potential customers in Web usage mining. Traditional user clustering algorithms are not very accurate. In this paper, we present two improved user clustering algorithms, both based on the associated matrix of users' hits recorded while they browse a website. From this matrix, an improved Hamming distance matrix is generated by defining either the minimum norm or the generalized relative Hamming distance between any two vectors. Clusters of similar users are then obtained by setting a threshold value. In the last step of the algorithm, the clustering results are confirmed by defining a Similar Index for each cluster and applying a sub-algorithm. Finally, test examples show that the new algorithms are more accurate than the traditional one, and experiments on real log data indicate that the improved algorithms are practical.
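The distance-matrix and threshold steps described above can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions, so everything here is an assumption made for illustration: hit counts are reduced to visited/not-visited flags, the "relative Hamming distance" is taken to be the fraction of pages on which two users differ, and users are grouped by linking every pair whose distance is at most the threshold. The function names are hypothetical, and the minimum-norm variant and the Similar Index check are not shown because their definitions are not stated here.

```python
import numpy as np


def relative_hamming(u, v):
    """Fraction of pages on which two users differ (assumed definition)."""
    # Assumption: only whether a page was visited matters, not how often.
    visited_u = np.asarray(u) > 0
    visited_v = np.asarray(v) > 0
    return float(np.mean(visited_u != visited_v))


def distance_matrix(hits):
    """Pairwise distances between the rows (users) of a user-by-page hit matrix."""
    hits = np.asarray(hits)
    n = hits.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = relative_hamming(hits[i], hits[j])
    return dist


def threshold_clusters(dist, threshold):
    """Label users so that any pair with distance <= threshold shares a cluster.

    Assumption: clusters are the connected components of the graph whose
    edges join users closer than the threshold.
    """
    n = len(dist)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and dist[i][j] <= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


if __name__ == "__main__":
    # Rows are users, columns are pages, entries are hit counts (made-up data).
    hits = [
        [3, 0, 2, 0],
        [1, 0, 4, 0],
        [0, 5, 0, 1],
        [0, 2, 0, 3],
        [2, 0, 1, 1],
    ]
    print(threshold_clusters(distance_matrix(hits), threshold=0.25))
```

With the made-up data in the demo, the first, second, and fifth users visit nearly the same pages and fall into one cluster, while the third and fourth form another. Raising the threshold merges clusters, and lowering it splits them, which is why a confirmation step such as the Similar Index is useful.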
Helen Zhang and David Jin
B. Li et al., "Research on Improved Clustering Algorithm on Web Usage Mining Based on Scientific Analysis of Web Materials", Applied Mechanics and Materials, Vols. 63-64, pp. 863-867, 2011