Clustering is a fundamental and efficient unsupervised learning technique for many vision-based applications. This paper addresses two problems: fast indexing of the high-dimensional local invariant features of images (e.g., SIFT features), and fast similarity search over a scalable image database. We tackle both with a hierarchical clustering algorithm. We adopt the hierarchical K-means (HKM) clustering method to efficiently build a visual vocabulary tree from the given training data, and we represent each image as a “bag of visual words”, where the visual words are the leaf nodes of the vocabulary tree. For image retrieval, we adopt a commonly used indexing structure, the “inverted file”, which maps each visual word to the database images containing it, along with the number of times the word appears in each image. We propose a weighted voting strategy for content-based image retrieval and show through experiments that it achieves desirable performance.
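The pipeline described above (HKM vocabulary tree, inverted file, weighted voting) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `VocabularyTree` and `InvertedFile` and the parameters `k` (branching factor) and `depth` are hypothetical; toy 2-D descriptors stand in for 128-D SIFT vectors; k-means is seeded with farthest-first initialization purely for determinism; and because the abstract does not specify the voting weights, each matched visual word is assumed to contribute a smoothed IDF weight `log(1 + N / n_w)` times its count in the database image.

```python
import math
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of descriptors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def farthest_first(points, k):
    """Deterministic init (an assumption): repeatedly pick the point farthest from chosen centers."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    return centers


def kmeans(points, k, iters=10):
    """Plain Lloyd's k-means; returns (centers, groups of assigned points)."""
    centers = farthest_first(points, k)
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            groups[j].append(p)
        # An empty cluster keeps its previous center.
        centers = [mean(g) if g else centers[j] for j, g in enumerate(groups)]
    return centers, groups


class VocabularyTree:
    """Hierarchical k-means tree; each leaf is one visual word (hypothetical API)."""

    def __init__(self, descriptors, k=2, depth=2):
        self.k, self.depth, self.num_words = k, depth, 0
        self.root = self._build(descriptors, mean(descriptors), 0)

    def _build(self, points, center, level):
        node = {"center": center, "children": [], "word": None}
        if level == self.depth or len(points) < self.k:
            node["word"] = self.num_words  # leaf: assign the next visual-word id
            self.num_words += 1
            return node
        centers, groups = kmeans(points, self.k)
        for c, g in zip(centers, groups):
            if g:  # recurse only into non-empty clusters
                node["children"].append(self._build(g, c, level + 1))
        return node

    def quantize(self, desc):
        """Descend greedily to the nearest child center until reaching a leaf."""
        node = self.root
        while node["children"]:
            node = min(node["children"], key=lambda ch: sq_dist(desc, ch["center"]))
        return node["word"]


class InvertedFile:
    """Maps each visual word to {image_id: occurrence count}."""

    def __init__(self, tree):
        self.tree = tree
        self.postings = defaultdict(dict)
        self.num_images = 0

    def add_image(self, image_id, descriptors):
        self.num_images += 1
        for d in descriptors:
            w = self.tree.quantize(d)
            self.postings[w][image_id] = self.postings[w].get(image_id, 0) + 1

    def query(self, descriptors, top=5):
        """Weighted voting; the smoothed-IDF weighting is an assumed scheme."""
        scores = defaultdict(float)
        for d in descriptors:
            posting = self.postings.get(self.tree.quantize(d), {})
            if not posting:
                continue
            idf = math.log(1 + self.num_images / len(posting))
            for image_id, count in posting.items():
                scores[image_id] += idf * count
        return sorted(scores.items(), key=lambda kv: -kv[1])[:top]


# Toy usage: three "images" whose 2-D descriptors form well-separated clusters.
images = {
    "A": [[0, 0], [1, 0], [0, 1]],
    "B": [[100, 0], [101, 0], [100, 1]],
    "C": [[0, 30], [1, 30], [0, 31]],
}
tree = VocabularyTree([d for ds in images.values() for d in ds], k=2, depth=2)
index = InvertedFile(tree)
for image_id, descs in images.items():
    index.add_image(image_id, descs)
results = index.query([[100.2, 0.1], [100.9, 0.2]])
print(tree.num_words, results)
```

Only the posting lists of the query's visual words are visited, which is what lets an inverted file scale: database images that share no visual word with the query are never scored.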