Document Clustering Based on Fuzzy Similarity

Abstract:

This paper proposes a novel fuzzy similarity measure based on the relationships between terms and categories. These relationships are represented by a term-category matrix, in which each element is the membership degree of a term in a category; the degree is computed from term frequency-inverse document frequency (TF-IDF) weights and the fuzzy relationships between documents and categories. The fuzzy similarity accounts for documents that belong to multiple categories and is computed using fuzzy operators. Experimental results show that the proposed fuzzy similarity outperforms other common similarity measures both in the reliability of the derived clustering results and in clustering accuracy.
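
The abstract does not state the formulas, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that a term's membership in a category is the TF-IDF-weighted average of the document-category memberships, that a document's category profile is obtained by max-min composition, and that two profiles are compared with a fuzzy Jaccard ratio (sum of minima over sum of maxima). All function names, the smoothed IDF variant, and the toy data are hypothetical.

```python
import math

def tfidf(docs):
    """Return (vocab, weights); weights[d][t] is the TF-IDF of term t in doc d."""
    vocab = sorted({t for doc in docs for t in doc})
    n = len(docs)
    df = {t: sum(1 for doc in docs if t in doc) for t in vocab}
    weights = []
    for doc in docs:
        row = {}
        for t in set(doc):
            tf = doc.count(t) / len(doc)
            idf = math.log((1 + n) / (1 + df[t])) + 1.0  # smoothed IDF (assumption)
            row[t] = tf * idf
        weights.append(row)
    return vocab, weights

def term_category_matrix(vocab, weights, doc_cat):
    """Membership of each term in each category (assumed weighted average)."""
    n_cat = len(doc_cat[0])
    matrix = {}
    for t in vocab:
        total = sum(w.get(t, 0.0) for w in weights)  # > 0 for every vocab term
        matrix[t] = [
            sum(w.get(t, 0.0) * mu[c] for w, mu in zip(weights, doc_cat)) / total
            for c in range(n_cat)
        ]
    return matrix

def category_profile(doc_weights, term_cat, n_cat):
    """Max-min composition of normalized term weights with term memberships."""
    peak = max(doc_weights.values())
    return [
        max(min(w / peak, term_cat[t][c]) for t, w in doc_weights.items())
        for c in range(n_cat)
    ]

def fuzzy_similarity(p, q):
    """Fuzzy Jaccard ratio: sum of minima (AND) over sum of maxima (OR)."""
    union = sum(max(a, b) for a, b in zip(p, q))
    if union == 0:
        return 0.0
    return sum(min(a, b) for a, b in zip(p, q)) / union

# Toy corpus (hypothetical): tokenized documents and their fuzzy memberships
# in two categories; the second document belongs partly to both.
docs = [
    ["fuzzy", "cluster", "fuzzy"],
    ["cluster", "document"],
    ["stock", "market", "stock"],
]
doc_cat = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]

vocab, weights = tfidf(docs)
term_cat = term_category_matrix(vocab, weights, doc_cat)
profiles = [category_profile(w, term_cat, 2) for w in weights]
sim_01 = fuzzy_similarity(profiles[0], profiles[1])
sim_02 = fuzzy_similarity(profiles[0], profiles[2])
print(sim_01, sim_02)
```

In this toy setup the first two documents share a category and a term, so their similarity comes out higher than that of the first and third documents. Other fuzzy operators (for example product and probabilistic sum) could be substituted for min and max.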

Pages: 2620-2626

Online since: August 2010

Copyright: © 2010 Trans Tech Publications Ltd. All Rights Reserved
