Using Different Term Weighting Schemes of Centroid-Based Classifiers to Classify Drug Monographs



With the growing number of drug monographs available on the Internet, automatic classification is an important task for organizing these documents into appropriate classes. Drug monographs can be categorized by their indications. A centroid-based classifier offers relatively high performance at relatively low computational cost. To enhance the standard centroid-based classifier with TFIDF for classifying drug monographs, different term weighting schemes for the centroid-based classifier are evaluated. Moreover, a combination of centroid-based classifiers with different term weighting schemes is proposed in this work. To evaluate the proposed method, two sets of drug monographs drawn from the DailyMed and RxList websites are used. The experimental results show that the proposed method can improve the performance of the centroid-based classifier.
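For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of a centroid-based classifier with TF-IDF weighting and cosine similarity. It is not the authors' implementation: the function names, the smoothed IDF formula, the unit-length normalization, and the toy monograph tokens and class labels are all assumptions made for illustration. The alternative term weighting schemes and the classifier combination evaluated in the paper are not described in this preview and are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def normalize(vec):
    """Scale a sparse vector (dict of term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def weigh(doc, idf):
    """TF-IDF weight a tokenized document; terms unseen in training are dropped."""
    tf = Counter(doc)
    return normalize({t: tf[t] * idf[t] for t in tf if t in idf})


def train_centroids(docs, labels):
    """Build one unit-length centroid per class from tokenized training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed IDF variant: log(N / df) + 1, so terms found in every
    # document still carry a small weight.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    sums = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for t, w in weigh(doc, idf).items():
            sums[label][t] += w
    # Normalizing the summed vector gives the same direction as the mean.
    centroids = {label: normalize(dict(s)) for label, s in sums.items()}
    return centroids, idf


def classify(doc, centroids, idf):
    """Return the class whose centroid has the highest cosine similarity."""
    vec = weigh(doc, idf)

    def cosine(centroid):
        return sum(w * centroid.get(t, 0.0) for t, w in vec.items())

    return max(centroids, key=lambda label: cosine(centroids[label]))


# Hypothetical toy data: tokenized monograph snippets labeled by indication.
train_docs = [
    ["hypertension", "blood", "pressure"],
    ["blood", "pressure", "tablet"],
    ["infection", "bacterial", "antibiotic"],
    ["antibiotic", "infection", "tablet"],
]
train_labels = ["cardiovascular", "cardiovascular", "anti-infective", "anti-infective"]

centroids, idf = train_centroids(train_docs, train_labels)
predicted = classify(["bacterial", "infection"], centroids, idf)
```

Swapping in a different term weighting scheme would, under these assumptions, only require changing how `weigh` and the `idf` table compute weights; the centroid construction and the similarity comparison stay the same.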



Edited by:

Keon Myung Lee, Prasad Yarlagadda and Yang-Ming Lu




V. Lertnattee and C. Lueviphan, "Using Different Term Weighting Schemes of Centroid-Based Classifiers to Classify Drug Monographs", Applied Mechanics and Materials, Vols. 462-463, pp. 968-973, 2014

Online since:

November 2013




