E-Mail Filtration and Classification Based on Variable Weights of the Bayesian Algorithm

Abstract:

Co-occurring words capture the relations between words, so using them can mitigate the shortcomings that arise from the assumption underlying the Bayesian algorithm. To build the Token Dictionary, the Information Gain algorithm is used to select Tokens, and a Synonyms Dictionary is used to acquire additional Tokens. Through extensive training, the matching scores of the Tokens are counted; the valuable Tokens are then selected according to their matching rate, and the Token Dictionary is established. The proposed method is applied in an E-mail classification experiment, and the results show that the accuracy of the spam filter improves considerably.
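
The token-selection step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each e-mail is represented as a set of tokens and scores every token by the information gain of the binary feature "token is present" with respect to the spam/ham label. The function names (`entropy`, `information_gain`, `select_tokens`) and the toy corpus are hypothetical; the synonym expansion, matching-rate filtering, and variable weighting described in the paper are not reproduced here.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, token):
    """Information gain of the feature 'token occurs in the e-mail'."""
    with_tok = [y for d, y in zip(docs, labels) if token in d]
    without_tok = [y for d, y in zip(docs, labels) if token not in d]
    n = len(labels)
    h_class = entropy(list(Counter(labels).values()))
    h_cond = 0.0
    # Conditional entropy: weighted entropy of the labels in each partition.
    for subset in (with_tok, without_tok):
        if subset:
            h_cond += len(subset) / n * entropy(list(Counter(subset).values()))
    return h_class - h_cond


def select_tokens(docs, labels, k):
    """Return the k tokens with the highest information gain (ties: alphabetical)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


# Toy corpus (made up for illustration): each e-mail is a set of tokens.
docs = [
    {"free", "prize", "click"},
    {"free", "offer", "click"},
    {"meeting", "agenda", "report"},
    {"report", "meeting", "free"},
]
labels = ["spam", "spam", "ham", "ham"]

top = select_tokens(docs, labels, 2)
print(top)
```

In this toy corpus, tokens that occur only in one class (such as "click") receive the maximum gain of 1 bit, while "free", which appears in both classes, scores lower and is not selected.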

Info:

Pages:

2111-2114

Online since:

February 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved

[1] Qiu Kening, Guo Qingshun, Zhang Xiaobo. The Research of Personalized Classification E-mail System Based on Agent. Computer Engineering and Application. Vol. 30, No. 7, July 2005, pp. 176-178.

[2] Metsis V, Androutsopoulos I, Paliouras G. Spam Filtering with Naive Bayes - Which Naive Bayes? [C] // Proc. of the 2nd Conference on E-mail and Anti-Spam (CEAS). Mountain View, California, 2006: 27-28.

[3] Y. H. Li and A. K. Jain. Classification of Text Documents. The Computer Journal. Vol. 41(8). 1998: 537-546.

[4] Mitchell TM. Machine Learning[M]. McGraw-Hill. (1997).

[5] K. W. Church and P. Hanks. Word Association Norms, Mutual Information and Lexicography. In Proceedings of ACL 27, Vancouver, Canada, 1989, pp. 76-83.

[6] CERNET Computer Emergency Response Team. http://www.ccert.edu.cn/spam/sa/datasets.htm
