Research on Classifying in Large Scale Documents Based on Analysising Style of Language by Language Cadence

Abstract:

Classification of large-scale document collections is an active research topic. Language style can be used to classify such documents, and language cadence can be used to describe language style. A method that quantifies language style on the basis of language cadence is therefore proposed here. Three different tests on the same corpus show that cadence, an important feature of language, can effectively classify documents of different styles.
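This preview does not define how cadence is measured, so the following is only a minimal sketch under a stated assumption: cadence is approximated by the lengths of punctuation-delimited segments. The names `cadence_features` and `nearest_style`, the three features, and the centroid values are hypothetical illustrations, not the authors' method.

```python
import re
from statistics import mean, pstdev

# Assumption: a "segment" is the text between two punctuation marks
# (Western or Chinese). The paper's actual cadence definition is not
# given in this preview.
_SPLIT = re.compile(r"[,.;:!?，。；：！？、]+")


def cadence_features(text):
    """Return (mean segment length, std of lengths, share of short segments).

    Length is counted in non-whitespace characters so the same code
    works for space-delimited and non-space-delimited scripts.
    """
    segments = [s.strip() for s in _SPLIT.split(text)]
    lengths = [len("".join(s.split())) for s in segments if s]
    if not lengths:
        return (0.0, 0.0, 0.0)
    # Hypothetical threshold: segments of at most 4 characters are "short".
    short_share = sum(1 for n in lengths if n <= 4) / len(lengths)
    return (float(mean(lengths)), float(pstdev(lengths)), short_share)


def nearest_style(text, centroids):
    """Assign the style label whose (hypothetical) centroid is closest."""
    feats = cadence_features(text)

    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(feats, centroids[label]))

    return min(centroids, key=squared_distance)


if __name__ == "__main__":
    # Made-up centroids purely for illustration; real ones would be
    # estimated from labelled documents.
    centroids = {"terse": (4.0, 1.0, 0.6), "flowing": (20.0, 8.0, 0.05)}
    sample = "Go now, stop. Wait here for me; then run!"
    print(cadence_features(sample))
    print(nearest_style(sample, centroids))
```

A nearest-centroid rule is used here only because it is the simplest classifier to read; any standard classifier could consume the same feature vector.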

Info:

Pages: 1678-1681

Online since: September 2014

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved
