Keyword Extraction of Document Based on Weighted Complex Network

Abstract:

This paper explains and demonstrates how to extract keywords from a Chinese document using a weighted complex network. The characteristics and disadvantages of several common automatic keyword extraction methods are introduced first. Then, based on ideas from complex network theory, we propose an improved automatic keyword extraction method. A Chinese document is first represented as a network in which each node represents a term and each edge represents the co-occurrence of two terms. We then calculate an integrated value for each term; the keywords are the top k terms with the greatest values. Experimental results show that the method is more effective and accurate than the traditional TF-IDF keyword extraction method applied to the same document.
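The pipeline described in the abstract (terms as nodes, co-occurrence as weighted edges, a per-term score, top k terms) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's method: the abstract does not define the integrated value, so weighted degree (the sum of a node's edge weights) is used as a stand-in, and the function names, the `window` parameter, and the pre-segmented English tokens are all hypothetical. For Chinese text, a word segmentation step would have to come first.

```python
from collections import defaultdict


def build_cooccurrence_network(sentences, window=2):
    """Build a weighted, undirected term network.

    Returns {term: {neighbor: weight}}, where weight counts how often two
    distinct terms appear within `window` positions of each other in a
    sentence. `sentences` is a list of already-segmented token lists.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        for i, term in enumerate(tokens):
            for other in tokens[i + 1 : i + 1 + window]:
                if other != term:
                    graph[term][other] += 1
                    graph[other][term] += 1
    return graph


def top_k_keywords(graph, k=3):
    """Rank terms by weighted degree and return the k highest.

    Assumption: weighted degree stands in for the paper's integrated
    value, which the abstract does not specify. Ties break alphabetically.
    """
    scores = {term: sum(nbrs.values()) for term, nbrs in graph.items()}
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    # Toy, already-segmented input (hypothetical data).
    sample = [
        ["network", "keyword", "extraction"],
        ["keyword", "network", "document"],
        ["keyword", "term"],
    ]
    print(top_k_keywords(build_cooccurrence_network(sample), k=2))
```

A TF-IDF baseline, by contrast, scores each term from its frequency alone and ignores which terms it appears next to; the network representation is what lets the score reflect a term's connections to other terms.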

Info:

Periodical:

Advanced Materials Research (Volumes 403-408)

Pages:

2146-2151

Online since:

November 2011

Copyright:

© 2012 Trans Tech Publications Ltd. All Rights Reserved
