Combining Burst Detection for Hot Topic Extraction

Abstract:

Because traditional text representations are not well suited to dynamic online streams, this paper presents a hot topic extraction technique for tracking news topics over time. The model incorporates the burstiness of individual words into the document-word vector representation, which emphasizes the temporal features of text streams. A burst detection approach based on an energy ratio threshold is proposed, and TF-PDF is then combined with it to weight the terms. Experimental results demonstrate that the model is effective for topic extraction from news streams and that it improves clustering performance.
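The abstract does not give the formulas, so the following is only a minimal sketch of how a burst signal might be folded into TF-PDF term weights. The TF-PDF part follows the commonly cited formulation (normalized term frequency per channel multiplied by the exponential of the fraction of that channel's documents containing the term). The energy ratio used here (squared count in the latest time window divided by the sum of squared counts over all windows), the `threshold` value, and the function names `tf_pdf`, `burst_scores`, and `burst_weighted` are all assumptions made for illustration, not the paper's definitions.

```python
import math
from collections import Counter


def tf_pdf(channels):
    """Compute TF-PDF weights.

    channels: list of channels; each channel is a list of documents,
    and each document is a list of tokens.
    Returns a dict mapping term -> weight summed over channels.
    """
    weights = Counter()
    for docs in channels:
        if not docs:
            continue
        freq = Counter()      # term frequency within this channel
        doc_freq = Counter()  # number of documents containing the term
        for doc in docs:
            freq.update(doc)
            doc_freq.update(set(doc))
        # L2 norm of the channel's term frequencies
        norm = math.sqrt(sum(f * f for f in freq.values()))
        for term, f in freq.items():
            weights[term] += (f / norm) * math.exp(doc_freq[term] / len(docs))
    return dict(weights)


def burst_scores(counts_per_window, threshold=0.5):
    """Hypothetical energy-ratio burst score (an assumption, not the paper's).

    counts_per_window: dict mapping term -> list of counts, one per time
    window, with the most recent window last.
    A term scores its energy ratio if the ratio reaches the threshold,
    otherwise 0.0.
    """
    scores = {}
    for term, counts in counts_per_window.items():
        energy = [c * c for c in counts]
        total = sum(energy)
        ratio = energy[-1] / total if total else 0.0
        scores[term] = ratio if ratio >= threshold else 0.0
    return scores


def burst_weighted(weights, scores):
    """Boost each TF-PDF weight by (1 + burst score); an assumed combination."""
    return {t: w * (1.0 + scores.get(t, 0.0)) for t, w in weights.items()}


# Toy data: two news channels and per-window counts for two terms.
channels = [
    [["quake", "news"], ["quake", "rescue"]],
    [["quake", "sports"]],
]
weights = tf_pdf(channels)
scores = burst_scores({"quake": [1, 1, 6], "news": [3, 3, 3]})
combined = burst_weighted(weights, scores)
```

In this toy data, "quake" spikes in the latest window and so receives a boost, while "news" has flat counts and keeps its plain TF-PDF weight. The multiplicative combination is one of several plausible choices; an additive or thresholded scheme would fit the same description.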

Info:

Periodical:

Advanced Materials Research (Volumes 268-270)

Pages:

1283-1288

Online since:

July 2011

Copyright:

© 2011 Trans Tech Publications Ltd. All Rights Reserved
