Combining Burst Detection for Hot Topic Extraction
Because traditional text representations are poorly suited to online dynamic streams, this paper presents a hot topic extraction technique for tracking news topics over time. The model incorporates individual word bursts into the document-word vector representation, which emphasizes the temporal features of text streams. A burst detection approach based on an energy ratio threshold is proposed, and TF-PDF is then combined with it to weight the terms. Experimental results demonstrate that the model is effective for topic extraction from news streams and that it improves clustering performance.
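The abstract does not give the formulas, so the following Python sketch only illustrates the general idea under stated assumptions and is not the authors' implementation. The `tf_pdf` function follows the commonly cited TF-PDF definition (normalized term frequency per channel multiplied by the exponential of the term's document proportion in that channel, summed over channels). The "energy" of a word (its squared per-window frequency), the ratio against the mean energy of earlier windows, the default threshold of 2.0, and the `boost` factor used to merge bursts into the term weights are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tf_pdf(channels):
    """Compute TF-PDF weights.

    channels: list of channels; each channel is a list of documents;
    each document is a list of tokens.
    Returns a dict mapping term -> weight.
    """
    weights = Counter()
    for docs in channels:
        if not docs:
            continue
        # Frequency of each term in this channel.
        freq = Counter(tok for doc in docs for tok in doc)
        # Number of documents in this channel containing each term.
        doc_freq = Counter(tok for doc in docs for tok in set(doc))
        norm = math.sqrt(sum(f * f for f in freq.values()))
        for term, f in freq.items():
            weights[term] += (f / norm) * math.exp(doc_freq[term] / len(docs))
    return dict(weights)


def energy_ratio(counts, t):
    """Assumed energy ratio of one word at window t.

    counts: per-window frequencies of the word.
    Energy is ASSUMED to be the squared frequency; the ratio compares
    window t with the mean energy of all earlier windows (plus 1 for
    smoothing). This is a hypothetical definition, not the paper's.
    """
    history = counts[:t]
    if not history:
        return 0.0
    baseline = sum(c * c for c in history) / len(history)
    return (counts[t] ** 2) / (baseline + 1.0)


def is_bursty(counts, t, threshold=2.0):
    """Flag a burst when the assumed energy ratio exceeds the threshold."""
    return energy_ratio(counts, t) > threshold


def burst_weighted(weights, series, t, threshold=2.0, boost=1.0):
    """Scale the weight of every bursty term by (1 + boost).

    series: dict mapping term -> per-window frequency list.
    The multiplicative boost is a hypothetical way to combine bursts
    with TF-PDF weights.
    """
    out = {}
    for term, w in weights.items():
        counts = series.get(term, [])
        bursty = len(counts) > t and is_bursty(counts, t, threshold)
        out[term] = w * (1.0 + boost) if bursty else w
    return out


if __name__ == "__main__":
    channels = [[["quake", "news"], ["quake"]]]
    series = {"quake": [1, 1, 1, 9], "news": [3, 3, 3, 3]}
    base = tf_pdf(channels)
    print(burst_weighted(base, series, t=3))
```

In this sketch the burst-adjusted weights would replace plain term weights in the document-word vectors passed to a clustering step, so that terms rising sharply in the current window count for more than terms with steady frequency.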
H. F. Ma and H. L. Ma, "Combining Burst Detection for Hot Topic Extraction", Advanced Materials Research, Vols. 268-270, pp. 1283-1288, 2011