Paper Title:
Vari-Gram Language Model Based on Category
  Abstract

Category-based statistical language models are an important way to address the data sparsity problem. However, such models have two bottlenecks: (1) word clustering, where it is hard to find a method that performs well without a large amount of computation; and (2) loss of predictive ability, since class-based methods always give up some prediction power when adapting to text from different domains. This paper attempts to address both problems. It presents a novel definition of word similarity and, based on it, defines the similarity between word sets. Experiments show that the word clustering algorithm based on this similarity outperforms the conventional greedy clustering method in both speed and performance. The paper also presents a new method for constructing the vari-gram model.
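To illustrate why grouping words into categories helps with sparse data, here is a minimal sketch of the standard class-based bigram factorization, P(w | prev) = P(w | c(w)) * P(c(w) | c(prev)). It is not the paper's method: the abstract gives neither its similarity measure nor its vari-gram construction. The toy corpus, the hand-assigned classes, and the function names `train_class_bigram` and `class_bigram_prob` are assumptions made for illustration only.

```python
from collections import Counter

def train_class_bigram(tokens, word2class):
    """Collect the counts needed for P(w | prev) = P(w | c(w)) * P(c(w) | c(prev))."""
    classes = [word2class[w] for w in tokens]
    word_counts = Counter(tokens)
    class_counts = Counter(classes)
    class_bigrams = Counter(zip(classes, classes[1:]))
    # How often each class occurs as a history (every position except the last).
    history_counts = Counter(classes[:-1])
    return word_counts, class_counts, class_bigrams, history_counts

def class_bigram_prob(prev, word, word2class, model):
    """Unsmoothed maximum-likelihood estimate of the class-based bigram probability."""
    word_counts, class_counts, class_bigrams, history_counts = model
    c_prev, c_word = word2class[prev], word2class[word]
    if history_counts[c_prev] == 0 or class_counts[c_word] == 0:
        return 0.0
    p_word_given_class = word_counts[word] / class_counts[c_word]
    p_class_given_class = class_bigrams[(c_prev, c_word)] / history_counts[c_prev]
    return p_word_given_class * p_class_given_class

# Hypothetical toy corpus with hand-assigned classes (assumption, not from the paper).
tokens = "the cat sleeps the dog runs".split()
word2class = {"the": "DET", "cat": "NOUN", "dog": "NOUN",
              "sleeps": "VERB", "runs": "VERB"}
model = train_class_bigram(tokens, word2class)

# The word bigram "cat runs" never occurs in the corpus, yet the class-based
# model still assigns it probability mass through the NOUN -> VERB transition.
p_unseen = class_bigram_prob("cat", "runs", word2class, model)
print(p_unseen)
```

A word-level bigram model would give the unseen pair "cat runs" zero probability without smoothing. The class-based estimate shares statistics across all NOUN-to-VERB pairs, which is the benefit the abstract refers to. The cost is that words within a class are no longer distinguished by their history.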

  Info
Periodical: Applied Mechanics and Materials (Volumes 58-60)
Edited by: Qi Luo
Pages: 995-1000
DOI: 10.4028/www.scientific.net/AMM.58-60.995
Citation: L. C. Yuan, "Vari-Gram Language Model Based on Category", Applied Mechanics and Materials, Vols. 58-60, pp. 995-1000, 2011
Online since: June 2011
Authors: L. C. Yuan
