Vari-Gram Language Model Based on Category

Abstract:

The category-based statistical language model is an important method for addressing the sparse data problem. However, this model has two bottlenecks: (1) word clustering, for which it is hard to find a method that performs well without a large amount of computation; and (2) class-based methods always lose some of the predictive ability needed to adapt to text from different domains. This paper tries to solve both problems. It presents a novel definition of word similarity and, based on that definition, defines the similarity of word sets. Experiments show that a word clustering algorithm based on this similarity is better than the conventional greedy clustering method in both speed and performance. The paper also presents a new method for creating the vari-gram model.
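
For readers unfamiliar with category-based models, the following is a minimal sketch of the standard class-based bigram factorization, P(w | w_prev) = P(c | c_prev) * P(w | c), which is the general idea behind using categories against sparse data. It is not the paper's similarity-based clustering or its vari-gram construction, neither of which the abstract specifies. The function names, the toy corpus, and the hand-written word-to-class mapping are all assumptions made for illustration.

```python
from collections import Counter


def train_class_bigram(tokens, word2class):
    """Collect maximum-likelihood counts for a class-based bigram model.

    word2class is a hypothetical, hand-written mapping; a real system
    would obtain it from a word clustering algorithm.
    """
    classes = [word2class[w] for w in tokens]
    class_count = Counter(classes)                     # N(c)
    context_count = Counter(classes[:-1])              # N(c) as a left context
    word_count = Counter(tokens)                       # N(w)
    class_bigram = Counter(zip(classes, classes[1:]))  # N(c_prev, c)
    return class_count, context_count, word_count, class_bigram


def class_bigram_prob(prev, word, word2class, model):
    """Return P(word | prev) = P(c | c_prev) * P(word | c), unsmoothed."""
    class_count, context_count, word_count, class_bigram = model
    c_prev, c = word2class[prev], word2class[word]
    if context_count[c_prev] == 0 or class_count[c] == 0:
        return 0.0
    p_class = class_bigram[(c_prev, c)] / context_count[c_prev]
    p_word = word_count[word] / class_count[c]
    return p_class * p_word


# Toy data (illustrative only).
tokens = "the cat sat a dog ran".split()
word2class = {"the": "DET", "a": "DET", "cat": "NOUN",
              "dog": "NOUN", "sat": "VERB", "ran": "VERB"}
model = train_class_bigram(tokens, word2class)

# "the dog" never occurs in the toy corpus, so a word-level bigram
# estimate would be zero; the class-based estimate is not.
print(class_bigram_prob("the", "dog", word2class, model))  # 0.5
```

The word pair "the dog" is unseen, yet it receives a nonzero probability because DET is followed by NOUN in the data. The same sharing of statistics across a class is also why such models can lose word-specific predictive detail, the second bottleneck the abstract mentions.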

Info:

Periodical:

Applied Mechanics and Materials (Volumes 58-60)

Edited by:

Qi Luo

Pages:

995-1000

DOI:

10.4028/www.scientific.net/AMM.58-60.995

Citation:

L. C. Yuan "Vari-Gram Language Model Based on Category", Applied Mechanics and Materials, Vols. 58-60, pp. 995-1000, 2011

Online since:

June 2011

Authors:

L. C. Yuan
