Vari-Gram Language Model Based on Category

Li Chi Yuan

doi:10.4028/www.scientific.net/AMM.58-60.995

Abstract:

Category-based statistical language models are an important way to address the sparse data problem. However, they have two bottlenecks: (1) word clustering, since it is hard to find a clustering method that performs well without a large amount of computation; and (2) prediction, since class-based methods always lose some predictive ability when adapting to text from different domains. This paper tries to address both problems. It presents a novel definition of word similarity and, building on it, a definition of word set similarity. Experiments show that the word clustering algorithm based on this similarity is better than the conventional greedy clustering method in both speed and performance. The paper also presents a new method for building the vari-gram model.
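
For readers unfamiliar with category-based models, the following is a minimal sketch of the standard class-based bigram factorization, P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) · P(w_i | c_i), which is the general idea behind using categories to counter sparse data. It is not the paper's method: the abstract does not give the proposed similarity measure, clustering algorithm, or vari-gram construction. The function name `train_class_bigram`, the hand-written `word2class` mapping, and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter


def train_class_bigram(tokens, word2class):
    """Maximum-likelihood class-based bigram model (illustrative sketch).

    P(w | w_prev) is approximated as P(c | c_prev) * P(w | c), where c and
    c_prev are the classes of w and w_prev. The word-to-class mapping is
    assumed to be given; the paper derives it by clustering instead.
    """
    classes = [word2class[w] for w in tokens]
    word_counts = Counter(tokens)
    class_counts = Counter(classes)
    class_bigrams = Counter(zip(classes, classes[1:]))
    # A class counts as a history only where another token follows it.
    history_counts = Counter(classes[:-1])

    def prob(word, prev_word):
        c, c_prev = word2class[word], word2class[prev_word]
        if history_counts[c_prev] == 0 or class_counts[c] == 0:
            return 0.0  # no evidence for this class in the training data
        p_class = class_bigrams[(c_prev, c)] / history_counts[c_prev]
        p_word = word_counts[word] / class_counts[c]
        return p_class * p_word

    return prob


# Hypothetical toy corpus and hand-assigned classes.
tokens = "the cat sat on the mat the dog sat on the rug".split()
word2class = {
    "the": "DET", "on": "PREP", "sat": "VERB",
    "cat": "NOUN", "dog": "NOUN", "mat": "NOUN", "rug": "NOUN",
}
prob = train_class_bigram(tokens, word2class)

print(prob("dog", "the"))  # seen word bigram
print(prob("sat", "mat"))  # unseen word bigram, still nonzero
```

In this toy corpus the word pair "mat sat" never occurs, so a word-level maximum-likelihood bigram would assign it zero probability, while the class-based estimate is nonzero because NOUN followed by VERB has been observed. The cost is the one the abstract points to: every word in a class shares the same history statistics, so some word-specific predictive ability is lost.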

Info:

Periodical: Applied Mechanics and Materials (Volumes 58-60)

Pages: 995-1000

DOI: https://doi.org/10.4028/www.scientific.net/AMM.58-60.995

Online since: June 2011

Authors: Li Chi Yuan

Keywords: Statistical Language Model, Vari-Gram Language Model, Word Clustering
