The Study of Generative Modeling of Text

Abstract:

Text mining is the automatic discovery of new, previously unknown information from unstructured document collections. The vector space, or bag-of-words, representation is one of the mainstream representations of text: each document is a data point in a high-dimensional space, and word order is ignored. Generative models are probabilistic representations of data that can be regarded as the process that generated the observed data. Because generative models are probabilistic, a set of methods and criteria is available for their estimation, inference, comparison and selection. In this paper, we review several existing probabilistic models that are commonly applied to collections of discrete, exchangeable data, in particular English text. We hope this review will shed some light on Chinese text modelling and mining tasks.
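To make the bag-of-words representation and the idea of a generative model concrete, here is a minimal Python sketch. It assumes lowercase whitespace tokenization and uses the simplest generative model for such data, a single multinomial (unigram) distribution over the vocabulary; this is an illustrative choice, not necessarily one of the models reviewed in the paper. The function names `bag_of_words`, `fit_unigram`, `log_likelihood` and `perplexity`, the smoothing parameter `alpha` and the toy corpus are all hypothetical. The sketch covers estimation (smoothed maximum likelihood) and comparison (held-out perplexity).

```python
from collections import Counter
import math

def bag_of_words(doc, vocab):
    """Map a document to a count vector over a fixed vocabulary; word order is discarded.

    Assumption: lowercase whitespace tokenization (hypothetical helper).
    """
    counts = Counter(doc.lower().split())
    return [counts[w] for w in vocab]

def fit_unigram(count_vectors, alpha=1.0):
    """Estimate the word probabilities of a single multinomial (unigram) model.

    alpha is an additive (Laplace) pseudo-count; alpha=0 gives the
    maximum-likelihood estimate.
    """
    V = len(count_vectors[0])
    totals = [sum(vec[j] for vec in count_vectors) for j in range(V)]
    denom = sum(totals) + alpha * V
    return [(t + alpha) / denom for t in totals]

def log_likelihood(count_vector, theta):
    """Log-probability of a bag of words under theta, omitting the multinomial coefficient.

    Raises ValueError (log of zero) if a word with nonzero count has probability 0,
    which can happen with alpha=0 for words unseen in training.
    """
    return sum(n * math.log(p) for n, p in zip(count_vector, theta) if n > 0)

def perplexity(count_vectors, theta):
    """exp(-total log-likelihood / total tokens); lower means a better held-out fit."""
    ll = sum(log_likelihood(v, theta) for v in count_vectors)
    n_tokens = sum(sum(v) for v in count_vectors)
    return math.exp(-ll / n_tokens)

# Toy usage with a hypothetical corpus. For simplicity the vocabulary is
# built from all documents, so held-out words are never out-of-vocabulary.
train = ["the cat sat on the mat", "the dog sat on the log"]
held_out = ["the cat sat on the log"]
vocab = sorted({w for d in train + held_out for w in d.lower().split()})
X_train = [bag_of_words(d, vocab) for d in train]
X_test = [bag_of_words(d, vocab) for d in held_out]
for alpha in (0.0, 1.0):
    theta = fit_unigram(X_train, alpha)
    print(alpha, round(perplexity(X_test, theta), 3))
```

Because lower held-out perplexity indicates a better fit, comparing `perplexity` across values of `alpha`, or across different generative models fitted to the same bags of words, is one simple way to carry out model comparison and selection.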

Info:

Pages: 1713-1717

Online since: October 2013

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved