A Hierarchical Bayesian Model for Text Corpora

Abstract:

We propose a new generative probabilistic Dirichlet Author-Topic (DAT) model for extracting information about authors and topics from large text collections. DAT is a three-level hierarchical Bayesian model. It builds on the Author-Topic (AT) model, adding the key attribute that the distribution over authors is conditioned on a Dirichlet prior. The probability distribution over topics in a multi-author document is a mixture of the distributions associated with its authors. The three levels of distributions, document-author, author-topic, and topic-word, are learned from the data in an unsupervised manner using a Gibbs sampling algorithm. We give results on a large corpus containing 1740 papers from the Neural Information Processing Systems (NIPS) conference. Experiments based on perplexity scores for test documents illustrate systematic differences between the proposed model and a number of alternatives.
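Since the abstract only sketches the algorithm, a concrete illustration may help. Below is a minimal collapsed Gibbs sampler, in Python, for an author-topic model with a Dirichlet prior on each document's distribution over authors, in the spirit of the model described above. This is a sketch under stated assumptions: the hyperparameter names (alpha, beta, eta), their defaults, and the exact update are illustrative, not the paper's own notation or implementation.

import numpy as np

def gibbs_author_topic(docs, doc_authors, V, A, K,
                       alpha=0.1, beta=0.01, eta=0.1, n_iters=200, seed=0):
    # docs: list of documents, each a list of word ids in [0, V)
    # doc_authors: list of author-id lists (ids in [0, A)), one per document
    rng = np.random.default_rng(seed)
    n_ak = np.zeros((A, K))   # author-topic counts
    n_kw = np.zeros((K, V))   # topic-word counts
    n_k = np.zeros(K)         # tokens per topic
    n_a = np.zeros(A)         # tokens per author
    n_da = [np.zeros(len(au)) for au in doc_authors]  # document-author counts
    z = [[0] * len(doc) for doc in docs]  # topic assignment per token
    x = [[0] * len(doc) for doc in docs]  # author index into the doc's author list

    def move(d, i, w, delta):
        # add (delta=+1) or remove (delta=-1) token i of document d from all counts
        j, k = x[d][i], z[d][i]
        a = doc_authors[d][j]
        n_ak[a, k] += delta; n_kw[k, w] += delta
        n_k[k] += delta; n_a[a] += delta; n_da[d][j] += delta

    for d, doc in enumerate(docs):        # random initialisation
        for i, w in enumerate(doc):
            z[d][i] = rng.integers(K)
            x[d][i] = rng.integers(len(doc_authors[d]))
            move(d, i, w, +1)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            au = np.asarray(doc_authors[d])
            for i, w in enumerate(doc):
                move(d, i, w, -1)         # leave the current token out
                # joint conditional over (author, topic) pairs:
                # document-author term * author-topic term * topic-word term
                p = ((n_da[d] + eta)[:, None]
                     * (n_ak[au] + alpha) / (n_a[au][:, None] + K * alpha)
                     * ((n_kw[:, w] + beta) / (n_k + V * beta))[None, :])
                p = (p / p.sum()).ravel()
                idx = rng.choice(p.size, p=p)
                x[d][i], z[d][i] = divmod(idx, K)  # row-major: j = idx // K, k = idx % K
                move(d, i, w, +1)

    # posterior-mean estimates of the author-topic and topic-word distributions
    theta = (n_ak + alpha) / (n_a[:, None] + K * alpha)
    phi = (n_kw + beta) / (n_k[:, None] + V * beta)
    return theta, phi

A toy call, on hypothetical data, looks like:

docs = [[0, 1, 2, 1, 0], [3, 4, 3, 4, 2]]   # two documents over a 5-word vocabulary
doc_authors = [[0, 1], [1, 2]]              # author lists for each document
theta, phi = gibbs_author_topic(docs, doc_authors, V=5, A=3, K=2, n_iters=100)

The document-author factor (n_da + eta) is what a Dirichlet prior over authors contributes relative to the AT model, which instead picks an author uniformly from the document's author list. Test-set perplexity, the score used in the experiments, is exp(-(sum_d log p(w_d)) / (sum_d N_d)), so lower is better.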

Info:

Pages: 1237-1240

Online since: November 2014

Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved

References:

[1] D. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, 2003.

[2] M. Rosen-Zvi, T. Griffiths, M. Steyvers, and P. Smyth. The author-topic model for authors and documents. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, Banff, Canada, 2004.

[3] M. Rosen-Zvi, C. Chemudugunta, T. Griffiths, P. Smyth, and M. Steyvers. Learning author-topic models from text corpora. ACM Transactions on Information Systems, 28(1), Article 4, 2010.

DOI: 10.1145/1658377.1658381

[4] T. Hofmann. Probabilistic latent semantic indexing. In Proceedings of the 22nd Annual ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 50-57, 1999.

DOI: 10.1145/312624.312649

[5] D. Cohn and T. Hofmann. The missing link: a probabilistic model of document content and hypertext connectivity. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13, pp. 430-436, 2001.

[6] E. Erosheva, S. Fienberg, and J. Lafferty. Mixed-membership models of scientific publications. Proceedings of the National Academy of Sciences, pp. 5220-5227, 2004.

DOI: 10.1073/pnas.0307760101

[7] A. McCallum, X. Wang, and A. Corrada-Emmanuel. Topic and role discovery in social networks with experiments on Enron and academic email. Journal of Artificial Intelligence Research, 2007.

DOI: 10.1613/jair.2229

[8] D. Mimno and A. McCallum. Expertise modeling for matching papers with reviewers. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2007.

DOI: 10.1145/1281192.1281247

[9] A. McCallum. Multi-label text classification with a mixture model trained by EM. In AAAI'99 Workshop on Text Learning, 1999.

[10] T. L. Griffiths and M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences, pp. 5228-5235, 2004.

DOI: 10.1073/pnas.0307752101

[11] T. Minka and J. Lafferty. Expectation-propagation for the generative aspect model. In Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence, pp. 352-359, 2002.

[12] W. Gilks, S. Richardson, and D. Spiegelhalter. Markov Chain Monte Carlo in Practice. Chapman & Hall, New York, NY, 1996.

DOI: 10.1201/b14835