A Hierarchical Bayesian Model for Text Corpora
Abstract:
We propose a new generative probabilistic model, the Dirichlet Author-Topic (DAT) model, for extracting information about authors and topics from large text collections. DAT is a three-level hierarchical Bayesian model. It builds on the Author-Topic (AT) model, adding the key attribute that the distribution over authors is conditioned on a Dirichlet prior. The probability distribution over topics in a multi-author document is a mixture of the distributions associated with its authors. The distributions at the three levels (document-author, author-topic, and topic-word) are learned from data in an unsupervised manner using a Gibbs sampling algorithm. We give results on a large corpus of 1740 papers from the Neural Information Processing Systems (NIPS) conference. Experiments based on perplexity scores for test documents illustrate systematic differences between the proposed model and a number of alternatives.
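The three-level structure described in the abstract can be illustrated with a minimal sketch of one plausible generative process. This is an assumption-laden illustration, not the paper's implementation: the function name `generate_document`, the symbols `pi`, `theta`, `phi`, and the symmetric concentration `eta` are hypothetical, and the abstract does not specify the exact parameterization or the Gibbs sampler.

```python
import numpy as np

def generate_document(authors, n_words, theta, phi, eta, rng):
    """Sample one document under an assumed DAT-style generative process.

    authors : author indices listed on the document
    theta   : (A, T) author-topic distributions, each row sums to 1
    phi     : (T, V) topic-word distributions, each row sums to 1
    eta     : symmetric Dirichlet concentration (hypothetical parameter)
    """
    authors = np.asarray(authors)

    # Level 1: document-author distribution, drawn from a Dirichlet prior.
    pi = rng.dirichlet(np.full(len(authors), eta))

    # The document's topic distribution is a mixture of its authors'
    # topic distributions, weighted by pi.
    doc_topics = pi @ theta[authors]

    words = np.empty(n_words, dtype=int)
    for i in range(n_words):
        a = rng.choice(authors, p=pi)                # pick an author
        z = rng.choice(theta.shape[1], p=theta[a])   # Level 2: author -> topic
        words[i] = rng.choice(phi.shape[1], p=phi[z])  # Level 3: topic -> word
    return pi, doc_topics, words
```

For example, with randomly drawn `theta` and `phi`, calling `generate_document([0, 2], 50, theta, phi, 1.0, np.random.default_rng(0))` returns a two-element author distribution, the mixed topic distribution for the document, and 50 word indices. Inference in the paper runs in the opposite direction, recovering these distributions from observed words via Gibbs sampling.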
Info:
Pages: 1237-1240
Online since: November 2014
Copyright: © 2014 Trans Tech Publications Ltd. All Rights Reserved