Love Hard.: [PaperReading] Latent Dirichlet allocation

Title: Latent Dirichlet allocation
Author: D. Blei, A. Ng, and M. Jordan
Published in: Journal of Machine Learning Research, 3:993–1022, January 2003

LDA: a three-level hierarchical Bayesian model
that is fit on training data to give a generative model of a corpus.

we want to know 1) for each topic, the probability of each word
   2) for each document, the probability of each topic

first, how can we generate a new document from the trained model (α and β are already estimated)?
1. draw N: the number of words in this document, from some distribution (Poisson)
2. draw θ: the topic distribution of this document ~ Dir(α) (p.s. α is already known)
3. draw each word (N words in total!)
   1) choose a topic Zn ~ Multinomial(θ)
   2) from this topic, choose a word ~ p(w | Zn, β), a multinomial probability conditioned on the topic Zn
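
The three steps above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the vocabulary, the values of alpha and beta, the mean length 8, and the helper names (sample_dirichlet, sample_poisson, generate_document) are all made up for this example and do not come from the paper.

```python
import math
import random

def sample_dirichlet(alpha, rng):
    """Draw theta ~ Dir(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_poisson(lam, rng):
    """Draw N ~ Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def generate_document(alpha, beta, vocab, mean_length, rng):
    """Step 1: draw N. Step 2: draw theta. Step 3: draw a (topic, word) pair N times."""
    n_words = sample_poisson(mean_length, rng)
    theta = sample_dirichlet(alpha, rng)
    topics = list(range(len(alpha)))
    assignments, words = [], []
    for _ in range(n_words):
        z = rng.choices(topics, weights=theta)[0]    # Zn ~ Multinomial(theta)
        w = rng.choices(vocab, weights=beta[z])[0]   # word ~ p(w | Zn, beta)
        assignments.append(z)
        words.append(w)
    return theta, assignments, words

# Toy values, invented for illustration only
vocab = ["gene", "dna", "cell", "ball", "team", "game"]
alpha = [0.5, 0.5]                      # K = 2 topics
beta = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],     # topic 0: biology-like words
    [0.0, 0.0, 0.0, 0.3, 0.3, 0.4],     # topic 1: sports-like words
]
rng = random.Random(0)
theta, assignments, words = generate_document(alpha, beta, vocab, 8, rng)
print(theta, assignments, words)
```

Each word only depends on its own topic Zn, and each Zn only depends on θ, which is why one θ per document is enough to describe the document's topic mixture.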

multinomial: every possible outcome is known, together with its probability

θ : (P1~Pk)
the probability of each topic showing up (each document has its own θ, so this parameter can be used to distinguish between documents)
my understanding: θ describes the topic probabilities of this document; a high probability for a topic means the document largely belongs to that topic. it is like the probability that each face 1~6 of a die shows up

corpus-level parameters (alpha & beta):

α :

the Dirichlet parameter; it acts like a pseudo-count for each topic (like how many times each face 1~6 of a die has shown up)

β :

a K x V matrix, beta(i,j): the probability of word j in topic i
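
To make the roles of α and β concrete, here is a small sketch, assuming made-up values for both (K = 3 topics, V = 4 words). It checks that the average θ over many draws from Dir(α) is close to α divided by its sum, which is one way to read α as relative topic counts, and that every row of β is a probability distribution over the vocabulary. The helper sample_dirichlet is written for this example.

```python
import random

def sample_dirichlet(alpha, rng):
    # theta ~ Dir(alpha), built by normalizing independent Gamma(alpha_i, 1) draws
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(0)
alpha = [6.0, 3.0, 1.0]   # made-up pseudo-counts for K = 3 topics
n_samples = 2000
samples = [sample_dirichlet(alpha, rng) for _ in range(n_samples)]

# Average theta over many simulated documents; the Dirichlet mean is alpha / sum(alpha)
mean_theta = [sum(s[k] for s in samples) / n_samples for k in range(len(alpha))]
expected = [a / sum(alpha) for a in alpha]

# beta: K x V matrix, each row is one topic's distribution over the vocabulary
beta = [
    [0.5, 0.3, 0.2, 0.0],
    [0.1, 0.1, 0.4, 0.4],
    [0.25, 0.25, 0.25, 0.25],
]
row_sums = [sum(row) for row in beta]

print(mean_theta, expected, row_sums)
```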

generate a document:

when we want to generate a document for this corpus,
we first need alpha & beta (they are shared by the whole corpus)
then,
alpha generates theta (the document's topic distribution), theta generates the topics (one topic for each word position), and each topic generates a word (drawn from that topic's word distribution in beta)
the words generate the document (combining these N words, we get a document)

exchangeability: the order of the words in a document is assumed not to matter; given θ, each word is generated independently of the others
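
The exchangeability assumption can be checked numerically. The sketch below assumes a fixed θ and a toy β (both invented here) and uses log p(w | θ, β) = sum over n of log( sum over k of θ_k · β(k, w_n) ); shuffling the words of the document should leave this value unchanged. The function name log_likelihood_given_theta is hypothetical.

```python
import math
import random

def log_likelihood_given_theta(word_ids, theta, beta):
    # log p(w | theta, beta) = sum_n log( sum_k theta[k] * beta[k][w_n] )
    terms = []
    for w in word_ids:
        p_word = sum(theta[k] * beta[k][w] for k in range(len(theta)))
        terms.append(math.log(p_word))
    return math.fsum(terms)

# Toy values, invented for illustration only
theta = [0.7, 0.3]
beta = [
    [0.5, 0.4, 0.1],
    [0.1, 0.2, 0.7],
]
doc = [0, 0, 1, 2, 1, 0]          # a document as a list of word ids
shuffled = doc[:]
random.Random(1).shuffle(shuffled)

original_ll = log_likelihood_given_theta(doc, theta, beta)
shuffled_ll = log_likelihood_given_theta(shuffled, theta, beta)
print(original_ll, shuffled_ll)   # the two values agree
```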

[issue]
parameter estimation:
which alpha, beta have a high probability of generating this corpus?
how to choose the number of topics for this corpus?
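
One rough way to get a feeling for the first question is to score candidate (alpha, beta) pairs by how likely they make the corpus. The sketch below uses a naive Monte Carlo estimate, averaging p(w | θ, β) over draws of θ ~ Dir(α). This is a stand-in chosen for simplicity, not the variational EM procedure used in the paper, and the tiny corpus, both candidate β matrices, and the function names are assumptions made for this example.

```python
import math
import random

def sample_dirichlet(alpha, rng):
    # theta ~ Dir(alpha), built by normalizing independent Gamma(alpha_i, 1) draws
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def estimate_log_likelihood(word_ids, alpha, beta, rng, n_draws=500):
    # Naive Monte Carlo: p(w | alpha, beta) ~ mean over theta draws of p(w | theta, beta)
    log_terms = []
    for _ in range(n_draws):
        theta = sample_dirichlet(alpha, rng)
        ll = 0.0
        for w in word_ids:
            ll += math.log(sum(theta[k] * beta[k][w] for k in range(len(alpha))))
        log_terms.append(ll)
    # log-mean-exp keeps the average numerically stable
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms) / n_draws)

def corpus_log_likelihood(corpus, alpha, beta, rng):
    # Documents are independent given alpha and beta, so their log-likelihoods add up
    return sum(estimate_log_likelihood(doc, alpha, beta, rng) for doc in corpus)

# Toy corpus over a vocabulary of 4 word ids; each document sticks to one "theme"
corpus = [[0, 0, 1, 0, 1], [2, 3, 3, 2, 3]]
alpha = [0.5, 0.5]
beta_good = [[0.45, 0.45, 0.05, 0.05], [0.05, 0.05, 0.45, 0.45]]
beta_flat = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]

good_ll = corpus_log_likelihood(corpus, alpha, beta_good, random.Random(0))
flat_ll = corpus_log_likelihood(corpus, alpha, beta_flat, random.Random(0))
print(good_ll, flat_ll)   # the beta that separates the two themes scores higher
```

The same kind of score, computed on held-out documents, is one possible way to compare models trained with different numbers of topics, though the estimator here is far too crude for real corpora.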
