Use document modeling to enhance PMF_1: CTR Model.

Source: Internet · Editor: 程序博客网 · Time: 2024/06/06 01:23

Drawbacks of PMF

  1. Matrix factorization uses only information from other users' ratings, so it cannot generalize to completely unrated items: it cannot recommend new products that have not yet received rating information from any user.
  2. The prediction accuracy often drops significantly when the ratings are very sparse.
  3. The learnt latent space is not easy to interpret. (The CTR model addresses this.)

Use LDA to Enhance PMF

LDA

Documents are represented as random mixtures over latent topics, where each topic is characterized by a distribution over words. LDA assumes the following generative process for a corpus D consisting of M documents, each of length N_i:
1. Choose \theta_i \sim \mathrm{Dir}(\alpha), where i \in \{1,\dots,M\} and \mathrm{Dir}(\alpha) is the Dirichlet distribution with parameter \alpha.
2. Choose \varphi_k \sim \mathrm{Dir}(\beta), where k \in \{1,\dots,K\}.
3. For each of the word positions (i, j), where i \in \{1,\dots,M\} and j \in \{1,\dots,N_i\}:
​ (a) Choose a topic z_{i,j} \sim \mathrm{Multinomial}(\theta_i).
​ (b) Choose a word w_{i,j} \sim \mathrm{Multinomial}(\varphi_{z_{i,j}}).
(Note that the Multinomial distribution here refers to the Multinomial with only one trial, which is formally equivalent to the categorical distribution.)
4. The plate notation is as follows:
(figure: LDA plate diagram)
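The generative process above can be sketched with NumPy. This is a minimal illustration with hypothetical corpus sizes (M, K, V, and the document lengths are made up); `rng.dirichlet` plays the role of Dir(·) and `rng.choice` the role of the one-trial Multinomial:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: M documents, K topics, V vocabulary words.
M, K, V = 3, 4, 10
alpha, beta = 0.1, 0.01
doc_lengths = [5, 7, 6]  # N_i for each document i

# Step 2: one word distribution phi_k ~ Dir(beta) per topic.
phi = rng.dirichlet([beta] * V, size=K)

corpus = []
for i in range(M):
    # Step 1: topic proportions theta_i ~ Dir(alpha) for document i.
    theta = rng.dirichlet([alpha] * K)
    words = []
    for j in range(doc_lengths[i]):
        # Step 3(a): topic assignment z_ij ~ Multinomial(theta_i), one trial.
        z = rng.choice(K, p=theta)
        # Step 3(b): word w_ij ~ Multinomial(phi_z), one trial.
        w = rng.choice(V, p=phi[z])
        words.append(int(w))
    corpus.append(words)
```

Because α and β are small, each sampled document tends to concentrate on a few topics and each topic on a few words, which is the usual sparsity intuition behind Dirichlet priors.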

Categorical Distribution

The K-dimensional categorical distribution is the most general distribution over a K-way event; any other discrete distribution over a size-K sample space is a special case. The parameters specifying the probabilities of each possible outcome are constrained only by the fact that each must be in the range 0 to 1, and all must sum to 1.

pmf: p(x=i) = p_i, or equivalently p(x) = p_1^{[x=1]} \cdots p_k^{[x=k]} = [x=1]\cdot p_1 + \dots + [x=k]\cdot p_k

where [x=i] is the Iverson bracket.
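A quick numerical check that the three forms of the pmf agree — a minimal sketch with hypothetical parameters p = [0.2, 0.5, 0.3]:

```python
# Three equivalent forms of the categorical pmf, using the
# Iverson bracket [x == i] (1 if true, 0 if false).
p = [0.2, 0.5, 0.3]  # hypothetical parameters, sum to 1
K = len(p)

def pmf_direct(x):
    # p(x = i) = p_i
    return p[x]

def pmf_product(x):
    # p(x) = prod_i p_i^{[x = i]}
    out = 1.0
    for i in range(K):
        out *= p[i] ** (1 if x == i else 0)
    return out

def pmf_sum(x):
    # p(x) = sum_i [x = i] * p_i
    return sum((1 if x == i else 0) * p[i] for i in range(K))

for x in range(K):
    assert pmf_direct(x) == pmf_product(x) == pmf_sum(x)
print(pmf_direct(1))  # → 0.5
```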

Multinomial Distribution

pmf: \frac{n!}{x_1!\cdots x_k!}\, p_1^{x_1} \cdots p_k^{x_k}

When n is 1 and k is 2, the multinomial distribution is the Bernoulli distribution.
When k is 2 and the number of trials is more than 1, it is the binomial distribution.
When n is 1 (and k is arbitrary), it is the categorical distribution.
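These special cases can be verified numerically. A minimal sketch — `multinomial_pmf` is a hypothetical helper implementing the pmf above, not a library function:

```python
from math import factorial, prod

def multinomial_pmf(x, p):
    """pmf of the multinomial: n!/(x_1! ... x_k!) * p_1^{x_1} ... p_k^{x_k}."""
    n = sum(x)
    coef = factorial(n)
    for xi in x:
        coef //= factorial(xi)
    return coef * prod(pi ** xi for pi, xi in zip(p, x))

# n = 1, k = 2: reduces to Bernoulli(0.3).
assert multinomial_pmf([1, 0], [0.3, 0.7]) == 0.3
# k = 2, n = 3 > 1: reduces to Binomial(n=3, p=0.3) at 2 successes.
assert abs(multinomial_pmf([2, 1], [0.3, 0.7]) - 3 * 0.3**2 * 0.7) < 1e-12
# n = 1, k = 3: reduces to the categorical distribution.
assert multinomial_pmf([0, 1, 0], [0.2, 0.5, 0.3]) == 0.5
```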

Combine LDA into PMF: CTR

  1. For each item j,

    (a) Draw topic proportions \theta_j \sim \mathrm{Dirichlet}(\alpha).
    (b) Draw item latent offset \epsilon_j \sim N(0, \lambda_v^{-1} I_K) and set the item latent vector as v_j = \epsilon_j + \theta_j.
    (c) For each word w_{jn},
    ​ i. Draw topic assignment z_{jn} \sim \mathrm{Mult}(\theta_j).
    ​ ii. Draw word w_{jn} \sim \mathrm{Mult}(\beta_{z_{jn}}).

  2. For each user-item pair (i, j), draw the rating

    r_{ij} \sim N(u_i^T v_j, c_{ij}^{-1})

The key property of CTR lies in how the item latent vector v_j is generated. Note that v_j = \epsilon_j + \theta_j, where \epsilon_j \sim N(0, \lambda_v^{-1} I_K), is equivalent to v_j \sim N(\theta_j, \lambda_v^{-1} I_K): we assume the item latent vector v_j is close to the topic proportions \theta_j, but allow it to diverge when it has to. Note that the expectation of r_{ij} is a linear function of \theta_j,

E[r_{ij} \mid u_i, \theta_j, \epsilon_j] = u_i^T(\theta_j + \epsilon_j)

This is why the model is called collaborative topic regression.

  3. The plate notation is as follows:
    (figure: CTR plate diagram)
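The CTR rating side can be sketched with NumPy. A minimal illustration under made-up assumptions: the sizes I, J, K are hypothetical, the rating precision c_ij is taken as a constant c, and the word-generation step (1c) is omitted since it mirrors the LDA sketch above:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: I users, J items, K topics (= latent dimension).
I, J, K = 4, 3, 5
lambda_v = 10.0  # precision of the item latent offset epsilon_j
c = 1.0          # rating precision c_ij, taken constant here

U = rng.normal(size=(I, K))                           # user latent vectors u_i
theta = rng.dirichlet([0.1] * K, size=J)              # topic proportions theta_j
eps = rng.normal(0.0, lambda_v ** -0.5, size=(J, K))  # eps_j ~ N(0, lambda_v^{-1} I_K)
V = theta + eps                                       # v_j = epsilon_j + theta_j

# r_ij ~ N(u_i^T v_j, c_ij^{-1})
R = U @ V.T + rng.normal(0.0, c ** -0.5, size=(I, J))

# Check the linearity identity: E[r_ij | u_i, theta_j, eps_j]
# = u_i^T (theta_j + eps_j) = u_i^T theta_j + u_i^T eps_j.
E_R = U @ theta.T + U @ eps.T
assert np.allclose(E_R, U @ V.T)
```

With a large lambda_v the offsets ϵ_j are small and v_j stays close to θ_j; with a small lambda_v the collaborative signal is free to pull v_j away from the topic proportions.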

Next: Use SDAE to enhance PMF //TODO

Note, however, that the latent representation learned by CTR may not be very effective when the auxiliary information is very sparse.

References

  1. C. Wang and D. M. Blei. Collaborative topic modeling for recommending scientific articles. In KDD, pages 448-456, 2011.