LDA参考资料
来源:互联网 发布:java邮件中添加超链接 编辑:程序博客网 时间:2024/06/03 14:38
一,LDA最基本的思想:Bayes Chain
LDA要做的事情就是将document投射到topic空间中
即做doc~topics的转换
而这种topic model,涉及到两种分布:
第一种就是topic~word的分布,就是p(w|z)。
第二种是p(z|d),这个是doc~topic分布
有了这两种分布后,这个文档集合就有了一种立体化的感觉,闭上眼睛,仔细地想:
doc
|
----------------------------------------
| | ... |
topic_1 topic_2 topic_m
而
topic_i
|
----------------------------------------
| | ... |
word_1 word_2 word_n
一个三层的文档表示空间跃然纸上。
二,LDA的特别之处
上面所说的这个Bayes Chain,就可以涵盖LDA的一个最基本的思想。
而PLSA其实也是这个链,那它和LDA有什么区别呢?
最大的区别就在于,doc~topic这一级,PLSA把这一级的所有变量都看作模型的参数,即有多少文档那么就有多少模型的参数;而LDA引入了一个超参数,对doc~topic这一个层级进行model。这样无论文档有多少,那么最外层模型显露出来的对于[doc~topic]就只有一个超参数。
三,LDA的
附录:
1. 参考:
2. 关于LDA的资料
LDA和HLDA:
(1)D. M. Blei, et al., "Latent Dirichlet allocation," Journal of Machine Learning Research, vol. 3, pp. 993-1022, 2003.
(2)T. L. Griffiths and M. Steyvers, "Finding scientific topics," Proceedings of the National Academy of Sciences, vol. 101, pp. 5228-5235, 2004.
(3)D. M. Blei, et al., "Hierarchical Topic Models and the Nested Chinese Restaurant Process," NIPS, 2003.
(4)Blei的LDA视频教程:http://videolectures.net/mlss09uk_blei_tm/
(5)Teh的关于Dirichlet Processes的视频教程:http://videolectures.net/mlss07_teh_dp/
(6)Blei的毕业论文:http://www.cs.princeton.edu/~blei/papers/Blei2004.pdf
(7)Jordan的报告:http://www.icms.org.uk/downloads/mixtures/jordan_talk.pdf
(8)G. Heinrich, "Parameter Estimation for Text Analysis," http://www.arbylon.net/publications/text-est.pdf
基础知识:
(1)P. Johnson and M. Beverlin, “Beta Distribution,” http://pj.freefaculty.org/ps707/Distributions/Beta.pdf
(2)M. Beverlin and P. Johnson, “The Dirichlet Family,” http://pj.freefaculty.org/stat/Distributions/Dirichlet.pdf
(3)P. Johnson, “Conjugate Prior and Mixture Distributions”, http://pj.freefaculty.org/stat/TimeSeries/ConjugateDistributions.pdf
(4)P.J. Green, “Colouring and Breaking Sticks:Random Distributions and Heterogeneous Clustering”,http://www.maths.bris.ac.uk/~mapjg/papers/GreenCDP.pdf
(5)Y. W. Teh, "Dirichlet Process", http://www.gatsby.ucl.ac.uk/~ywteh/research/npbayes/dp.pdf
(6)Y. W. Teh and M. I. Jordan, "Hierarchical Bayesian Nonparametric Models with Applications,”
http://www.stat.berkeley.edu/tech-reports/770.pdf
(7)T. P. Minka, "Estimating a Dirichlet Distribution", http://research.microsoft.com/en-us/um/people/minka/papers/dirichlet/minka-dirichlet.pdf
(8)北邮论坛的LDA导读:[导读]文本处理、图像标注中的一篇重要论文Latent Dirichlet Allocation,http://bbs.byr.edu.cn/article/PR_AI/2530?p=1
(9)Zhou Li的LDA Note:http://lsa-lda.googlecode.com/files/Latent%20Dirichlet%20Allocation%20note.pdf
(10)C. M. Bishop, “Pattern Recognition And Machine Learning,” Springer, 2006.
代码:
(1)Blei的LDA代码(C):http://www.cs.princeton.edu/~blei/lda-c/index.html
(2)BLei的HLDA代码(C):http://www.cs.princeton.edu/~blei/downloads/hlda-c.tgz
(3)Gibbs LDA(C++):http://gibbslda.sourceforge.net/
(4)Delta LDA(Python):http://pages.cs.wisc.edu/~andrzeje/research/deltaLDA.tgz
(5)Griffiths和Steyvers的Topic Modeling工具箱:http://psiexp.ss.uci.edu/research/programs_data/toolbox.htm
(6)LDA(Java):http://www.arbylon.net/projects/
(7)Mochihashi的LDA(C,Matlab):http://chasen.org/~daiti-m/dist/lda/
(8)Chua的LDA(C#):http://www.mysmu.edu/phdis2009/freddy.chua.2009/programs/lda.zip
(9)Chua的HLDA(C#):http://www.mysmu.edu/phdis2009/freddy.chua.2009/programs/hlda.zip
其他:
(1)S. Geman and D. Geman, "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. PAMI-6, pp. 721-741, 1984.
(2)B. C. Russell, et al., "Using Multiple Segmentations to Discover Objects and their Extent in Image Collections," in Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, 2006, pp. 1605-1614.
(3)J. Sivic, et al., "Discovering objects and their location in images," in Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on, 2005, pp. 370-377 Vol. 1.
(4)F. C. T. Chua, "Summarizing Amazon Reviews using Hierarchical Clustering," http://www.mysmu.edu/phdis2009/freddy.chua.2009/papers/amazon.pdf
(5)F. C. T. Chua, "Dimensionality Reduction and Clustering of Text Documents,”http://www.mysmu.edu/phdis2009/freddy.chua.2009/papers/probabilisticIR.pdf
(6)D Bacciu, "Probabilistic Generative Models for Machine Vision," http://www.math.unipd.it/~sperduti/AI09/bacciu_unipd_handouts.pdf
- LDA参考资料
- LDA(Linear discriminants analysis)参考资料
- 参考资料
- 参考资料
- 参考资料
- 参考资料
- 参考资料
- 参考资料
- 参考资料
- 参考资料
- 参考资料
- 参考资料
- 参考资料
- 参考资料
- 参考资料
- 参考资料
- 参考资料
- 参考资料
- POJ 3272 Monthly Expense 二分
- java多线程总结
- libsvm
- printf的使用技巧
- 台湾银行业将开展人民币清算业务
- LDA参考资料
- Android 打造自己的个性化应用(四):仿墨迹天气实现-->自定义扩展名的zip格式的皮肤
- 黑马程序员之WinForm编程基础学习笔记:输入宽和高,输出面积。
- Struts2请求处理流程及源码分析
- 伦敦自由行 转让
- SQL Server 备份时提示无法打开备份设备的错误解决办法
- PL/SQL学习七
- 黑马程序员之WinForm编程基础学习笔记:输入Email地址,输出用户名和域名。
- JQuery实例集合