A Brief Summary of the "textanalytics" Course (1): Two Kinds of Word Relations, Paradigmatic vs. Syntagmatic (Continued)
Source: Internet. Editor: 程序博客网. Date: 2024/06/05 17:58
This series follows the Coursera open course at https://www.coursera.org/course/textanalytics, which is very well taught.
3. Mining Syntagmatic Relations
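Problem definition: given a text segment, predict whether a word w is present in it. Let Xw be a binary random variable that is 1 if w is present and 0 otherwise.
The key to solving this problem: the more random Xw is, the more difficult the prediction is.
Entropy H(X) measures the randomness of X: H(X) = -Σ_{v∈{0,1}} P(X=v) log2 P(X=v), with 0 log2 0 taken to be 0.
High entropy means high randomness, which makes X harder to predict.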
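To make the link between entropy and predictability concrete, here is a minimal sketch of the entropy of a binary presence variable. The function name `binary_entropy` and the presence probabilities are illustrative assumptions, not values from the course.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a binary variable X with P(X=1) = p."""
    h = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:  # convention: 0 * log2(0) = 0
            h -= q * math.log2(q)
    return h

# Hypothetical presence probabilities, chosen only for illustration.
for word, p in [("the", 0.99), ("meat", 0.5), ("unicorn", 0.001)]:
    print(word, round(binary_entropy(p), 4))
```

A word that appears in almost every segment or in almost none has entropy close to 0 and is easy to predict, while a word present in half of the segments reaches the maximum of 1 bit.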
Stated more formally, the question above becomes: Does the presence of "eats" help predict the presence of "meat"? Does it reduce the uncertainty about "meat", i.e., H(Xmeat)? This leads to conditional entropy.
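For reference, the standard definition of conditional entropy for this pair of binary variables can be written as follows. This is the textbook form, given here as a sketch because the original formula image did not survive; the notation X_meat and X_eats follows the text above.

```latex
\begin{align*}
H(X_{\text{meat}} \mid X_{\text{eats}})
  &= \sum_{u \in \{0,1\}} P(X_{\text{eats}} = u)\,
     H(X_{\text{meat}} \mid X_{\text{eats}} = u) \\
H(X_{\text{meat}} \mid X_{\text{eats}} = u)
  &= -\sum_{v \in \{0,1\}} P(X_{\text{meat}} = v \mid X_{\text{eats}} = u)\,
     \log_2 P(X_{\text{meat}} = v \mid X_{\text{eats}} = u) \\
0 \le H(X_{\text{meat}} \mid X_{\text{eats}}) &\le H(X_{\text{meat}})
\end{align*}
```

The last line says that knowing whether "eats" is present can never increase the uncertainty about "meat"; the more it reduces that uncertainty, the stronger the evidence for a syntagmatic relation.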
Conditional Entropy for Mining Syntagmatic Relations of one word:
For each word W1
– For every other word W2, compute conditional entropy H(XW1|XW2)
– Sort all the candidate words in ascending order of H(XW1|XW2)
– Take the top-ranked candidate words as words that have potential syntagmatic relations with W1
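Conditional entropy has one problem: while H(XW1|XW2) and H(XW1|XW3) are comparable, H(XW1|XW2) and H(XW3|XW2) aren't, because they are bounded by different values, H(XW1) and H(XW3). (It can only find which words co-occur most strongly with a given W1; it cannot find which word pairs, not necessarily involving W1, are the strongest across the whole corpus.)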
Mutual Information I(X;Y) measures entropy reduction and can be used to mine the strongest K syntagmatic relations from a collection: I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X).
This is because MI is symmetric: I(X;Y) = I(Y;X).
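As a rough illustration of mining the strongest K pairs, the sketch below scores every word pair by mutual information, computed as H(X) + H(Y) - H(X,Y), which equals H(X) - H(X|Y). It makes the same assumptions as before: segments are sets of words, probabilities are unsmoothed count ratios, and the corpus `docs` and function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def entropy(counts, total):
    """Entropy in bits of the distribution given by raw counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def mutual_information(docs, w1, w2):
    """I(X_w1; X_w2) = H(X_w1) + H(X_w2) - H(X_w1, X_w2)."""
    n = len(docs)
    h1 = entropy(Counter(w1 in d for d in docs).values(), n)
    h2 = entropy(Counter(w2 in d for d in docs).values(), n)
    h12 = entropy(Counter((w1 in d, w2 in d) for d in docs).values(), n)
    return h1 + h2 - h12

def top_k_pairs(docs, k):
    """Score every unordered word pair once and keep the k highest."""
    vocab = sorted(set().union(*docs))
    scored = [(mutual_information(docs, a, b), a, b)
              for a, b in combinations(vocab, 2)]
    return sorted(scored, reverse=True)[:k]

# Hypothetical toy corpus: one set of words per text segment.
docs = [
    {"the", "cat", "eats", "meat"},
    {"the", "dog", "eats", "meat"},
    {"the", "cat", "sleeps"},
    {"the", "dog", "sleeps"},
    {"the", "bird", "sings"},
    {"the", "bird", "sleeps"},
]

for score, a, b in top_k_pairs(docs, 3):
    print(f"I({a}; {b}) = {score:.3f}")
```

Because the score is symmetric, each unordered pair only needs to be evaluated once, and scores of pairs that share no word can sit in the same ranking.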
Summary of Syntagmatic Relation Discovery:
• Syntagmatic relations can be discovered by measuring correlations between occurrences of two words.
• Three concepts from Information Theory:
– Entropy H(X): measures the uncertainty of a random variable X
– Conditional entropy H(X|Y): entropy of X given we know Y
– Mutual information I(X;Y): entropy reduction of X (or Y) due to knowing Y (or X)
• Mutual information provides a principled way for discovering syntagmatic relations