Topical Word Embeddings


Reading notes on the paper "Topical Word Embeddings"

paper
code

Problems faced by word embeddings

homonymy and polysemy

Approaches to homonymy and polysemy

multi-prototype: assign multiple embeddings to each word
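A minimal sketch of the general multi-prototype idea, assuming the usual setup of clustering the context vectors of a word's occurrences into hard, non-overlapping clusters and keeping one prototype embedding per cluster; the use of k-means and the names here are illustrative assumptions, not the recipe of any particular paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def multi_prototype(context_vectors: np.ndarray, n_senses: int = 3):
    """context_vectors: (n_occurrences, dim), one context vector per occurrence of the word."""
    km = KMeans(n_clusters=n_senses, n_init=10).fit(context_vectors)
    prototypes  = km.cluster_centers_   # one embedding per "sense"
    assignments = km.labels_            # each occurrence -> exactly one sense (no overlap)
    return prototypes, assignments
```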

Drawbacks of current multi-prototype methods

1). These models generate multi-prototype vectors for each word in isolation, ignoring complicated correlations among words as well as their contexts. (This point is stated rather abstractly.)
2). In the multi-prototype setting, the contexts of a word are divided into clusters with no overlaps. In reality, a word's several senses may correlate with each other, and there is no clear semantic boundary between them.

Approach to the above drawbacks (three models proposed)

TWE (three models: TWE-1, TWE-2, TWE-3)

Drawbacks of the three TWE models

  • TWE-1: TWE-1 does not consider the immediate interaction between a word and its assigned topic for learning (the word vector and topic vector have no direct interaction; see the sketch after this list).
  • TWE-2: TWE-2 considers the inner interaction of a word-topic pair by simply regarding the pair as a pseudo word, but it suffers from the sparsity issue because the occurrences of each word are rigidly discriminated into different topics. (If a word occurs N times in the corpus, each of its word-topic pairs is learned from only about N/T occurrences on average.)
  • TWE-3: TWE-3 provides a trade-off between discrimination and sparsity. But during the learning process of TWE-3, topic embeddings will influence the corresponding word embeddings, which may make those words in the same topic less discriminative. (The number of topics T is far smaller than the vocabulary size W, so many words share the same topic vector.)
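A minimal sketch of how the three variants would represent a word-topic pair ⟨w, z⟩ at lookup time, assuming plain embedding matrices. The shapes, names, and random initialization are illustrative assumptions; the real models differ mainly in how these vectors are trained (see the training details below).

```python
import numpy as np

D, T, V = 100, 80, 50000                 # embedding dim, #topics, vocab size (assumed)
word_emb  = np.random.randn(V, D)        # word vectors shared across topics
topic_emb = np.random.randn(T, D)        # one vector per LDA topic
pair_emb  = {}                           # (w, z) -> vector, grown lazily (TWE-2)

def twe1(w: int, z: int) -> np.ndarray:
    # TWE-1: word and topic are learned separately and simply concatenated,
    # so there is no direct interaction between the two parts.
    return np.concatenate([word_emb[w], topic_emb[z]])

def twe2(w: int, z: int) -> np.ndarray:
    # TWE-2: each <w, z> pair is a pseudo word with its own vector; expressive
    # but sparse, since a word's N occurrences are split across its topics.
    if (w, z) not in pair_emb:
        pair_emb[(w, z)] = np.random.randn(D)
    return pair_emb[(w, z)]

def twe3(w: int, z: int) -> np.ndarray:
    # TWE-3: same concatenated form as TWE-1, but word and topic vectors are
    # trained jointly, so the shared topic vector (T << V) can pull the words
    # of one topic together and make them less discriminative.
    return np.concatenate([word_emb[w], topic_emb[z]])
```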

Training details

Initialization is important for learning TWE models. In TWE-1, we first learn word embeddings using Skip-Gram. Afterwards, we initialize each topic vector with the average over all words assigned to this topic, and learn topic embeddings while keeping word embeddings unchanged. In TWE-2, we initialize the vector of each topic-word pair with the corresponding word vector from Skip-Gram, and learn TWE models. In TWE-3, we initialize word vectors using those from Skip-Gram, and topic vectors using those from TWE-1, and learn TWE models.
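A minimal sketch of the TWE-1 topic-vector initialization described above: each topic vector starts as the average of the Skip-Gram vectors of all tokens assigned to that topic by LDA. The input format and variable names are assumptions for illustration.

```python
import numpy as np
from collections import defaultdict

def init_topic_vectors(word_emb: np.ndarray, token_topics) -> np.ndarray:
    """
    word_emb:     (V, D) Skip-Gram word vectors.
    token_topics: iterable of (word_id, topic_id) pairs, one per corpus token,
                  i.e. the LDA topic assignments.
    """
    D = word_emb.shape[1]
    sums   = defaultdict(lambda: np.zeros(D))
    counts = defaultdict(int)
    for w, z in token_topics:
        sums[z]   += word_emb[w]
        counts[z] += 1
    T = max(counts) + 1
    topic_emb = np.zeros((T, D))
    for z in counts:
        topic_emb[z] = sums[z] / counts[z]   # average of the assigned word vectors
    return topic_emb
```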

Experiments

Contextual Word Similarity

Since a word's senses can only be distinguished given its context, the multi-prototype models are evaluated on the contextual word similarity task. The experimental results are as follows:
(Table: contextual word similarity results)
Personal summary: AvgSimC outperforms MaxSimC, which suggests that a word's senses do overlap semantically, just as the authors say: "In reality, a word's several senses may correlate with each other, and there is no clear semantic boundary between them."
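For reference, a minimal sketch of the two contextual scores being compared. It assumes we already have, for each word in its context, a topic posterior P(z | w, c) and one contextual vector per word-topic pair; cosine similarity stands in for the similarity function S(·, ·), and all names are illustrative.

```python
import numpy as np

def cos(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def avg_sim_c(p_z1, vecs1, p_z2, vecs2) -> float:
    # AvgSimC: expected similarity under both words' topic posteriors,
    # so every sense pair contributes, weighted by its probability.
    return sum(p_z1[i] * p_z2[j] * cos(vecs1[i], vecs2[j])
               for i in range(len(p_z1)) for j in range(len(p_z2)))

def max_sim_c(p_z1, vecs1, p_z2, vecs2) -> float:
    # MaxSimC: similarity between the single most probable sense of each word.
    return cos(vecs1[int(np.argmax(p_z1))], vecs2[int(np.argmax(p_z2))])
```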

Text Classification

macro-average and micro-average (precision, recall, F1-measure)

Personal impression: these averaging schemes are only really relevant for multi-class classification.

Binary classification

Confusion matrix (rows: ground truth, columns: prediction):

| | Predicted positive | Predicted negative |
| --- | --- | --- |
| Actual positive | TP | FN |
| Actual negative | FP | TN |

precision = P, recall = R

$P = \frac{TP}{TP + FP}$

$R = \frac{TP}{TP + FN}$

$F_1 = \frac{2PR}{P + R}$

$\frac{1}{F_1} = \frac{1}{2}\left(\frac{1}{P} + \frac{1}{R}\right)$
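A quick numeric check of these formulas (the TP/FP/FN counts are made-up values):

```python
TP, FP, FN = 30, 10, 20
P  = TP / (TP + FP)              # precision = 0.75
R  = TP / (TP + FN)              # recall    = 0.6
F1 = 2 * P * R / (P + R)         # harmonic mean of P and R, about 0.667
assert abs(1 / F1 - 0.5 * (1 / P + 1 / R)) < 1e-12
print(P, R, F1)
```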

Multi-class classification

From the n per-class binary problems we get n (precision, recall) pairs, denoted $(P_1, R_1), (P_2, R_2), \dots, (P_n, R_n)$.

$\text{macro-}P = \frac{1}{n}\sum_{i=1}^{n} P_i$

$\text{micro-}P = \frac{\overline{TP}}{\overline{TP} + \overline{FP}}$, where $\overline{TP}$ and $\overline{FP}$ are the TP and FP counts averaged over the n confusion matrices.

Note: the harmonic mean puts more weight on the smaller value, because

$\frac{\partial F_1}{\partial P} = \frac{2R^2}{(P + R)^2}, \qquad \frac{\partial F_1}{\partial R} = \frac{2P^2}{(P + R)^2}$

so $F_1$ is more sensitive to whichever of P and R is smaller.
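A minimal sketch contrasting the two averages over n binary confusion matrices, using the averaged-count form of micro-P given above; the counts are made up for illustration.

```python
import numpy as np

confusions = [            # (TP, FP, FN) for each of the n binary problems (made up)
    (30, 10, 20),
    (5, 1, 2),
    (50, 40, 10),
]

P_i     = [tp / (tp + fp) for tp, fp, _ in confusions]
macro_P = float(np.mean(P_i))                  # every class weighted equally

TP_bar  = float(np.mean([tp for tp, _, _ in confusions]))
FP_bar  = float(np.mean([fp for _, fp, _ in confusions]))
micro_P = TP_bar / (TP_bar + FP_bar)           # dominated by the high-count classes

print(macro_P, micro_P)
```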

Experimental results:
(Table: text classification results)
