Word Vectors Explained (1)
Source: Internet · Editor: 程序博客网 · Time: 2024/06/07 02:13
In NLP we want to represent each word as a vector. There are several ways to do this.
1 One-hot Vector
Represent every word as a |V|-dimensional vector with a 1 at the word's index in the vocabulary and 0 everywhere else, where |V| is the size of the vocabulary.
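A minimal sketch of one-hot encoding in NumPy (the five-word vocabulary here is illustrative, not from the text):

```python
import numpy as np

# Illustrative vocabulary; any sorted word list works the same way.
vocab = ["cat", "jumped", "over", "puddle", "the"]
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Return the |V|-dimensional one-hot vector for a word."""
    v = np.zeros(len(vocab))
    v[word_to_index[word]] = 1.0
    return v

print(one_hot("over"))  # [0. 0. 1. 0. 0.]
```

Note that one-hot vectors carry no notion of similarity: the dot product of any two distinct word vectors is 0, which is what motivates the denser representations below.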
2 SVD Based Methods
2.1 Window based Co-occurrence Matrix
Representing a word by means of its neighbors.
In this method we count the number of times each word appears inside a window of a particular size around the word of interest.
For example, with window size 1 and the corpus “I like NLP. I like deep learning.”, the word “like” co-occurs twice with “I” and once each with “NLP” and “deep”.
The resulting matrix is too large, so we make it smaller with SVD:
- Generate the |V|×|V| co-occurrence matrix X.
- Apply SVD on X to get X = USV^T.
- Select the first k columns of U to get k-dimensional word vectors. The ratio (∑_{i=1}^{k} σ_i) / (∑_{i=1}^{|V|} σ_i) indicates the amount of variance captured by the first k dimensions.
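The steps above can be sketched in NumPy; the tiny corpus and window size 1 are assumptions for illustration:

```python
import numpy as np

# A tiny illustrative corpus (assumed for this sketch).
corpus = ["I like NLP .".split(), "I like deep learning .".split()]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Window-based co-occurrence counts with window size 1.
X = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 1), min(len(sent), i + 2)):
            if j != i:
                X[idx[w], idx[sent[j]]] += 1

# SVD: X = U S V^T; keep the first k columns of U as word vectors.
U, S, Vt = np.linalg.svd(X)
k = 2
word_vectors = U[:, :k]
variance_captured = S[:k].sum() / S.sum()
```

Because the window is symmetric, X is symmetric; the singular values S tell us how much of the matrix each kept dimension explains.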
2.2 Shortcomings
SVD-based methods do not scale well to large matrices, and it is hard to incorporate new words or documents. The computational cost of SVD for an m×n matrix is O(mn²).
3 Iteration Based Methods - Word2Vec
3.1 Language Models (Unigrams, Bigrams, etc.)
We need a model that assigns a probability to a sequence of tokens.
For example
* The cat jumped over the puddle. —high probability
* Stock boil fish is toy. —low probability
Unigrams:
We can take the unigram language model approach and break apart this probability by assuming the word occurrences are completely independent:

P(w_1, w_2, …, w_n) = ∏_{i=1}^{n} P(w_i)

However, we know the next word is highly contingent upon the previous sequence of words, so this model performs poorly.
Bigrams:
We let the probability of the sequence depend on the pairwise probability of each word in the sequence and the word next to it:

P(w_1, w_2, …, w_n) = ∏_{i=2}^{n} P(w_i | w_{i-1})
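Both models can be estimated by simple counting; a minimal sketch on a toy corpus (the sentence here is illustrative):

```python
from collections import Counter

# Toy corpus (assumed); real language models use far more data.
tokens = "the cat jumped over the puddle".split()

# Unigram model: P(w) = count(w) / N, words assumed independent.
unigram = Counter(tokens)
N = len(tokens)
def p_unigram(w):
    return unigram[w] / N

# Bigram model: P(w_i | w_{i-1}) = count(w_{i-1}, w_i) / count(w_{i-1})
bigram = Counter(zip(tokens, tokens[1:]))
def p_bigram(prev, w):
    return bigram[(prev, w)] / unigram[prev]

# Probability of the whole sequence under each model:
p_uni = 1.0
for w in tokens:
    p_uni *= p_unigram(w)

p_bi = p_unigram(tokens[0])
for prev, w in zip(tokens, tokens[1:]):
    p_bi *= p_bigram(prev, w)
```

On this corpus the bigram model assigns the sentence a much higher probability than the unigram model, reflecting that it captures local word order.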
3.2 Continuous Bag of Words Model (CBOW)
Example Sequence:
“The cat jumped over the puddle.”
What is the Continuous Bag of Words Model?
We treat {“the”, “cat”, “over”, “puddle”} as the context, and the word “jumped” as the center word. The context should be able to predict the center word. We call this type of model a Continuous Bag of Words Model.
Known parameters:
If the index of the center word is c and the window size is m, the inputs of the model are the one-hot vectors of the context words, which we represent as x^(c−m), …, x^(c−1), x^(c+1), …, x^(c+m). The output is the one-hot vector of the center word, which we represent as y.
Parameters we need to learn:
Two matrices, V ∈ ℝ^{n×|V|} (the input word matrix) and U ∈ ℝ^{|V|×n} (the output word matrix), where n is the dimensionality of the embedding space. The i-th column of V is the input vector v_i of word w_i, and the i-th row of U is the output vector u_i of word w_i.
How does it work:
1. We get our embedded word vectors for the context: v_{c−m} = V x^(c−m), …, v_{c+m} = V x^(c+m).
2. Average these vectors: v̂ = (v_{c−m} + … + v_{c−1} + v_{c+1} + … + v_{c+m}) / 2m.
3. Generate a score vector: z = U v̂.
4. Turn the scores into probabilities: ŷ = softmax(z).
5. We desire our generated probabilities ŷ to match the true probabilities y, the one-hot vector of the actual center word.
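The five steps can be sketched as a NumPy forward pass; the vocabulary size, embedding dimension, and random initialization are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for this sketch: vocabulary of 5 words, 3-dim embeddings.
vocab_size, n = 5, 3
V = rng.normal(size=(n, vocab_size))   # input word matrix (columns = input vectors)
U = rng.normal(size=(vocab_size, n))   # output word matrix (rows = output vectors)

def softmax(z):
    e = np.exp(z - z.max())            # shift for numerical stability
    return e / e.sum()

def cbow_forward(context_indices):
    """CBOW forward pass: embed context, average, score, softmax."""
    vs = V[:, context_indices]         # 1. look up context embeddings
    v_hat = vs.mean(axis=1)            # 2. average the context vectors
    z = U @ v_hat                      # 3. score against every output vector
    return softmax(z)                  # 4. probabilities over the vocabulary

y_hat = cbow_forward([0, 1, 3, 4])     # context = every word except index 2
```

Step 5 is then a matter of training: we want `y_hat` to put its mass on the true center word, which is what the loss in the next section enforces.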
How to learn U and V:
We learn them with stochastic gradient descent, so we need a loss function. We use cross-entropy to measure the distance between two distributions:

H(ŷ, y) = − ∑_{j=1}^{|V|} y_j log(ŷ_j)

Consider that y is a one-hot vector: only the entry at the center word's index c is 1, so the loss simplifies to H(ŷ, y) = − log(ŷ_c).

We formulate our optimization objective as:

minimize J = − log P(w_c | w_{c−m}, …, w_{c−1}, w_{c+1}, …, w_{c+m}) = − u_c^T v̂ + log ∑_{j=1}^{|V|} exp(u_j^T v̂)

We use stochastic gradient descent to update all relevant word vectors u_j and v_j.
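A minimal sketch of the loss and one SGD update for CBOW (sizes, learning rate, and random initialization are illustrative assumptions; context words are taken to be distinct):

```python
import numpy as np

rng = np.random.default_rng(1)
vocab_size, n, lr = 5, 3, 0.1
V = rng.normal(size=(n, vocab_size))   # input word matrix
U = rng.normal(size=(vocab_size, n))   # output word matrix

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sgd_step(context_indices, center):
    """Compute the cross-entropy loss and take one SGD step on U and V."""
    global U, V
    v_hat = V[:, context_indices].mean(axis=1)   # averaged context vector
    y_hat = softmax(U @ v_hat)                   # predicted distribution
    loss = -np.log(y_hat[center])                # H(y_hat, y) = -log(y_hat_c)
    dz = y_hat.copy()
    dz[center] -= 1.0                            # dJ/dz = y_hat - y
    dv_hat = U.T @ dz                            # gradient wrt v_hat
    U -= lr * np.outer(dz, v_hat)                # dJ/dU = dz v_hat^T
    # The averaging in v_hat spreads the gradient equally over context words.
    V[:, context_indices] -= lr * dv_hat[:, None] / len(context_indices)
    return loss

losses = [sgd_step([0, 1, 3, 4], center=2) for _ in range(50)]
```

Repeated steps drive the loss down, i.e. the model assigns increasing probability to the true center word given its context.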