attention model
Source: Internet · 程序博客网 · 2024/06/06
How did I select papers?
First, I tried searching for “attention” in CVPR 2014-2016, ICCV 2009-2015, and ACM MM 2012-2015. However, only a few papers contained this keyword.
Then, I searched for “attention model” on Google and found blogs that discuss it and list some papers.
Attention
[1] talks about “attention” and why we need it: when people see a picture, they usually move their eyes around over time and gather information about the scene. They don’t take in every pixel of the image at once. They attend to certain aspects of the picture one time step at a time and aggregate the information. That is exactly the kind of power we want to give our neural network models. The usual convolutional network does have the ability to recognize cluttered images, but how do we find the exact set of weights that are “good”? That is a difficult task. By providing the network with a new architecture-level feature that allows it to attend to different parts of the image sequentially and aggregate information over time, we make that job easier, because the network can now simply learn to ignore the clutter (or so the hope goes).
In natural language processing, there is a typical task: natural language generation, which is, given a context, to generate a target (a relevant sentence) — for instance, machine translation. When using deep learning for this task, one common method is the encoder-decoder framework.
Given a sequence of words, [3] uses an RNN encoder to produce a context vector (the last hidden state of the RNN); an RNN decoder then uses this hidden state as its initial state to generate words one by one.
Fig 1
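The encoder-decoder setup of [3] can be sketched in a few lines of NumPy. This is a toy illustration, not the paper’s implementation: the vanilla-RNN cell, the dimensions, and the random weights are all assumptions chosen only to show how the source sequence gets compressed into one fixed-size state.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_step(x, h, Wx, Wh, b):
    # One vanilla-RNN step: new hidden state from input x and previous state h.
    return np.tanh(Wx @ x + Wh @ h + b)

d_in, d_h = 4, 8  # toy input and hidden dimensions (assumed, not from the paper)
Wx = rng.normal(size=(d_h, d_in)) * 0.1
Wh = rng.normal(size=(d_h, d_h)) * 0.1
b = np.zeros(d_h)

# Encoder: read the whole source sequence, keeping only the LAST hidden state.
source = [rng.normal(size=d_in) for _ in range(5)]
h = np.zeros(d_h)
for x in source:
    h = rnn_step(x, h, Wx, Wh, b)

context = h  # the single fixed-size vector the decoder is initialized with
print(context.shape)  # (8,)
```

Whatever the source length, the decoder only ever sees `context` — an 8-dimensional vector here — which is exactly the bottleneck discussed next.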
However, no matter how long the input sequence is, the output of the encoder is a single vector whose dimension is only a few hundred, which means that the longer the input sequence is, the more information the state vector loses.
In fact, the decoder can use all the information in the input sequence instead of just the last state.
Fig 2
In paper [2], when generating the hypothesis states (h7, h8, h9), all the input vectors (h1, …, h5) are fed in, instead of only the last state (h6). Moreover, not all input vectors should influence the generation of the next state equally. For example, to translate “私は猫が好きです。” (“I like cats.”) into “I like cats”: to generate the word “like”, we should focus on the input word “好き” rather than the other words. “Attention” means selecting the proper input vectors and using them to generate the next target state.
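The selection step above can be sketched as dot-product attention: score every encoder state against the current decoder state, softmax the scores into alignment weights, and take the weighted average as the context vector. The basis-vector states and the query are toy assumptions made only so the alignment is easy to see; they are not from [2].

```python
import numpy as np

def attention(query, encoder_states):
    # Score each encoder state against the decoder query (dot product),
    # turn the scores into alignment weights with a softmax, and return
    # the weighted average of the encoder states as the context vector.
    scores = encoder_states @ query            # shape (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over source positions
    context = weights @ encoder_states         # shape (d,)
    return context, weights

# Toy setup: 5 source states h1..h5 as basis vectors in an 8-dim space,
# and a decoder query that matches h4 (index 3) — mimicking the way the
# target word "like" should align with the source word "好き".
encoder_states = np.eye(5, 8)
query = encoder_states[3]

context, weights = attention(query, encoder_states)
print(weights.argmax())  # 3 — attention peaks on the matching source word
```

The softmax keeps the weights positive and summing to one, so the context vector is always a convex combination of the input states — the “selection” is soft rather than a hard pick.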
Soft-attention and Hard-attention
Papers
Translation
Effective approaches to attention-based neural machine translation[4]
The attention-based models of [4] are classified into two broad categories, global and local. These classes differ in terms of whether the “attention” is placed on all source positions or on only a few source positions.
Common to these two types of models is the fact that at each time step of decoding, both take the hidden state at the top layer of the stacking LSTM as input and derive a context vector used to predict the current target word.
Fig 3. Neural machine translation – a stacking recurrent architecture for translating a source sequence A B C D into a target sequence X Y Z. Here, <eos> marks the end of a sentence.
Fig 4. Global attentional model – at each time step, the model infers a variable-length alignment weight vector from the current target state and all source states; a global context vector is then computed as the weighted average over all source states.
Fig 5. Local attention model – the model first predicts a single aligned position for the current target word; a window centred around that source position is then used to compute the context vector.
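The global/local distinction in [4] can be illustrated with the weight vectors alone. The sketch below assumes the local-p variant, where the softmax weights are reweighted by a Gaussian centred at the predicted position p_t with sigma = D/2; the uniform toy scores and window size are assumptions for illustration, not values from the paper.

```python
import numpy as np

def global_weights(scores):
    # Global attention: softmax over ALL source positions.
    e = np.exp(scores - scores.max())
    return e / e.sum()

def local_weights(scores, p_t, D):
    # Local attention (local-p of [4]): softmax over all positions, then
    # reweighted by a Gaussian centred at the predicted position p_t with
    # sigma = D/2, concentrating mass on a window around p_t.
    s = np.arange(len(scores), dtype=float)
    sigma = D / 2.0
    return global_weights(scores) * np.exp(-((s - p_t) ** 2) / (2 * sigma ** 2))

scores = np.zeros(10)          # toy: uniform content scores over 10 source words
g = global_weights(scores)     # global: every position gets weight 0.1
l = local_weights(scores, p_t=4.0, D=2)
print(g[0], l.argmax())        # 0.1, 4
```

With identical content scores, the global model spreads attention evenly, while the local model’s Gaussian window makes the weights peak at the predicted position — the cheaper focus that [4] exploits for long sentences.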
Neural machine translation by jointly learning to align and translate[5]
Reference
[1] http://stackoverflow.com/questions/35549588/soft-attention-vs-hard-attention
[2] Rocktäschel, T., Grefenstette, E., Hermann, K. M., Kočiský, T., & Blunsom, P. (2015). Reasoning about entailment with neural attention. arXiv preprint arXiv:1509.06664.
[3] Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in neural information processing systems (pp. 3104-3112).
[4] Luong, M. T., Pham, H., & Manning, C. D. (2015). Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025.
[5] Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
[*] https://www.zhihu.com/question/36591394
Yao, L., Torabi, A., Cho, K., Ballas, N., Pal, C., Larochelle, H., & Courville, A. (2015). Video description generation incorporating spatio-temporal features and a soft-attention mechanism. arXiv preprint arXiv:1502.08029.