[Deep Learning Paper Notes][Recurrent Neural Networks] Visualizing and Understanding Recurrent Networks

Karpathy, Andrej, Justin Johnson, and Li Fei-Fei. "Visualizing and Understanding Recurrent Networks." arXiv preprint arXiv:1506.02078 (2015). (Citations: 79)


1 RNN

The RNN has the form

$$h_t^l = \tanh\, W^l \begin{pmatrix} h_t^{l-1} \\ h_{t-1}^l \end{pmatrix}$$

where $h_t^l$ is the hidden state vector at time step $t$ in layer $l$, the weight matrix $W^l$ varies between layers but is shared through time, and the input from the layer below is $h_t^{l-1}$ (at the bottom layer this is the data vector, $h_t^0 = x_t$).
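
As a concrete illustration, here is a minimal NumPy sketch of this recurrence. The function and variable names (`rnn_step`, `h_below`, etc.) are my own, not from the paper:

```python
import numpy as np

def rnn_step(W, h_below, h_prev):
    """One vanilla RNN step: h_t^l = tanh(W^l [h_t^{l-1}; h_{t-1}^l]).

    W        : (n, 2n) weight matrix for this layer (shared across time)
    h_below  : (n,) vector from the layer below at time t (or the input x_t)
    h_prev   : (n,) hidden vector of this layer at time t-1
    """
    stacked = np.concatenate([h_below, h_prev])  # [h_t^{l-1}; h_{t-1}^l]
    return np.tanh(W @ stacked)

# Example: unroll a single layer over a short input sequence
n = 4
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(n, 2 * n))
h = np.zeros(n)
for x_t in rng.normal(size=(5, n)):  # 5 time steps of n-dimensional input
    h = rnn_step(W, x_t, h)
print(h)
```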


It was observed that the back-propagation dynamics caused the gradients in an RNN to either vanish or explode.
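
A quick way to see this numerically (a toy sketch, not from the paper): backpropagating through the recurrence multiplies the gradient by roughly the same Jacobian at every time step, so its norm shrinks or grows geometrically depending on the scale of the recurrent weights.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 16, 50
grad = rng.normal(size=n)

for scale in (0.5, 1.5):  # contracting vs. expanding recurrent weights
    W = scale * np.linalg.qr(rng.normal(size=(n, n)))[0]  # scaled orthogonal matrix
    g = grad.copy()
    for _ in range(T):   # backprop through T time steps (ignoring the tanh nonlinearity)
        g = W.T @ g      # gradient is repeatedly multiplied by the transposed Jacobian
    print(f"scale={scale}: |grad| after {T} steps = {np.linalg.norm(g):.2e}")
# Roughly: the 0.5 case vanishes (~1e-15), the 1.5 case explodes (~1e+9)
```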


2 LSTM

The exploding gradient concern can be alleviated with the heuristic of clipping the gradients, while LSTMs were designed to mitigate the vanishing gradient problem. In addition to the hidden state vector h, LSTMs also maintain a memory cell vector c. At each time step the LSTM can choose to read from, write to, or reset the cell using explicit gating mechanisms.
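
As a side note on the clipping heuristic, here is a minimal sketch of clipping by global norm (the function name and the threshold of 5.0 are illustrative assumptions, not values from the paper):

```python
import numpy as np

def clip_gradients(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so their global L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads
```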


The three vectors i, f, o can be thought of as binary gates that control whether each memory cell is updated, whether it is reset to zero, and whether its local state is revealed in the hidden vector, respectively. The activations of these gates are based on the sigmoid function and are hence allowed to range smoothly between zero and one to keep the model differentiable.
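
Written out in the notation of the RNN form above (this is the standard formulation and should match the one in the paper), the gates and the cell update are:

$$\begin{pmatrix} i \\ f \\ o \\ g \end{pmatrix} = \begin{pmatrix} \mathrm{sigm} \\ \mathrm{sigm} \\ \mathrm{sigm} \\ \tanh \end{pmatrix} W^l \begin{pmatrix} h_t^{l-1} \\ h_{t-1}^l \end{pmatrix}, \qquad c_t^l = f \odot c_{t-1}^l + i \odot g, \qquad h_t^l = o \odot \tanh(c_t^l)$$

where $\odot$ denotes element-wise multiplication.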


The vector g ranges between -1 and 1 and is used to additively modify the memory contents. This additive interaction is a critical feature of the LSTM's design, because during backpropagation a sum operation merely distributes gradients. It allows gradients on the memory cells c to flow backwards through time uninterrupted for long periods, or at least until the flow is disrupted by the multiplicative interaction of an active forget gate. See the figure in the paper for an illustration.
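
A minimal NumPy sketch of a single LSTM step along these lines (the names such as `lstm_step` and the weight layout are illustrative assumptions), highlighting the additive update of the memory cell:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(W, h_below, h_prev, c_prev):
    """One LSTM step in the stacked notation used above.

    W        : (4n, 2n) weight matrix producing the i, f, o, g pre-activations
    h_below  : (n,) vector from the layer below (or the input x_t)
    h_prev   : (n,) previous hidden state of this layer
    c_prev   : (n,) previous memory cell of this layer
    """
    n = h_prev.shape[0]
    a = W @ np.concatenate([h_below, h_prev])   # all four gate pre-activations at once
    i = sigmoid(a[0 * n:1 * n])                 # input gate
    f = sigmoid(a[1 * n:2 * n])                 # forget gate
    o = sigmoid(a[2 * n:3 * n])                 # output gate
    g = np.tanh(a[3 * n:4 * n])                 # candidate cell update, in (-1, 1)
    c = f * c_prev + i * g                      # additive memory update: gradients flow through the sum
    h = o * np.tanh(c)                          # hidden state reveals (part of) the cell
    return h, c
```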



3 GRU

The GRU is a simpler alternative to the LSTM.


The Gated Recurrent Unit (GRU) can be interpreted as computing a candidate hidden vector h̃_t and then smoothly interpolating towards it, gated by the update vector z.
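
A minimal NumPy sketch of this interpolation (a reconstruction of the standard GRU update; the names and weight layout are my own assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(Wr, Wz, Wh, h_below, h_prev):
    """One GRU step: compute a candidate hidden vector and interpolate towards it.

    Wr, Wz : (n, 2n) weight matrices for the reset and update gates
    Wh     : (n, 2n) weight matrix for the candidate hidden vector
    """
    xh = np.concatenate([h_below, h_prev])
    r = sigmoid(Wr @ xh)                                           # reset gate
    z = sigmoid(Wz @ xh)                                           # update gate
    h_tilde = np.tanh(Wh @ np.concatenate([h_below, r * h_prev]))  # candidate hidden vector
    return (1.0 - z) * h_prev + z * h_tilde                        # smooth interpolation gated by z
```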
