RNN and Its Formula Derivation
RNN stands for recurrent neural network. There are many kinds of recurrent neural networks; this article derives only the most basic one. The derivation was confusing to me at first, so after reading many references I organized my notes as follows.
Structure
[Figure: the basic RNN, with input x, hidden layer s, output o, and weight matrices U, V, and W; the hidden layer feeds back into itself.]
The figure above shows the most basic recurrent neural network. The drawing is abstract: it draws only one circle, but that does not mean there is only one hidden layer. If the recurrent connection is removed, each timestep is simply a fully connected neural network.
x is a vector representing the input layer. s is also a vector, representing the hidden layer; note that a layer has multiple nodes, and the number of nodes equals the dimension of s. U is the weight matrix from the input layer to the hidden layer, just like the weight matrix of a fully connected network. o is also a vector, representing the output layer, and V is the weight matrix from the hidden layer to the output layer. The derivation below also uses W, the weight matrix through which the previous hidden state s_{t-1} feeds back into the hidden layer.
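To make the notation concrete, here is a minimal numpy sketch of a single timestep. Treating f as tanh and g as softmax is my assumption, since the text does not fix the two activation functions:

import numpy as np

def rnn_cell(x_t, s_prev, U, W, V):
    # hidden state: s_t = f(U x_t + W s_{t-1}), with f = tanh (assumed)
    s_t = np.tanh(U.dot(x_t) + W.dot(s_prev))
    # output: o_t = g(V s_t), with g = softmax (assumed)
    z = V.dot(s_t)
    o_t = np.exp(z - z.max()) / np.exp(z - z.max()).sum()
    return s_t, o_t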
Formula Derivation
The key to the BPTT algorithm for RNNs lies in understanding the structure above, unrolled over time. The forward pass is easy to understand; it follows directly from the definition of the RNN's structure:

$$s_t = f(U x_t + W s_{t-1}), \qquad o_t = g(V s_t).$$

Below is the derivation of the backward pass. Its basic principle is the same as the BP algorithm, and it likewise consists of three steps:
1. Forward pass: compute the output value of every neuron.
2. Backward pass: compute the error term of every neuron.
3. Compute the gradient of every weight, then update with an optimization algorithm (a sketch of one such training step follows this list).
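For orientation, here is a minimal sketch of one SGD training step assembled from these three steps. It calls the rnn_forward and rnn_backward functions implemented at the end of this article; the mean-squared loss on the hidden states and the learning rate are placeholder assumptions:

import numpy as np

def sgd_step(x, h0, Wx, Wh, b, targets, lr=1e-2):
    # 1. forward pass: compute every hidden state
    h, cache = rnn_forward(x, h0, Wx, Wh, b)
    # placeholder loss: mean squared error between hidden states and targets
    dh = 2 * (h - targets) / h.size
    # 2. backward pass: compute the error terms and gradients
    dx, dh0, dWx, dWh, db = rnn_backward(dh, cache)
    # 3. update each weight with its gradient (plain SGD)
    Wx -= lr * dWx
    Wh -= lr * dWh
    b -= lr * db
    return Wx, Wh, b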
Computing the Error Terms
Let us return to the notation of the first figure and use the vector $\mathrm{net}_t$ to denote the weighted input of the hidden layer at time $t$:

$$\mathrm{net}_t = U x_t + W s_{t-1}, \qquad s_t = f(\mathrm{net}_t),$$

and define the error term at time $t$ as $\delta_t = \partial E / \partial \mathrm{net}_t$.

The error term computation has two parts, propagating along two directions: one direction passes the error down to the previous layer of the network, giving $\delta_t^{l-1}$; the other passes it backward along the time axis to the initial timestep, giving $\delta_1$. Only the latter is related to the recurrent weight matrix $W$.

First direction, back along time. By the chain rule,

$$\delta_{t-1}^T = \frac{\partial E}{\partial \mathrm{net}_{t-1}} = \frac{\partial E}{\partial \mathrm{net}_t}\frac{\partial \mathrm{net}_t}{\partial \mathrm{net}_{t-1}} = \delta_t^T \, \frac{\partial \mathrm{net}_t}{\partial s_{t-1}} \, \frac{\partial s_{t-1}}{\partial \mathrm{net}_{t-1}}.$$

First term: since $\mathrm{net}_t = U x_t + W s_{t-1}$,

$$\frac{\partial \mathrm{net}_t}{\partial s_{t-1}} = W.$$

Second term: since $s_{t-1} = f(\mathrm{net}_{t-1})$ is applied elementwise, the Jacobian is diagonal,

$$\frac{\partial s_{t-1}}{\partial \mathrm{net}_{t-1}} = \mathrm{diag}[f'(\mathrm{net}_{t-1})].$$

Finally we obtain:

$$\delta_{t-1}^T = \delta_t^T \, W \, \mathrm{diag}[f'(\mathrm{net}_{t-1})],$$

and, applying this repeatedly, the error term at any earlier time $k$:

$$\delta_k^T = \delta_t^T \prod_{i=k}^{t-1} W \, \mathrm{diag}[f'(\mathrm{net}_i)].$$

Second direction, down to the previous layer, exactly as in BP. Writing $\mathrm{net}_t^{l-1}$ for the weighted input of layer $l-1$ at time $t$, the input to layer $l$ is $x_t = f^{l-1}(\mathrm{net}_t^{l-1})$, so

$$\frac{\partial \mathrm{net}_t^l}{\partial \mathrm{net}_t^{l-1}} = U \, \mathrm{diag}[f'^{l-1}(\mathrm{net}_t^{l-1})].$$

Therefore

$$(\delta_t^{l-1})^T = \delta_t^T \, U \, \mathrm{diag}[f'^{l-1}(\mathrm{net}_t^{l-1})].$$

At this point, the computation of the error terms is complete.
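The single-step formula above can be checked numerically. The following sketch (my addition, not from the original article) compares the analytic $\delta_{t-1} = \mathrm{diag}[f'(\mathrm{net}_{t-1})]\,W^T\delta_t$ against a finite-difference estimate, using the toy loss $E = \sum s_t$:

import numpy as np

np.random.seed(0)
H = 4
W = np.random.randn(H, H)          # recurrent weight matrix
net_prev = np.random.randn(H)      # net input to the hidden layer at step t-1
eps = 1e-6

def loss(net_prev):
    s_prev = np.tanh(net_prev)     # s_{t-1} = f(net_{t-1})
    net_t = W.dot(s_prev)          # the U x_t term is dropped: it does not depend on net_{t-1}
    return np.tanh(net_t).sum()    # E = sum(s_t)

# analytic error term: delta_{t-1} = diag[f'(net_{t-1})] W^T delta_t
s_prev = np.tanh(net_prev)
delta_t = 1 - np.tanh(W.dot(s_prev)) ** 2        # dE/dnet_t when E = sum(tanh(net_t))
delta_prev = (1 - s_prev ** 2) * W.T.dot(delta_t)

# finite-difference estimate of dE/dnet_{t-1}
numeric = np.zeros(H)
for i in range(H):
    d = np.zeros(H)
    d[i] = eps
    numeric[i] = (loss(net_prev + d) - loss(net_prev - d)) / (2 * eps)

print(np.allclose(delta_prev, numeric, atol=1e-5))  # True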
Weight Update
The final gradient is the sum of the gradients at each timestep. The proof is omitted here and can be found in the referenced article; this article only gives the conclusion:

$$\nabla_W E = \sum_{k=1}^{t} \delta_k \, s_{k-1}^T.$$

The gradient with respect to $U$ is computed in the same way:

$$\nabla_U E = \sum_{k=1}^{t} \delta_k \, x_k^T.$$
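The summation over timesteps can also be verified numerically. The sketch below (an assumed example, not from the original post) unrolls a two-step RNN with the input term dropped and checks that $\sum_k \delta_k s_{k-1}^T$ matches a finite-difference gradient of $E$ with respect to $W$:

import numpy as np

np.random.seed(1)
H = 3
W = np.random.randn(H, H)
s0 = np.random.randn(H)

def forward(W):
    s1 = np.tanh(W.dot(s0))        # two steps with no input term; E = sum(s2)
    s2 = np.tanh(W.dot(s1))
    return s1, s2

s1, s2 = forward(W)
delta2 = 1 - s2 ** 2                        # dE/dnet_2 for E = sum(s2)
delta1 = (1 - s1 ** 2) * W.T.dot(delta2)    # propagate one step back in time
grad = np.outer(delta2, s1) + np.outer(delta1, s0)  # sum_k delta_k s_{k-1}^T

# numerical gradient of E with respect to W
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(H):
    for j in range(H):
        Wp = W.copy(); Wp[i, j] += eps
        Wm = W.copy(); Wm[i, j] -= eps
        numeric[i, j] = (forward(Wp)[1].sum() - forward(Wm)[1].sum()) / (2 * eps)

print(np.allclose(grad, numeric, atol=1e-5))  # True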
Finally, here is a code implementation, in the hope that it aids understanding:
import numpy as np


def rnn_step_forward(x, prev_h, Wx, Wh, b):
    """
    Run the forward pass for a single timestep of a vanilla RNN that uses
    a tanh activation function.

    The input data has dimension D, the hidden state has dimension H, and
    we use a minibatch size of N.

    Inputs:
    - x: Input data for this timestep, of shape (N, D)
    - prev_h: Hidden state from previous timestep, of shape (N, H)
    - Wx: Weight matrix for input-to-hidden connections, of shape (D, H)
    - Wh: Weight matrix for hidden-to-hidden connections, of shape (H, H)
    - b: Biases, of shape (H,)

    Returns a tuple of:
    - next_h: Next hidden state, of shape (N, H)
    - cache: Tuple of values needed for the backward pass
    """
    next_h = np.tanh(x.dot(Wx) + prev_h.dot(Wh) + b)
    cache = (x, Wx, Wh, prev_h, next_h)
    return next_h, cache
def rnn_step_backward(dnext_h, cache):
    """
    Backward pass for a single timestep of a vanilla RNN.

    Inputs:
    - dnext_h: Gradient of loss with respect to next hidden state, of shape (N, H)
    - cache: Cache object from the forward pass

    Returns a tuple of:
    - dx: Gradient of input data, of shape (N, D)
    - dprev_h: Gradient of previous hidden state, of shape (N, H)
    - dWx: Gradient of input-to-hidden weights, of shape (D, H)
    - dWh: Gradient of hidden-to-hidden weights, of shape (H, H)
    - db: Gradient of bias vector, of shape (H,)
    """
    x, Wx, Wh, prev_h, next_h = cache
    dtanh = 1 - next_h ** 2       # tanh'(net), expressed via next_h  (N, H)
    dnet = dnext_h * dtanh        # error term delta_t = dE/dnet_t    (N, H)
    dx = dnet.dot(Wx.T)           # (N, D)
    dprev_h = dnet.dot(Wh.T)      # propagate back one step in time   (N, H)
    dWx = x.T.dot(dnet)           # (D, H)
    dWh = prev_h.T.dot(dnet)      # (H, H)
    db = np.sum(dnet, axis=0)     # (H,)
    return dx, dprev_h, dWx, dWh, db
def rnn_forward(x, h0, Wx, Wh, b):
    """
    Run a vanilla RNN forward on an entire sequence of data. We assume an
    input sequence composed of T vectors, each of dimension D. The RNN uses
    a hidden size of H, and we work over a minibatch containing N sequences.
    After running the RNN forward, we return the hidden states for all
    timesteps.

    Inputs:
    - x: Input data for the entire timeseries, of shape (N, T, D)
    - h0: Initial hidden state, of shape (N, H)
    - Wx: Weight matrix for input-to-hidden connections, of shape (D, H)
    - Wh: Weight matrix for hidden-to-hidden connections, of shape (H, H)
    - b: Biases, of shape (H,)

    Returns a tuple of:
    - h: Hidden states for the entire timeseries, of shape (N, T, H)
    - cache: Values needed in the backward pass
    """
    N, T, D = x.shape
    _, H = h0.shape
    h = np.zeros((N, T, H))
    prev_h = h0
    cache = []
    # step through the sequence, reusing the single-step forward pass
    for t in range(T):
        h[:, t, :], step_cache = rnn_step_forward(x[:, t, :], prev_h, Wx, Wh, b)
        prev_h = h[:, t, :]
        cache.append(step_cache)
    return h, cache
def rnn_backward(dh, cache):
    """
    Compute the backward pass for a vanilla RNN over an entire sequence of data.

    Inputs:
    - dh: Upstream gradients of all hidden states, of shape (N, T, H)
    - cache: Values from the forward pass

    Returns a tuple of:
    - dx: Gradient of inputs, of shape (N, T, D)
    - dh0: Gradient of initial hidden state, of shape (N, H)
    - dWx: Gradient of input-to-hidden weights, of shape (D, H)
    - dWh: Gradient of hidden-to-hidden weights, of shape (H, H)
    - db: Gradient of biases, of shape (H,)
    """
    x, Wx, Wh, prev_h, next_h = cache[-1]
    _, D = x.shape
    N, T, H = dh.shape
    dx = np.zeros((N, T, D))
    dWx = np.zeros((D, H))
    dWh = np.zeros((H, H))
    db = np.zeros(H)
    dprev_h = np.zeros((N, H))
    for t in range(T - 1, -1, -1):
        # The gradient flowing into step t is the upstream gradient dh[:, t, :]
        # plus the gradient propagated back through time from step t + 1.
        dx[:, t, :], dprev_h, dWx_t, dWh_t, db_t = rnn_step_backward(
            dh[:, t, :] + dprev_h, cache[t])
        dWx += dWx_t   # the final gradient is the sum over all timesteps
        dWh += dWh_t
        db += db_t
    dh0 = dprev_h
    return dx, dh0, dWx, dWh, db
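As a closing check, here is an assumed usage example on random data (the dimensions are chosen arbitrarily), verifying that the shapes come out as documented:

np.random.seed(2)
N, T, D, H = 2, 5, 3, 4                   # minibatch, timesteps, input dim, hidden dim
x = np.random.randn(N, T, D)
h0 = np.random.randn(N, H)
Wx = np.random.randn(D, H)
Wh = np.random.randn(H, H)
b = np.random.randn(H)

h, cache = rnn_forward(x, h0, Wx, Wh, b)
print(h.shape)                            # (2, 5, 4)

dh = np.random.randn(*h.shape)            # stand-in for the upstream loss gradient
dx, dh0, dWx, dWh, db = rnn_backward(dh, cache)
print(dx.shape, dWx.shape, dWh.shape)     # (2, 5, 3) (3, 4) (4, 4)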