Introduction to Deep Learning (3): Loss Functions and the BP Algorithm
Loss Function (cost function)
After the previous lecture we already have a basic picture of what a neural network is. To evaluate a neural network's performance, we usually rely on a loss function (also called a cost function).

In supervised learning, the target output is known for every training step. We typically take the distance between the network's actual output and the known target output as the loss: the smaller the gap between the two, the better the network performs; a large gap means the network performs poorly.
Let the target output be y, and let the network's actual output be a.

Let e = a − y.

cost function: J = (1/2) eᵀe = (1/2) ‖a − y‖²
The cost function is not unique. Another common choice is the mean over all m training samples: J = (1/(2m)) Σᵢ ‖aᵢ − yᵢ‖².

A network with good performance is one for which the most suitable weights (w) have been found.
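As a quick illustration, here is a minimal MATLAB sketch of the quadratic cost above (the vectors are my own toy values, not from the lab code):

    a = [0.8; 0.1];        % hypothetical network output
    y = [1; 0];            % target output
    e = a - y;             % error vector
    J = 0.5 * (e' * e);    % quadratic cost, J = 1/2 * ||a - y||^2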
Steepest Gradient Method
(Disaster... halfway through writing this I carelessly clicked into editing another post, the part I had already written wasn't saved, and it was simply gone when I came back. So annoying.)
In the previous assignments, the connection weights w were given directly. In practice, w is sometimes set from experience, but more often the network is left to learn on its own until it finds the optimal w.

Gradient descent keeps this search moving along the direction of the derivative ∂J/∂w in which J decreases fastest.

Formula: w ← w − α · ∂J/∂w, where α is the learning rate.
A simple example (note that in practice the relationship between J and w is usually far more complicated):

[Figure: a hand-drawn plot of J versus w, with the minimum at point C and two initial points A and B.]

(The figure is super ugly, but one look tells you it's original! Hahaha)
Point C in the figure is the optimal solution we are looking for. Using the steepest gradient method, let Δw = −α · dJ/dw.

When the initial point is at A, on the side where dJ/dw < 0, the update Δw > 0 moves w toward C.

When the initial point is at B, where dJ/dw > 0, the update Δw < 0 moves w toward C as well. In either case the iteration descends toward the minimum at C.
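As a concrete illustration, here is a minimal MATLAB sketch of the update w ← w − α · dJ/dw on a hypothetical quadratic cost J(w) = (w − 2)² (my own toy example, not from the lab code). Starting from either side of the minimum, the iterates move toward w = 2:

    % minimal gradient descent sketch on a toy cost J(w) = (w - 2)^2
    dJ = @(w) 2 * (w - 2);   % derivative of the toy cost
    alpha = 0.1;             % learning rate
    w = -3;                  % try w = -3 (like point A) or w = 7 (like point B)
    for k = 1:100
        w = w - alpha * dJ(w);   % move against the gradient
    end
    fprintf('w converged to %.4f\n', w);   % approaches the minimum at w = 2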
Back Propagation Algorithm
- Forward computation

  z^{l+1} = w^l · a^l        (1)
  a^{l+1} = f(z^{l+1})       (2)
  For a detailed explanation, see the previous post.

- Compute the cost

  J = (1/2) (a^L − y^L)²     (3)

  Repeat step 1 layer by layer until the last layer of the network, obtaining a^L. Formula (3) measures the gap between the network's output and the target output; as noted earlier, this choice is not unique.

- Backward computation
  We need to compute ∂J/∂w. The parameters are updated by w^l_{ji} ← w^l_{ji} − α · ∂J/∂w^l_{ji}, which keeps the adjustment moving in the direction in which J decreases fastest.

  Applying the chain rule through (3) → (2) → (1):

  ∂J/∂w^{L−1} = ∂J/∂a^L · ∂a^L/∂z^L · ∂z^L/∂w^{L−1} = (a^L − y^L) · f'(z^L) · a^{L−1}    (4)
  The attentive reader will have noticed that the formula above only gives the gradient for the last layer of weights. We now introduce a new variable that helps us compute ∂J/∂w^l for every layer.

  Let δ^L = (a^L − y^L) · f'(z^L) (i.e., the rightmost side of (4) without the factor a^{L−1}). Then δ^{l+1} and δ^l satisfy the recurrence

  δ^l = ((w^l)ᵀ · δ^{l+1}) ⊙ f'(z^l)

  where ⊙ denotes element-wise multiplication (this is exactly what bc.m below computes). Putting it all together:

  ∂J/∂w^l = δ^{l+1} · (a^l)ᵀ    (5)
If the derivation above feels too involved, you can skip it: just remember formulas (1), (2), (3), and (5), which are all you need to implement the algorithm in code.
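To see formulas (1), (2), (3), and (5) in action, here is a minimal MATLAB sketch of one forward and one backward pass through a toy 2-2-1 network (the weights and input are my own illustration; bias units are omitted for brevity):

    f  = @(s) 1 ./ (1 + exp(-s));      % sigmoid activation
    df = @(s) f(s) .* (1 - f(s));      % its derivative

    a1 = [1; 0];  y = 1;               % toy input and target
    w1 = [0.1 -0.2; 0.4 0.3];          % layer-1 weights (2x2)
    w2 = [0.5 -0.6];                   % layer-2 weights (1x2)

    z2 = w1 * a1;  a2 = f(z2);         % (1), (2)
    z3 = w2 * a2;  a3 = f(z3);         % (1), (2)
    J  = 0.5 * (a3 - y)^2;             % (3)

    delta3 = (a3 - y) * df(z3);        % delta at the output layer
    delta2 = df(z2) .* (w2' * delta3); % recurrence for the previous layer
    dw2 = delta3 * a2';                % (5): dJ/dw2
    dw1 = delta2 * a1';                % (5): dJ/dw1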
Assignment: download link
Instructions
Task 0: implement feedforward and backward computation
- in fc.m, implement the forward computing (in either component or vector form); return both the activation and the net input
- in bc.m, implement the backward computing (in either component or vector form)
Task 1: implement online BP algorithm
in bp_online.m:
1. calculate activations a1, a2, a3, and net inputs z2, z3
2. calculate cost function J
3. calculate sensitivities delta3, delta2
4. calculate gradients with respect to weights dw1, dw2
5. update weights w1, w2
Task 2: implement batch BP algorithm
in bp_batch.m:
1. calculate activations a1, a2, a3, and net inputs z2, z3
2. calculate cost function J
3. calculate sensitivities delta3, delta2
4. accumulate gradients with respect to weights dw1, dw2
5. update weights w1, w2
fc.m
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Course: Understanding Deep Neural Networks
% Lab 3 - BP algorithms
% Task 0: implement feedforward and backward computation
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [a_next, z_next] = fc(w, a)
    % define the activation function
    f = @(s) 1 ./ (1 + exp(-s));

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Your code BELOW
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % forward computing (in either component or vector form)
    a = [a; 1];          % append the constant bias unit
    z_next = w * a;      % net input of the next layer
    a_next = f(z_next);  % activation of the next layer
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Your code ABOVE
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%
end
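For reference, a hypothetical call of fc for one forward step (the sizes and values here are my own example): the weight matrix has one column per input unit plus one for the bias that fc appends internally:

    w = randn(3, 3);             % 3 units in the next layer, 2 inputs + 1 bias column
    a = [0.5; -1.2];             % activation of the current layer
    [a_next, z_next] = fc(w, a); % returns f(z_next) and z_next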
bc.m
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Course: Understanding Deep Neural Networks
% Lab 3 - BP algorithms
% Task 0: implement feedforward and backward computation
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function delta = bc(w, z, delta_next)
    % define the activation function
    f = @(s) 1 ./ (1 + exp(-s));
    % define the derivative of activation function
    df = @(s) f(s) .* (1 - f(s));

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Your code BELOW
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % backward computing (vector form): delta^l = ((w^l)' * delta^{l+1}) .* f'(z^l)
    % drop the last column of w, which multiplies the constant bias unit
    delta = df(z) .* (w(:, 1:end-1)' * delta_next);
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Your code ABOVE
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%
end
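Note that only the non-bias columns of w take part in the backward step: the bias unit is a constant appended inside fc, so no error signal propagates back through it. This is why bc uses w(:, 1:end-1)' rather than the full w.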
bp_online.m
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Course: Understanding Deep Neural Networks
% Lab 3 - BP algorithms
% Task 1: implement online BP algorithm
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% clear the workspace
clear

% define the activation function
f = @(s) 1 ./ (1 + exp(-s));
% define the derivative of activation function
df = @(s) f(s) .* (1 - f(s));

% prepare the training data set (one sample per column)
data = [1 0 0 1;
        0 1 0 1];      % samples
labels = [1 1 0 0];    % labels
m = size(data, 2);

% choose parameters, initialize the weights
alpha = 0.15;
epochs = 50000;
w1 = randn(2,3);
w2 = randn(1,3);
J = zeros(1,epochs);

% loop until weights converge
for t = 1:epochs
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Your code BELOW
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % for each sample
    for i = 1:m
        % forward calculation (invoke fc)
        a1 = data(:, i);
        [a2, z2] = fc(w1, a1);
        [a3, z3] = fc(w2, a2);
        % accumulate cost function over the epoch
        J(t) = J(t) + 0.5 * (a3 - labels(i))^2;
        % backward calculation (invoke bc)
        delta3 = (a3 - labels(i)) * df(z3);
        delta2 = bc(w2, z2, delta3);
        % calculate the gradients
        dw1 = delta2 * [a1; 1]';
        dw2 = delta3 * [a2; 1]';
        % update weights after every sample
        w1 = w1 - alpha * dw1;
        w2 = w2 - alpha * dw2;
    % end for each sample
    end
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Your code ABOVE
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%
    if mod(t,100) == 0
        fprintf('%i/%i epochs: J=%.4f\n', t, epochs, J(t));
    end
% end loop
end

% display the result
for i = 1:4
    a1 = data(:, i);
    [a2, z2] = fc(w1, a1);
    [a3, z3] = fc(w2, a2);
    fprintf('Sample [%i %i] (%i) is classified as %i.\n', ...
            data(1,i), data(2,i), labels(i), a3 > 0.5);
end
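Note the key difference from the batch version below: the online version updates w1 and w2 immediately after every sample, whereas the batch version accumulates the gradients over all m samples and applies a single update per epoch.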
bp_batch.m
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Course: Understanding Deep Neural Networks
% Lab 3 - BP algorithms
% Task 2: implement batch BP algorithm
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% clear the workspace
clear

% define the activation function
f = @(s) 1 ./ (1 + exp(-s));
% define the derivative of activation function
df = @(s) f(s) .* (1 - f(s));

% prepare the training data set (one sample per column)
data = [1 0 0 1;
        0 1 0 1];      % samples
labels = [1 1 0 0];    % labels
m = size(data, 2);

% choose parameters, initialize the weights
alpha = 0.15;
epochs = 50000;
w1 = randn(2,3);
w2 = randn(1,3);
J = zeros(1,epochs);

% loop until weights converge
for t = 1:epochs
    % reset the total gradients
    dw1 = 0;
    dw2 = 0;
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Your code BELOW
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % for all samples
    for i = 1:m
        % forward calculation (invoke fc)
        a1 = data(:, i);
        [a2, z2] = fc(w1, a1);
        [a3, z3] = fc(w2, a2);
        % accumulate the cost: J = 1/(2m) * sum of squared errors
        J(t) = J(t) + (0.5 / m) * (a3 - labels(i))^2;
        % backward calculation (invoke bc)
        delta3 = (a3 - labels(i)) * df(z3);
        delta2 = bc(w2, z2, delta3);
        % accumulate the total gradients
        dw1 = dw1 + delta2 * [a1; 1]';
        dw2 = dw2 + delta3 * [a2; 1]';
    % end for all samples
    end
    % update weights once per epoch, with the accumulated gradients
    w1 = w1 - alpha * dw1;
    w2 = w2 - alpha * dw2;
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Your code ABOVE
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%
    if mod(t,100) == 0
        fprintf('%i/%i epochs: J=%.4f\n', t, epochs, J(t));
    end
% end loop
end

% display the result
for i = 1:4
    a1 = data(:, i);
    [a2, z2] = fc(w1, a1);
    [a3, z3] = fc(w2, a2);
    fprintf('Sample [%i %i] (%i) is classified as %i.\n', ...
            data(1,i), data(2,i), labels(i), a3 > 0.5);
end