Introduction to Deep Learning (3): Loss Functions and the BP Algorithm


Loss Function (Cost Function)

After the previous lesson we already have a basic picture of what a neural network is. To evaluate how well a network performs, we usually use a loss function (also called a cost function).

In supervised learning, the target output is known for every training sample. The loss function is typically some measure of the distance between the network's actual output and the target output: the smaller the gap, the better the network is performing; a large gap means poor performance.

Let the target output be $y^L$ and the network output be $a^L$ (both are $n_L \times 1$ column vectors, where $n_L$ is the number of units in the output layer).
Let $e = y^L - a^L$.
Cost function: $J = \frac{1}{2}\sum_{j=1}^{n_L} e_j^2$

The cost function is not unique.
Another common choice is $J = \frac{1}{m}\sum_{i=1}^{m} L\left(y^{L}, a^{L}\right)$, where the sum runs over the $m$ training samples and $L\left(y^{L}, a^{L}\right) = -\left[y^{L}\log a^{L} + \left(1 - y^{L}\right)\log\left(1 - a^{L}\right)\right]$ is the cross-entropy loss.
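As a quick illustration, here is a minimal MATLAB sketch that evaluates both cost functions for a single sample; the variable names and the numbers are made up for this example:

% quadratic (sum-of-squares) cost for one sample
quad_cost = @(a_L, y_L) 0.5 * sum((y_L - a_L).^2);

% cross-entropy cost for one sample (entries of a_L must lie in (0,1))
xent_cost = @(a_L, y_L) -sum(y_L .* log(a_L) + (1 - y_L) .* log(1 - a_L));

a_L = [0.8; 0.1];   % example network output
y_L = [1; 0];       % example target output
fprintf('quadratic: %.4f, cross-entropy: %.4f\n', ...
        quad_cost(a_L, y_L), xent_cost(a_L, y_L));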

A well-performing network means we have found the most suitable weights $(w^1, w^2, \ldots, w^{L-1})$, i.e. the ones that minimize the loss function $J$. Finding them is the learning process of the neural network, and the search can be carried out with the steepest gradient method.


Steepest Gradient Method

(Disaster... halfway through writing this I clicked into another post's editor, the part I had already written was not saved, and it was simply gone when I came back. So annoying.)

In the previous assignment the connection weights w were given directly. In practice w is sometimes set from experience, but more often the network is left to learn it on its own until the optimal w is found.

Gradient descent keeps this search moving along the derivative $\frac{\partial J}{\partial w}$, i.e. the direction in which $J$ changes fastest with respect to $w$, toward the optimum. A learning rate $\alpha$ is chosen to control how large each step is.

Update rule: $w_{ji}^{l} \leftarrow w_{ji}^{l} - \alpha \frac{\partial J}{\partial w_{ji}^{l}}$

A simple example (note that in practice the relationship between $J$ and $w$ is usually far more complicated):
[Figure: $J$ plotted against $w$, with the minimum at point C and starting points A and B on either side of it]
(The figure is super ugly, but you can tell at a glance that it's original! Haha)

Point C in the figure is the optimum we are looking for. Using the steepest gradient method with $\alpha = 1$:
When the starting point is A, $\frac{\partial J}{\partial w}$ is negative (the slope at A); a negative times a negative is positive, so by the update rule $w_{ji}^{l}$ increases, i.e. the point moves to the right until it reaches C or a neighbourhood of C (how close depends on the choice of $\alpha$).
When the starting point is B, $\frac{\partial J}{\partial w}$ is positive (the slope at B), so $w_{ji}^{l}$ decreases, i.e. the point moves to the left, again until it reaches C or a neighbourhood of C.
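To make this concrete, here is a minimal MATLAB sketch of steepest descent on a made-up one-dimensional cost $J(w) = (w - 2)^2$, whose minimum plays the role of point C and whose starting value plays the role of point A:

J  = @(w) (w - 2).^2;       % toy cost with its minimum at w = 2 (point C)
dJ = @(w) 2 * (w - 2);      % its derivative dJ/dw

alpha = 0.1;                % learning rate
w = -1;                     % start to the left of the minimum (like point A)
for k = 1:50
    w = w - alpha * dJ(w);  % steepest-descent update
end
fprintf('after 50 steps: w = %.4f, J(w) = %.6f\n', w, J(w));

Since $\frac{\partial J}{\partial w}$ is negative to the left of the minimum, each update increases $w$, exactly as described for point A; starting to the right of the minimum, the derivative is positive and $w$ decreases.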


Back Propagation (BP) Algorithm

  1. Forward computation
    $z^{l+1} = w^{l} a^{l}$  (1)
    $a^{l+1} = f(z^{l+1})$  (2)
    See the previous post for a detailed explanation.
  2. Compute the cost
    $J = \frac{1}{2}\left(a^{L} - y^{L}\right)^{2}$  (3)
    Repeat step 1 layer by layer until the output of the last layer, $a^{L}$, is obtained. Formula (3) measures the gap between the network output and the target output; as noted above, it is not the only possible choice.
  3. Backward computation
    We need $\frac{\partial J}{\partial w}$, and the parameters are updated by $w_{ji}^{l} \leftarrow w_{ji}^{l} - \alpha \frac{\partial J}{\partial w_{ji}^{l}}$, so that the adjustment always follows the direction in which $J$ changes fastest with respect to $w$.
    Applying the chain rule through (3) → (2) → (1):
    $\frac{\partial J}{\partial w^{L-1}} = \frac{\partial J}{\partial a^{L}} \cdot \frac{\partial a^{L}}{\partial z^{L}} \cdot \frac{\partial z^{L}}{\partial w^{L-1}} = \left(a^{L} - y^{L}\right) \cdot f'(z^{L}) \cdot a^{L-1}$  (4)
    The careful reader will notice that (4) only gives the gradient of the last layer's weights. To compute $\frac{\partial J}{\partial w^{l}}$ for every layer, we introduce a new variable:
    $\delta^{L} = \left(a^{L} - y^{L}\right) \cdot f'(z^{L})$ (the right-hand side of (4) without the final factor $a^{L-1}$).
    $\delta^{l}$ and $\delta^{l+1}$ are related by $\delta^{l} = \left(\left(w^{l}\right)^{T} \delta^{l+1}\right) \odot f'(z^{l})$, so the sensitivities can be propagated backwards layer by layer.
    Putting it all together:
    $\frac{\partial J}{\partial w^{l}} = \delta^{l+1} \cdot \left(a^{l}\right)^{T}$  (5)

If the derivation above feels too involved, you can skip it; just remember formulas (1), (2), (3) and (5) and be able to implement them in code.
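For reference, here is a minimal MATLAB sketch of one training step built only from formulas (1), (2), (3) and (5). The bias units used in the lab code are omitted and the sizes and sample are made up, so this is an illustration rather than the lab solution:

% sigmoid activation and its derivative
f  = @(s) 1 ./ (1 + exp(-s));
df = @(s) f(s) .* (1 - f(s));

a1 = [1; 0];  y = 1;  alpha = 0.15;   % one training sample, its target, learning rate
w1 = randn(2, 2);  w2 = randn(1, 2);  % weights (no bias units in this sketch)

% forward pass: formulas (1) and (2)
z2 = w1 * a1;   a2 = f(z2);
z3 = w2 * a2;   a3 = f(z3);

% cost: formula (3)
J = 0.5 * (a3 - y)^2;

% sensitivities: delta at the output, then propagated back one layer
delta3 = (a3 - y) * df(z3);
delta2 = (w2' * delta3) .* df(z2);

% gradients and update: formula (5)
w1 = w1 - alpha * (delta2 * a1');
w2 = w2 - alpha * (delta3 * a2');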


Assignment: download link

Instructions

Task 0: implement feedforward and backward computation

  1. in fc.m, implement the forward computing (in either component or vector form), return both the activation and the net input
  2. in bc.m, implement the backward computing (in either component or vector form)

Task 1: implement online BP algorithm

in bp_online.m:
1. calculate activations a1, a2, a3, and net input z2, z3
2. calculate cost function J
3. calculate sensitivity delta3, delta2
4. calculate gradient with respect to weights dw1, dw2
5. update weights w1, w2

Task 2: implement batch BP algorithm

in bp_batch.m:
1. calculate activations a1, a2, a3, and net input z2, z3
2. calculate cost function J
3. calculate sensitivity delta3, delta2
4. cumulate gradient with respect to weights dw1, dw2
5. update weights w1, w2

fc.m

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Course:  Understanding Deep Neural Networks
%
% Lab 3 - BP algorithms
%
% Task 0: implement feedforward and backward computation
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [a_next, z_next] = fc(w, a)
    % define the activation function
    f = @(s) 1 ./ (1 + exp(-s));

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Your code BELOW
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % forward computing (in either component or vector form)
    a = [a; 1];            % append the bias unit to the activations
    z_next = w * a;        % net input of the next layer
    a_next = f(z_next);    % activation of the next layer
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Your code ABOVE
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
end
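As a quick check of the shapes, a call to fc with the sizes used later in the lab might look like this (the weight values are random, for illustration only):

w1 = randn(2, 3);        % 2 hidden units, 2 inputs + 1 bias
a1 = [1; 0];             % one input sample
[a2, z2] = fc(w1, a1);   % a2 and z2 are both 2x1 column vectors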

bc.m

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Course:  Understanding Deep Neural Networks
%
% Lab 3 - BP algorithms
%
% Task 0: implement feedforward and backward computation
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function delta = bc(w, z, delta_next)
    % define the activation function
    f = @(s) 1 ./ (1 + exp(-s));
    % define the derivative of activation function
    df = @(s) f(s) .* (1 - f(s));

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Your code BELOW
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % backward computing (in either component or vector form):
    % drop the bias column of w, propagate delta_next back through the
    % remaining weights, then scale elementwise by f'(z)
    delta = (w(:, 1:end-1)' * delta_next) .* df(z);
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Your code ABOVE
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
end
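Note that fc appends the bias unit to a before the multiplication, so the last column of w holds the bias weights; bc drops that column because the bias unit receives no error signal from the layer below. A quick shape check against the lab's network (the numbers are made up):

w2 = randn(1, 3);             % 1 output unit, 2 hidden units + 1 bias
z2 = randn(2, 1);             % hidden-layer net input
delta3 = 0.3;                 % output-layer sensitivity (scalar here)
delta2 = bc(w2, z2, delta3);  % returns a 2x1 hidden-layer sensitivity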

bp_online.m

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Course:  Understanding Deep Neural Networks
%
% Lab 3 - BP algorithms
%
% Task 1: implement online BP algorithm
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% clear the workspace
clear

% define the activation function
f = @(s) 1 ./ (1 + exp(-s));
% define the derivative of activation function
df = @(s) f(s) .* (1 - f(s));

% prepare the training data set
data   = [1 0 0 1
          0 1 0 1]; % samples
labels = [1 1 0 0]; % labels
m = size(data, 2);

% choose parameters, initialize the weights
alpha = 0.15;
epochs = 50000;
w1 = randn(2,3);
w2 = randn(1,3);
J = zeros(1,epochs);

% loop until weights converge
for t = 1:epochs
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Your code BELOW
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % for each sample
    for i = 1:m
        % forward calculation (invoke fc)
        a1 = data(:, i);
        [a2, z2] = fc(w1, a1);
        [a3, z3] = fc(w2, a2);
        % accumulate the cost over the samples of this epoch
        J(t) = J(t) + 0.5 * (a3 - labels(i))^2;
        % backward calculation (invoke bc)
        delta3 = (a3 - labels(i)) * df(z3);
        delta2 = bc(w2, z2, delta3);
        % calculate the gradients
        dw1 = delta2 * [a1; 1]';
        dw2 = delta3 * [a2; 1]';
        % update weights after every sample (online mode)
        w1 = w1 - alpha * dw1;
        w2 = w2 - alpha * dw2;
    end
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Your code ABOVE
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % end loop
    if mod(t,100) == 0
        fprintf('%i/%i epochs: J=%.4f\n', t, epochs, J(t));
    end
end

% display the result
for i = 1:4
    a1 = data(:,i);
    [a2, z2] = fc(w1, a1);
    [a3, z3] = fc(w2, a2);
    fprintf('Sample [%i %i] (%i) is classified as %i.\n', ...
            data(1,i), data(2,i), labels(i), a3 > 0.5);
end

bp_batch.m

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Course:  Understanding Deep Neural Networks
%
% Lab 3 - BP algorithms
%
% Task 2: implement batch BP algorithm
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% clear the workspace
clear

% define the activation function
f = @(s) 1 ./ (1 + exp(-s));
% define the derivative of activation function
df = @(s) f(s) .* (1 - f(s));

% prepare the training data set
data   = [1 0 0 1
          0 1 0 1]; % samples
labels = [1 1 0 0]; % labels
m = size(data, 2);

% choose parameters, initialize the weights
alpha = 0.15;
epochs = 50000;
w1 = randn(2,3);
w2 = randn(1,3);
J = zeros(1,epochs);

% loop until weights converge
for t = 1:epochs
    % reset the total gradients
    dw1 = 0;
    dw2 = 0;
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Your code BELOW
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % for all samples
    for i = 1:m
        % forward calculation (invoke fc)
        a1 = data(:, i);
        [a2, z2] = fc(w1, a1);
        [a3, z3] = fc(w2, a2);
        % accumulate the mean cost over the batch
        J(t) = J(t) + 0.5 / m * (a3 - labels(i))^2;
        % backward calculation (invoke bc)
        delta3 = (a3 - labels(i)) * df(z3);
        delta2 = bc(w2, z2, delta3);
        % cumulate the total gradients
        dw1 = dw1 + delta2 * [a1; 1]';
        dw2 = dw2 + delta3 * [a2; 1]';
    end
    % update weights once per epoch (batch mode)
    w1 = w1 - alpha * dw1;
    w2 = w2 - alpha * dw2;
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Your code ABOVE
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % end loop
    if mod(t,100) == 0
        fprintf('%i/%i epochs: J=%.4f\n', t, epochs, J(t));
    end
end

% display the result
for i = 1:4
    a1 = data(:,i);
    [a2, z2] = fc(w1, a1);
    [a3, z3] = fc(w2, a2);
    fprintf('Sample [%i %i] (%i) is classified as %i.\n', ...
            data(1,i), data(2,i), labels(i), a3 > 0.5);
end