Notes on MatConvNet (II): vl_simplenn


Written before

This blog is the second in the series of Notes on MatConvNet.

  1. Notes on MatConvNet (I) – Overview

Here I will mainly introduce the core of MatConvNet: vl_simplenn. This function plays a quite important role in both forward propagation and backward propagation.

PS. I have to admit that writing blogs in Chinese would bring much more traffic than writing them in English. Still, it would be a sad thing to care too much about such vain trifles.

Something you should know beforehand

I only introduce BP here: how do the derivatives propagate backward? I sincerely recommend reading BP Algorithm; that blog is a good introduction to BP. When you have finished it, the first blog of this series is also recommended. Below I repeat the main computation structure illustrated in Notes (I).

Computation Structure

[Figure: the computation structure of a single layer]
I use some default conventions here:

  • y always represents the output of a certain layer. That is to say, when we are at layer i, y is the output of layer i.
  • z always represents the output of the whole net, or rather, the output of the final layer n.
  • x represents the input of a certain layer.

To make things easier, MatConvNet treats each simple function as a "layer". This means that when the input goes through a computation structure (whether it is a conv structure or just a relu structure), the backward pass computes the following:

dzdx = f(x) · dzdy        (1)

dzdw = fw(x) · dzdy       (2)   [condition]

Note:
- "condition" means that formula (2) is only computed when the layer involves weights.
- f(x) means the derivative of the output with respect to the input x.
- fw(x) means the derivative of the output with respect to the weights w.
- This is a little different from the BP Algorithm blog, which only treats conv or fully connected layers as computation structures. However, once you also count activations (such as sigmoid, relu, etc.) and other functional structures (dropout, LRN, etc.) as computation structures, things become quite easy, because every computation part is responsible only for its own input and output. Every time it receives a dzdy and its input x, it calculates dzdx using (1). If it is a conv or fully connected layer, or any other layer that involves weights, it also has to apply formula (2), because you need the weight gradients in order to update the weights. In fact, that is exactly our goal. A toy numerical example is worked out below.
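To make formulas (1) and (2) concrete, here is a toy example worked out by hand (not with MatConvNet functions). The two-layer "net" z = w * sigmoid(x) and the values of x and w are made up purely for illustration.

x = 0.5 ; w = 2 ;                  % made-up input and weight

% forward pass
y1 = 1 / (1 + exp(-x)) ;           % layer 1 (sigmoid), no weights
z  = w * y1 ;                      % layer 2 (multiply by w) = network output

% backward pass, applying (1) and (2) layer by layer
dzdy2 = 1 ;                        % derivative of z with respect to itself
dzdx2 = w * dzdy2 ;                % (1) for layer 2: f(x) = w
dzdw2 = y1 * dzdy2 ;               % (2) for layer 2: fw(x) = y1
dzdy1 = dzdx2 ;                    % layer 2's dzdx is layer 1's dzdy
dzdx1 = y1 * (1 - y1) * dzdy1 ;    % (1) for layer 1: sigmoid derivative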

Taking a look at vl_simplenn

The result format

  • res(i+1).x: the output of layer i. Hence res(1).x is the
    network input.

  • res(i+1).dzdx: the derivative of the network output relative
    to the output of layer i. In particular res(1).dzdx is the
    derivative of the network output with respect to the network
    input.

  • res(i+1).dzdw: a cell array containing the derivatives of the
    network output relative to the parameters of layer i. It can
    be a cell array for multiple parameters.

Note: when we are at layer i, y means res(i+1).x, because y is the output of that layer and at the same time res(i+1).x is the input of layer i+1, which is why it is stored under that index. Likewise, for layer i, res(i+1).dzdx has the same meaning as dzdy. A small indexing example is given below.
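A minimal sketch of this indexing, assuming net is a SimpleNN model with n layers and im is a suitably preprocessed single-precision input (both hypothetical here):

res = vl_simplenn(net, im) ;   % forward pass only

x_in = res(1).x ;              % the network input
y1   = res(2).x ;              % the output of layer 1
z    = res(end).x ;            % the output of the final layer n, i.e. z
numel(res)                     % n + 1 entries in total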

Main calling forms you may use

res = vl_simplenn(net, x) ;                        (1)
res = vl_simplenn(net, x, dzdy) ;                  (2)
res = vl_simplenn(net, x, dzdy, res, opt, val) ;   (3)

(1) is just the forward computation.
(2) is used for back propagation. It is mainly used to compute the derivatives of the input and the weights with respect to the net's output z.
(3) is used in cnn_train. It adds some opts, but I will not introduce them here. A minimal usage sketch of (1) and (2) follows.
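A minimal usage sketch of forms (1) and (2), assuming net is a SimpleNN model whose last layer is a loss layer and im/labels have already been prepared (all variable names here are mine):

% forward only, e.g. to get the network output for an image
res    = vl_simplenn(net, im) ;
scores = squeeze(gather(res(end).x)) ;

% forward + backward: seed dzdy with 1, since z is a scalar loss during
% training; afterwards the derivatives sit in the dzdx/dzdw fields of res
net.layers{end}.class = labels ;         % loss layers read the labels from l.class
res  = vl_simplenn(net, im, single(1)) ;
dzdx = res(1).dzdx ;                     % derivative of z w.r.t. the network input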

...
% the code before 'Forward pass' is easy and needs no explanation.
% -------------------------------------------------------------------------
%                                                              Forward pass
% -------------------------------------------------------------------------
for i=1:n
  if opts.skipForward, break; end;
  l = net.layers{i} ;
  res(i).time = tic ;
  switch l.type
    case 'conv'
      res(i+1).x = vl_nnconv(res(i).x, l.weights{1}, l.weights{2}, ...
        'pad', l.pad, ...
        'stride', l.stride, ...
        l.opts{:}, ...
        cudnn{:}) ;
    case 'convt'
      res(i+1).x = vl_nnconvt(res(i).x, l.weights{1}, l.weights{2}, ...
        'crop', l.crop, ...
        'upsample', l.upsample, ...
        'numGroups', l.numGroups, ...
        l.opts{:}, ...
        cudnn{:}) ;
    case 'pool'
      res(i+1).x = vl_nnpool(res(i).x, l.pool, ...
        'pad', l.pad, 'stride', l.stride, ...
        'method', l.method, ...
        l.opts{:}, ...
        cudnn{:}) ;
    case {'normalize', 'lrn'}
      res(i+1).x = vl_nnnormalize(res(i).x, l.param) ;
    case 'softmax'
      res(i+1).x = vl_nnsoftmax(res(i).x) ;
    case 'loss'
      res(i+1).x = vl_nnloss(res(i).x, l.class) ;
    case 'softmaxloss'
      res(i+1).x = vl_nnsoftmaxloss(res(i).x, l.class) ;
    case 'relu'
      if l.leak > 0, leak = {'leak', l.leak} ; else leak = {} ; end
      res(i+1).x = vl_nnrelu(res(i).x,[],leak{:}) ;
    case 'sigmoid'
      res(i+1).x = vl_nnsigmoid(res(i).x) ;
    ...
    otherwise
      error('Unknown layer type ''%s''.', l.type) ;
  end

The code above shows the main idea of forward propagation: each layer reads res(i).x and writes its output into res(i+1).x.

  % optionally forget intermediate results
  forget = opts.conserveMemory & ~(doder & n >= backPropLim) ;
  if i > 1
    lp = net.layers{i-1} ;
    % forget RELU input, even for BPROP
    % forget decides whether to keep the intermediate result res(i).x;
    % if the corresponding layer's precious field is true, it is kept
    forget = forget & (~doder | (strcmp(l.type, 'relu') & ~lp.precious)) ;
    forget = forget & ~(strcmp(lp.type, 'loss') || strcmp(lp.type, 'softmaxloss')) ;
    forget = forget & ~lp.precious ;
  end
  if forget  % if not kept, set this layer's input to empty
    res(i).x = [] ;
  end
  if gpuMode && opts.sync
    wait(gpuDevice) ;
  end
  res(i).time = toc(res(i).time) ;
end

Backward pass

It seems that no explanation is the best explanation here, because the backward pass simply mirrors the forward pass: each layer takes res(i).x and res(i+1).dzdx and produces res(i).dzdx (and res(i).dzdw where there are weights).

% -------------------------------------------------------------------------
%                                                             Backward pass
% -------------------------------------------------------------------------
if doder
  res(n+1).dzdx = dzdy ;
  for i=n:-1:max(1, n-opts.backPropDepth+1)
    l = net.layers{i} ;
    res(i).backwardTime = tic ;
    switch l.type
      case 'conv'
        [res(i).dzdx, dzdw{1}, dzdw{2}] = ...
          vl_nnconv(res(i).x, l.weights{1}, l.weights{2}, res(i+1).dzdx, ...
          'pad', l.pad, ...
          'stride', l.stride, ...
          l.opts{:}, ...
          cudnn{:}) ;
      case 'convt'
        [res(i).dzdx, dzdw{1}, dzdw{2}] = ...
          vl_nnconvt(res(i).x, l.weights{1}, l.weights{2}, res(i+1).dzdx, ...
          'crop', l.crop, ...
          'upsample', l.upsample, ...
          'numGroups', l.numGroups, ...
          l.opts{:}, ...
          cudnn{:}) ;
      case 'pool'
        res(i).dzdx = vl_nnpool(res(i).x, l.pool, res(i+1).dzdx, ...
                                'pad', l.pad, 'stride', l.stride, ...
                                'method', l.method, ...
                                l.opts{:}, ...
                                cudnn{:}) ;
      case {'normalize', 'lrn'}
        res(i).dzdx = vl_nnnormalize(res(i).x, l.param, res(i+1).dzdx) ;
      case 'softmax'
        res(i).dzdx = vl_nnsoftmax(res(i).x, res(i+1).dzdx) ;
      case 'loss'
        res(i).dzdx = vl_nnloss(res(i).x, l.class, res(i+1).dzdx) ;
      case 'softmaxloss'
        res(i).dzdx = vl_nnsoftmaxloss(res(i).x, l.class, res(i+1).dzdx) ;
      case 'relu'
        if l.leak > 0, leak = {'leak', l.leak} ; else leak = {} ; end
        if ~isempty(res(i).x)
          res(i).dzdx = vl_nnrelu(res(i).x, res(i+1).dzdx, leak{:}) ;
        else
          % if res(i).x is empty, it has been optimized away, so we use this
          % hack (which works only for ReLU):
          res(i).dzdx = vl_nnrelu(res(i+1).x, res(i+1).dzdx, leak{:}) ;
        end
      case 'sigmoid'
        res(i).dzdx = vl_nnsigmoid(res(i).x, res(i+1).dzdx) ;
      case 'noffset'
        res(i).dzdx = vl_nnnoffset(res(i).x, l.param, res(i+1).dzdx) ;
      case 'spnorm'
        res(i).dzdx = vl_nnspnorm(res(i).x, l.param, res(i+1).dzdx) ;
      case 'dropout'
        if testMode
          res(i).dzdx = res(i+1).dzdx ;
        else
          res(i).dzdx = vl_nndropout(res(i).x, res(i+1).dzdx, ...
                                     'mask', res(i+1).aux) ;
        end
      case 'bnorm'
        [res(i).dzdx, dzdw{1}, dzdw{2}, dzdw{3}] = ...
          vl_nnbnorm(res(i).x, l.weights{1}, l.weights{2}, res(i+1).dzdx) ;
        % multiply the moments update by the number of images in the batch
        % this is required to make the update additive for subbatches
        % and will eventually be normalized away
        dzdw{3} = dzdw{3} * size(res(i).x,4) ;
      case 'pdist'
        res(i).dzdx = vl_nnpdist(res(i).x, l.class, ...
          l.p, res(i+1).dzdx, ...
          'noRoot', l.noRoot, ...
          'epsilon', l.epsilon, ...
          'aggregate', l.aggregate) ;
      case 'custom'
        res(i) = l.backward(l, res(i), res(i+1)) ;
    end % layers

    switch l.type
      case {'conv', 'convt', 'bnorm'}
        if ~opts.accumulate
          res(i).dzdw = dzdw ;
        else
          for j=1:numel(dzdw)
            res(i).dzdw{j} = res(i).dzdw{j} + dzdw{j} ;
          end
        end
        dzdw = [] ;
    end

    if opts.conserveMemory && ~net.layers{i}.precious && i ~= n
      res(i+1).dzdx = [] ;
      res(i+1).x = [] ;
    end
    if gpuMode && opts.sync
      wait(gpuDevice) ;
    end
    res(i).backwardTime = toc(res(i).backwardTime) ;
  end
end
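Once the backward pass has filled the dzdw fields (stored as res(i).dzdw in the code above), a training loop can use them to update the weights. The following is only a bare-bones sketch of a plain SGD step with a made-up learning rate; the real update logic (momentum, weight decay, per-layer learning rates, etc.) lives in cnn_train, the topic of the next post.

lr = 0.001 ;                               % made-up learning rate
for i = 1:numel(net.layers)
  for j = 1:numel(res(i).dzdw)             % empty for layers without weights
    net.layers{i}.weights{j} = net.layers{i}.weights{j} - lr * res(i).dzdw{j} ;
  end
end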

Looking into the functions directly under the matlab folder

Here I mean the vl_nnxx functions. I will just take vl_nnsigmoid and vl_nnrelu as examples.

function out = vl_nnsigmoid(x,dzdy)
y = 1 ./ (1 + exp(-x));
if nargin <= 1 || isempty(dzdy)
  out = y ;
else
  out = dzdy .* (y .* (1 - y)) ;
end

When the backward pass reaches this layer, formula (1) is bound to be executed as out = dzdy .* (y .* (1 - y)) ; the factor (y .* (1 - y)) is just f(x). The derivation is sketched below.
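For completeness, here is the standard derivation, written in LaTeX notation, of why the sigmoid derivative can be expressed through its own output y:

y = \sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
\frac{dy}{dx} = \frac{e^{-x}}{(1 + e^{-x})^{2}}
             = \frac{1}{1 + e^{-x}} \cdot \left( 1 - \frac{1}{1 + e^{-x}} \right)
             = y \, (1 - y).

This is also why the backward branch can be written entirely in terms of y.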

function y = vl_nnrelu(x,dzdy,varargin)
opts.leak = 0 ;
opts = vl_argparse(opts, varargin) ;

if opts.leak == 0
  if nargin <= 1 || isempty(dzdy)
    y = max(x, 0) ;
  else
    y = dzdy .* (x > 0) ;  % here formula (1) is used
  end
else
  if nargin <= 1 || isempty(dzdy)
    y = x .* (opts.leak + (1 - opts.leak) * (x > 0)) ;
  else
    y = dzdy .* (opts.leak + (1 - opts.leak) * (x > 0)) ;
  end
end
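A tiny numerical check (a sketch; the input values are made up and MatConvNet is assumed to be on the MATLAB path): the forward call returns max(x, 0), while the backward call applies formula (1) and passes dzdy through only where x > 0.

x    = single([-1  0.5 ;  2  -3]) ;
dzdy = single(ones(2, 2)) ;

y    = vl_nnrelu(x) ;          % forward:  [0 0.5 ; 2 0]
dzdx = vl_nnrelu(x, dzdy) ;    % backward: [0 1 ; 1 0]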

Why don't I show the code of the structures that involve weight computations? Because they are written in C++/CUDA for speed. You can find them in matlab\src. That is also why we have to compile MatConvNet at the very beginning: functions like vl_nnconv must be compiled into mex files so that they can be called from MATLAB files. A minimal setup sketch follows.
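For reference, a minimal setup and compilation sketch (the folder name is just a placeholder for wherever MatConvNet was unpacked; GPU support is optional):

cd('matconvnet') ;                     % placeholder: the MatConvNet root folder
run('matlab/vl_setupnn.m') ;           % add MatConvNet to the MATLAB path
vl_compilenn() ;                       % compile the CPU-only mex files
% vl_compilenn('enableGpu', true) ;    % or compile with CUDA support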

The third post of this series will mainly introduce cnn_train, which is quite interesting.
