Notes on MatConvNet(II):vl_simplenn
Written before
This blog is the second in the series of Notes on MatConvNet.
- Notes on MatConvNet(I) – Overview
Here I will mainly introduce the core of MatConvNet: vl_simplenn.
This function plays a quite important role in forward-propagation and backward-propagation.
PS. I have to admit that writing blogs in Chinese would bring much more traffic than writing them in English. Yet if I cared too much about such vain trifles, it would be a sad thing.
Something you should know beforehand
I only introduce BP here. How do the derivatives propagate backward? I sincerely recommend that you read BP Algorithm, which is a good starting point for BP. When you have finished reading that, the first blog of this series is also recommended. Below I repeat the main computation structure illustrated in Notes(I).
Computation Structure
I adopt some conventions here.
- y always represents the output of a certain layer. That is to say, when it comes to layer i, y is the output of layer i.
- z always represents the output of the whole net, or rather, the output of the final layer n.
- x represents the input of a certain layer.
To make things easier, MatConvNet treats each simple function as a "layer". This means that when the input passes through a computation structure (no matter whether it is a conv structure or just a ReLU structure), the layer computes y = f(x, w) in the forward pass, and in the backward pass it computes

dz/dx = dz/dy · dy/dx    (1)
dz/dw = dz/dy · dy/dw    (2)

Note:
- The condition on (2) is that it is only computed when the layer contains weight computation.
- This is a little different from BP Algorithm, which only treats conv or fully connected layers as computation structures. However, once you also include activations (such as sigmoid, ReLU, etc.) and other functional structures (dropout, LRN, etc.) as computation structures, you will find it quite easy, because every computation part is responsible only for its own input and output. Each time it receives a dzdy and its input x, it calculates dzdx using (1). If it represents a conv or fully connected layer, or any other layer that involves weights, you also have to apply formula (2), since you need the gradients of the weights in order to update them. In fact, that is where our goal lies.
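To make the per-layer rule concrete, here is a minimal NumPy sketch (my own illustration, not MatConvNet code) of how a single fully connected layer would apply formulas (1) and (2): given its input x and the incoming derivative dzdy, it returns dzdx to pass backward and dzdw for the weight update.

```python
import numpy as np

def fc_forward(x, w):
    # y = f(x, w): a plain fully connected layer
    return x @ w

def fc_backward(x, w, dzdy):
    # formula (1): dz/dx = dz/dy * dy/dx
    dzdx = dzdy @ w.T
    # formula (2): dz/dw = dz/dy * dy/dw (only layers with weights do this)
    dzdw = x.T @ dzdy
    return dzdx, dzdw

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))   # a batch of 4 inputs
w = rng.standard_normal((3, 2))
y = fc_forward(x, w)
dzdy = np.ones_like(y)            # pretend z = sum(y), so dz/dy = 1
dzdx, dzdw = fc_backward(x, w, dzdy)
print(dzdx.shape, dzdw.shape)     # (4, 3) (3, 2)
```

Note how the layer never needs to know what the rest of the net looks like: dzdy is all the information it gets from downstream.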
Taking a look at vl_simplenn
The result format
- res(i+1).x: the output of layer i. Hence res(1).x is the network input.
- res(i+1).dzdx: the derivative of the network output relative to the output of layer i. In particular res(1).dzdx is the derivative of the network output with respect to the network input.
- res(i+1).dzdw: a cell array containing the derivatives of the network output relative to the parameters of layer i. It can be a cell array for multiple parameters.
Note: when it comes to layer i, y means res(i+1).x, since y is the output of the layer and res(i+1).x is at the same time the input of layer i+1; that is why it is stored this way. Likewise, for layer i, res(i+1).dzdx has the same meaning as dzdy.
Main calling forms you may use
res = vl_simplenn(net, x) ;                      (1)
res = vl_simplenn(net, x, dzdy) ;                (2)
res = vl_simplenn(net, x, dzdy, res, opt, val) ; (3)
(1) is just the forward computation.
(2) is used for back propagation. It is mainly used to compute the derivatives of the input and of the weights with respect to the net's output z.
(3) is used in cnn_train. It adds some opts, but I will not introduce them here.
...
% code before 'Forward pass' is easy and needs no explanation.
% -------------------------------------------------------------------------
% Forward pass
% -------------------------------------------------------------------------
for i=1:n
  if opts.skipForward, break; end;
  l = net.layers{i} ;
  res(i).time = tic ;
  switch l.type
    case 'conv'
      res(i+1).x = vl_nnconv(res(i).x, l.weights{1}, l.weights{2}, ...
                             'pad', l.pad, ...
                             'stride', l.stride, ...
                             l.opts{:}, ...
                             cudnn{:}) ;
    case 'convt'
      res(i+1).x = vl_nnconvt(res(i).x, l.weights{1}, l.weights{2}, ...
                              'crop', l.crop, ...
                              'upsample', l.upsample, ...
                              'numGroups', l.numGroups, ...
                              l.opts{:}, ...
                              cudnn{:}) ;
    case 'pool'
      res(i+1).x = vl_nnpool(res(i).x, l.pool, ...
                             'pad', l.pad, 'stride', l.stride, ...
                             'method', l.method, ...
                             l.opts{:}, ...
                             cudnn{:}) ;
    case {'normalize', 'lrn'}
      res(i+1).x = vl_nnnormalize(res(i).x, l.param) ;
    case 'softmax'
      res(i+1).x = vl_nnsoftmax(res(i).x) ;
    case 'loss'
      res(i+1).x = vl_nnloss(res(i).x, l.class) ;
    case 'softmaxloss'
      res(i+1).x = vl_nnsoftmaxloss(res(i).x, l.class) ;
    case 'relu'
      if l.leak > 0, leak = {'leak', l.leak} ; else leak = {} ; end
      res(i+1).x = vl_nnrelu(res(i).x, [], leak{:}) ;
    case 'sigmoid'
      res(i+1).x = vl_nnsigmoid(res(i).x) ;
    ...
    otherwise
      error('Unknown layer type ''%s''.', l.type) ;
  end
The code above shows the main idea of the forward propagation.
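The dispatch-on-layer-type idea can be sketched outside MATLAB as well. Below is a minimal NumPy analogue (hypothetical layer dicts, with an 'fc' layer standing in for 'conv'; this is not MatConvNet code) that mirrors how res(i+1).x is filled in from res(i).x:

```python
import numpy as np

def forward(layers, x):
    # res[i] holds the input of layer i, res[i+1] its output,
    # mirroring res(i).x / res(i+1).x in vl_simplenn
    res = [x]
    for l in layers:
        if l['type'] == 'fc':            # stands in for 'conv'
            x = x @ l['w'] + l['b']
        elif l['type'] == 'relu':
            x = np.maximum(x, 0)
        elif l['type'] == 'softmax':
            e = np.exp(x - x.max(axis=1, keepdims=True))
            x = e / e.sum(axis=1, keepdims=True)
        else:
            raise ValueError("Unknown layer type '%s'" % l['type'])
        res.append(x)
    return res

layers = [{'type': 'fc', 'w': np.eye(3), 'b': np.zeros(3)},
          {'type': 'relu'},
          {'type': 'softmax'}]
res = forward(layers, np.array([[1.0, -2.0, 0.5]]))
print(len(res))  # 4: the network input plus one output per layer
```

The point is only the bookkeeping: every layer reads res[i] and writes res[i+1], exactly as the switch statement above does.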
  % optionally forget intermediate results
  forget = opts.conserveMemory & ~(doder & n >= backPropLim) ;
  if i > 1
    lp = net.layers{i-1} ;
    % forget RELU input, even for BPROP
    % 'forget' decides whether to discard the intermediate result res(i).x;
    % if net.layers{i}.precious is true, the result is kept
    forget = forget & (~doder | (strcmp(l.type, 'relu') & ~lp.precious)) ;
    forget = forget & ~(strcmp(lp.type, 'loss') || strcmp(lp.type, 'softmaxloss')) ;
    forget = forget & ~lp.precious ;
  end
  if forget
    % if not kept, clear this layer's input
    res(i).x = [] ;
  end
  if gpuMode && opts.sync
    wait(gpuDevice) ;
  end
  res(i).time = toc(res(i).time) ;
end
Backward pass
It seems that no explanation is the best explanation here, because it is quite straightforward.
% -------------------------------------------------------------------------
% Backward pass
% -------------------------------------------------------------------------
if doder
  res(n+1).dzdx = dzdy ;
  for i=n:-1:max(1, n-opts.backPropDepth+1)
    l = net.layers{i} ;
    res(i).backwardTime = tic ;
    switch l.type
      case 'conv'
        [res(i).dzdx, dzdw{1}, dzdw{2}] = ...
          vl_nnconv(res(i).x, l.weights{1}, l.weights{2}, res(i+1).dzdx, ...
                    'pad', l.pad, ...
                    'stride', l.stride, ...
                    l.opts{:}, ...
                    cudnn{:}) ;
      case 'convt'
        [res(i).dzdx, dzdw{1}, dzdw{2}] = ...
          vl_nnconvt(res(i).x, l.weights{1}, l.weights{2}, res(i+1).dzdx, ...
                     'crop', l.crop, ...
                     'upsample', l.upsample, ...
                     'numGroups', l.numGroups, ...
                     l.opts{:}, ...
                     cudnn{:}) ;
      case 'pool'
        res(i).dzdx = vl_nnpool(res(i).x, l.pool, res(i+1).dzdx, ...
                                'pad', l.pad, 'stride', l.stride, ...
                                'method', l.method, ...
                                l.opts{:}, ...
                                cudnn{:}) ;
      case {'normalize', 'lrn'}
        res(i).dzdx = vl_nnnormalize(res(i).x, l.param, res(i+1).dzdx) ;
      case 'softmax'
        res(i).dzdx = vl_nnsoftmax(res(i).x, res(i+1).dzdx) ;
      case 'loss'
        res(i).dzdx = vl_nnloss(res(i).x, l.class, res(i+1).dzdx) ;
      case 'softmaxloss'
        res(i).dzdx = vl_nnsoftmaxloss(res(i).x, l.class, res(i+1).dzdx) ;
      case 'relu'
        if l.leak > 0, leak = {'leak', l.leak} ; else leak = {} ; end
        if ~isempty(res(i).x)
          res(i).dzdx = vl_nnrelu(res(i).x, res(i+1).dzdx, leak{:}) ;
        else
          % if res(i).x is empty, it has been optimized away, so we use this
          % hack (which works only for ReLU):
          res(i).dzdx = vl_nnrelu(res(i+1).x, res(i+1).dzdx, leak{:}) ;
        end
      case 'sigmoid'
        res(i).dzdx = vl_nnsigmoid(res(i).x, res(i+1).dzdx) ;
      case 'noffset'
        res(i).dzdx = vl_nnnoffset(res(i).x, l.param, res(i+1).dzdx) ;
      case 'spnorm'
        res(i).dzdx = vl_nnspnorm(res(i).x, l.param, res(i+1).dzdx) ;
      case 'dropout'
        if testMode
          res(i).dzdx = res(i+1).dzdx ;
        else
          res(i).dzdx = vl_nndropout(res(i).x, res(i+1).dzdx, ...
                                     'mask', res(i+1).aux) ;
        end
      case 'bnorm'
        [res(i).dzdx, dzdw{1}, dzdw{2}, dzdw{3}] = ...
          vl_nnbnorm(res(i).x, l.weights{1}, l.weights{2}, res(i+1).dzdx) ;
        % multiply the moments update by the number of images in the batch
        % this is required to make the update additive for subbatches
        % and will eventually be normalized away
        dzdw{3} = dzdw{3} * size(res(i).x,4) ;
      case 'pdist'
        res(i).dzdx = vl_nnpdist(res(i).x, l.class, ...
                                 l.p, res(i+1).dzdx, ...
                                 'noRoot', l.noRoot, ...
                                 'epsilon', l.epsilon, ...
                                 'aggregate', l.aggregate) ;
      case 'custom'
        res(i) = l.backward(l, res(i), res(i+1)) ;
    end % layers
    switch l.type
      case {'conv', 'convt', 'bnorm'}
        if ~opts.accumulate
          res(i).dzdw = dzdw ;
        else
          for j=1:numel(dzdw)
            res(i).dzdw{j} = res(i).dzdw{j} + dzdw{j} ;
          end
        end
        dzdw = [] ;
    end
    if opts.conserveMemory && ~net.layers{i}.precious && i ~= n
      res(i+1).dzdx = [] ;
      res(i+1).x = [] ;
    end
    if gpuMode && opts.sync
      wait(gpuDevice) ;
    end
    res(i).backwardTime = toc(res(i).backwardTime) ;
  end
end
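The structure of this loop (seed res(n+1).dzdx with dzdy, then walk the layers in reverse) can be mimicked in a short NumPy sketch. The 'fc' and 'relu' layer dicts below are hypothetical stand-ins, not the real net.layers format:

```python
import numpy as np

def backward(layers, res, dzdy):
    # seed: res(n+1).dzdx = dzdy, then walk the layers in reverse,
    # like the 'for i=n:-1:...' loop in vl_simplenn
    n = len(layers)
    dzdx = [None] * (n + 1)
    dzdw = [None] * n
    dzdx[n] = dzdy
    for i in range(n - 1, -1, -1):
        l, x, dy = layers[i], res[i], dzdx[i + 1]
        if l['type'] == 'fc':
            dzdx[i] = dy @ l['w'].T   # formula (1)
            dzdw[i] = x.T @ dy        # formula (2): only weighted layers
        elif l['type'] == 'relu':
            dzdx[i] = dy * (x > 0)    # formula (1); no weights here
    return dzdx, dzdw

# a tiny two-layer net: fc followed by relu
layers = [{'type': 'fc', 'w': np.ones((2, 1))}, {'type': 'relu'}]
x = np.array([[1.0, 0.5]])
h = x @ layers[0]['w']                # forward pass, kept as res
res = [x, h, np.maximum(h, 0)]
dzdx, dzdw = backward(layers, res, np.ones((1, 1)))
print(dzdx[0], dzdw[0])
```

Only the 'fc' branch fills dzdw, which corresponds to the second switch in the real code that stores res(i).dzdw just for 'conv', 'convt', and 'bnorm'.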
Looking into the functions directly under the folder matlab
Here I mean the functions vl_nnxx. I will just take vl_nnsigmoid and vl_nnrelu as examples.
function out = vl_nnsigmoid(x,dzdy)
y = 1 ./ (1 + exp(-x));
if nargin <= 1 || isempty(dzdy)
  out = y ;
else
  out = dzdy .* (y .* (1 - y)) ;
end
When it comes to this layer in the backward pass, formula (1) is executed as out = dzdy .* (y .* (1 - y)) ;. The latter part, (y .* (1 - y)), is just the derivative of the sigmoid with respect to its input, since σ'(x) = σ(x)(1 − σ(x)).
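You can verify this identity numerically. The following NumPy snippet (independent of MatConvNet) checks the y .* (1 - y) factor against a central finite difference:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-4, 4, 9)
y = sigmoid(x)
analytic = y * (1 - y)                       # the (y .* (1 - y)) factor
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(np.max(np.abs(analytic - numeric)))    # close to zero
```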
function y = vl_nnrelu(x,dzdy,varargin)
opts.leak = 0 ;
opts = vl_argparse(opts, varargin) ;
if opts.leak == 0
  if nargin <= 1 || isempty(dzdy)
    y = max(x, 0) ;
  else
    y = dzdy .* (x > 0) ;  % here formula (1) is used
  end
else
  if nargin <= 1 || isempty(dzdy)
    y = x .* (opts.leak + (1 - opts.leak) * (x > 0)) ;
  else
    y = dzdy .* (opts.leak + (1 - opts.leak) * (x > 0)) ;
  end
end
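For comparison, here is a NumPy re-sketch of the same dual forward/backward convention: one function that does the forward pass when dzdy is omitted and applies formula (1) otherwise. The nnrelu helper is my own hypothetical name, not the real vl_nnrelu:

```python
import numpy as np

def nnrelu(x, dzdy=None, leak=0.0):
    # mirrors vl_nnrelu: forward when dzdy is None, backward otherwise
    if leak == 0:
        if dzdy is None:
            return np.maximum(x, 0)
        return dzdy * (x > 0)                  # formula (1) for ReLU
    scale = leak + (1 - leak) * (x > 0)        # leaky variant
    return x * scale if dzdy is None else dzdy * scale

x = np.array([-2.0, -0.5, 0.0, 3.0])
print(nnrelu(x))                         # forward: [0. 0. 0. 3.]
print(nnrelu(x, np.ones(4), leak=0.1))   # backward with leak
```

This single-function convention is exactly why the backward pass in vl_simplenn can pass res(i+1).dzdx as the extra argument and reuse the same vl_nnxx building blocks.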
Why don't I show the code of the structures containing weight computation? Because they are written in CUDA C for speed. You can find them in matlab\src. That is why we have to compile MatConvNet at the very beginning: functions like vl_nnconv need to be compiled into MEX files so that they can be called from MATLAB.
The third post of the series will mainly introduce cnn_train, which is quite interesting.