Deep Neural Networks
This article is organized as follows: first the basic idea, then the input and output illustrated with a concrete example, then the role each function plays, then the concrete implementation of each function, and finally some remarks.
I. Basic Idea
1. Feed in the sample data, train the network on it, and then test it.
2. The training process of a deep neural network: first comes initialization, where the basic structure of the network is set up according to the task; then a forward pass (feedforward), in which activations are propagated layer by layer and the error is computed; then backpropagation (back propagation), in which, following the principle of error minimization, stochastic gradient descent differentiates the loss with respect to each parameter, determines the descent direction, and updates the parameters (weights and biases; the procedure is essentially the BP algorithm used for single-hidden-layer feedforward networks). One small point about training: a single sample can be used more than once, because after the network has changed, what it can learn from that same sample changes too (much like rereading a book such as "Ordinary World" in junior high, in the last year of high school, during a repeat year, as an undergraduate, at work, and in graduate school: each stage of life yields a different experience. In this sense a deep neural network resembles a growing person; it has a growth property, like a living thing).
3. Forward pass: the hidden-layer output of the previous layer serves as the input to the current layer. The underlying principle is the same as in a BP neural network; any differences introduced by having many layers will be noted separately where they arise.
4. Backpropagation works the same way (a compact sketch of the forward pass and of the parameter update follows this list).
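As a concrete anchor for the description above, the forward pass and the parameter update implemented later by nnff and nnapplygrads (section IV) can be written compactly as follows. This is a hedged transcription of the code, not part of the original text; the bias is folded into each weight matrix, and $\eta$ stands for nn.learningRate, $\mu$ for nn.momentum:

a^{(1)} = x, \qquad a^{(l)} = f\!\left(a^{(l-1)}\, W^{(l-1)\top}\right), \quad l = 2, \dots, n

v_W \leftarrow \mu\, v_W + \eta\, \nabla_W L, \qquad W \leftarrow W - v_W

That is, each layer applies an affine map followed by a nonlinearity f, and each mini-batch moves the weights one momentum-smoothed gradient step downhill (plus an optional L2 term on the non-bias weights, see nnapplygrads).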
II. Input and Output
This article uses MNIST handwritten digit recognition as the running example: the input consists of 28*28-pixel images of handwritten digits (the standard MNIST split used below has 60,000 training and 10,000 test images), and the output is the class of each image (the 10 digits 0-9).
The same setup applies to other classification problems (a quick shape check on the MNIST data is sketched below).
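For orientation, a minimal shape check on the data used in section IV; this is a hedged sketch assuming the mnist_uint8.mat that ships with DeepLearnToolbox, where images are stored one per row and labels are one-hot encoded:

load mnist_uint8;                   % provides train_x, train_y, test_x, test_y
size(train_x)                       % expected: 60000 x 784  (each row is a 28*28 image)
size(train_y)                       % expected: 60000 x 10   (one-hot digit label)
size(test_x)                        % expected: 10000 x 784
train_x = double(train_x) / 255;    % scale pixel values to [0, 1], as in the main script below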
III. Related Functions
1. function nn = nnsetup(architecture): initializes the neural network, which may have a single hidden layer or many; returns a neural network structure.
2. function [nn, L] = nntrain(nn, train_x, train_y, opts, val_x, val_y): trains the neural network; returns a network with updated activations, errors, weights, and biases.
3. function nn = nnff(nn, x, y): the forward pass through the network; returns the network structure with updated layer activations, error, and loss.
4. function nn = nnbp(nn): backpropagation through the network; returns the network structure with the weight gradients (nn.dW) computed.
5. function nn = nnapplygrads(nn): updates the parameters (weights and biases) using the computed gradients; returns the network structure with updated weights and biases.
6. function [loss] = nneval(nn, loss, train_x, train_y, val_x, val_y): evaluates the performance of the network; returns the updated loss struct (a minimal call sequence tying these functions together is sketched after this list).
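Taken together, the simplest way these functions are wired up is shown below; this is a stripped-down sketch of ex1 from the main script in section IV.1, and the 784-100-10 architecture is simply the example used there:

rand('state', 0)                                 % fix the random seed, as the toolbox examples do
nn = nnsetup([784 100 10]);                      % initialize weights for a 784-100-10 network
opts.numepochs = 1;                              % one full sweep through the training data
opts.batchsize = 100;                            % mini-batch size
[nn, L] = nntrain(nn, train_x, train_y, opts);   % each mini-batch runs nnff -> nnbp -> nnapplygrads
[er, bad] = nntest(nn, test_x, test_y);          % misclassification rate on the test set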
IV. Function Implementations
1. Main script
function test_example_NN
load mnist_uint8;

train_x = double(train_x) / 255;
test_x  = double(test_x)  / 255;
train_y = double(train_y);
test_y  = double(test_y);

% normalize
[train_x, mu, sigma] = zscore(train_x);
test_x = normalize(test_x, mu, sigma);

%% ex1 vanilla neural net
rand('state',0)
nn = nnsetup([784 100 10]);
opts.numepochs = 1;   % Number of full sweeps through data
opts.batchsize = 100; % Take a mean gradient step over this many samples
[nn, L] = nntrain(nn, train_x, train_y, opts);
[er, bad] = nntest(nn, test_x, test_y);
assert(er < 0.08, 'Too big error');

%% ex2 neural net with L2 weight decay
rand('state',0)
nn = nnsetup([784 100 10]);
nn.weightPenaltyL2 = 1e-4; % L2 weight decay
opts.numepochs = 1;        % Number of full sweeps through data
opts.batchsize = 100;      % Take a mean gradient step over this many samples
nn = nntrain(nn, train_x, train_y, opts);
[er, bad] = nntest(nn, test_x, test_y);
assert(er < 0.1, 'Too big error');

%% ex3 neural net with dropout
rand('state',0)
nn = nnsetup([784 100 10]);
nn.dropoutFraction = 0.5; % Dropout fraction
opts.numepochs = 1;       % Number of full sweeps through data
opts.batchsize = 100;     % Take a mean gradient step over this many samples
nn = nntrain(nn, train_x, train_y, opts);
[er, bad] = nntest(nn, test_x, test_y);
assert(er < 0.1, 'Too big error');

%% ex4 neural net with sigmoid activation function
rand('state',0)
nn = nnsetup([784 100 10]);
nn.activation_function = 'sigm'; % Sigmoid activation function
nn.learningRate = 1;             % Sigm require a lower learning rate
opts.numepochs = 1;              % Number of full sweeps through data
opts.batchsize = 100;            % Take a mean gradient step over this many samples
nn = nntrain(nn, train_x, train_y, opts);
[er, bad] = nntest(nn, test_x, test_y);
assert(er < 0.1, 'Too big error');

%% ex5 plotting functionality
rand('state',0)
nn = nnsetup([784 20 10]);
opts.numepochs = 5;    % Number of full sweeps through data
nn.output = 'softmax'; % use softmax output
opts.batchsize = 1000; % Take a mean gradient step over this many samples
opts.plot = 1;         % enable plotting
nn = nntrain(nn, train_x, train_y, opts);
[er, bad] = nntest(nn, test_x, test_y);
assert(er < 0.1, 'Too big error');

%% ex6 neural net with sigmoid activation and plotting of validation and training error
% split training data into training and validation data
vx = train_x(1:10000,:);
tx = train_x(10001:end,:);
vy = train_y(1:10000,:);
ty = train_y(10001:end,:);

rand('state',0)
nn = nnsetup([784 20 10]);
nn.output = 'softmax';  % use softmax output
opts.numepochs = 5;     % Number of full sweeps through data
opts.batchsize = 1000;  % Take a mean gradient step over this many samples
opts.plot = 1;          % enable plotting
nn = nntrain(nn, tx, ty, opts, vx, vy); % nntrain takes validation set as last two arguments (optionally)
[er, bad] = nntest(nn, test_x, test_y);
assert(er < 0.1, 'Too big error');
2. function nn = nnsetup(architecture)
%NNSETUP creates a Feedforward Backpropagate Neural Network
% nn = nnsetup(architecture) returns an neural network structure with n=numel(architecture)
% layers, architecture being a n x 1 vector of layer sizes e.g. [784 100 10]

    nn.size = architecture;
    nn.n    = numel(nn.size);

    nn.activation_function     = 'tanh_opt';  % Activation functions of hidden layers: 'sigm' (sigmoid) or 'tanh_opt' (optimal tanh).
    nn.learningRate            = 2;           % learning rate Note: typically needs to be lower when using 'sigm' activation function and non-normalized inputs.
    nn.momentum                = 0.5;         % Momentum
    nn.scaling_learningRate    = 1;           % Scaling factor for the learning rate (each epoch)
    nn.weightPenaltyL2         = 0;           % L2 regularization
    nn.nonSparsityPenalty      = 0;           % Non sparsity penalty
    nn.sparsityTarget          = 0.05;        % Sparsity target
    nn.inputZeroMaskedFraction = 0;           % Used for Denoising AutoEncoders
    nn.dropoutFraction         = 0;           % Dropout level (http://www.cs.toronto.edu/~hinton/absps/dropout.pdf)
    nn.testing                 = 0;           % Internal variable. nntest sets this to one.
    nn.output                  = 'sigm';      % output unit 'sigm' (=logistic), 'softmax' and 'linear'

    for i = 2 : nn.n
        % weights and weight momentum
        nn.W{i - 1}  = (rand(nn.size(i), nn.size(i - 1)+1) - 0.5) * 2 * 4 * sqrt(6 / (nn.size(i) + nn.size(i - 1)));
        nn.vW{i - 1} = zeros(size(nn.W{i - 1}));

        % average activations (for use with sparsity)
        nn.p{i}      = zeros(1, nn.size(i));
    end
end
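For reference, with the example architecture [784 100 10] the loop above allocates nn.W{1} of size 100 x 785 and nn.W{2} of size 10 x 101; the extra column in each matrix holds the bias, and the uniform range ±4*sqrt(6/(fan_in+fan_out)) is a standard normalized-initialization heuristic. A quick check (illustrative only):

nn = nnsetup([784 100 10]);
size(nn.W{1})    % expected: 100 x 785  (100 hidden units, 784 inputs + 1 bias)
size(nn.W{2})    % expected: 10 x 101   (10 outputs, 100 hidden units + 1 bias)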
3. function [nn, L] = nntrain(nn, train_x, train_y, opts, val_x, val_y)
%NNTRAIN trains a neural net
% [nn, L] = nntrain(nn, x, y, opts) trains the neural network nn with input x and
% output y for opts.numepochs epochs, with minibatches of size
% opts.batchsize. Returns a neural network nn with updated activations,
% errors, weights and biases, (nn.a, nn.e, nn.W, nn.b) and L, the sum
% squared error for each training minibatch.

assert(isfloat(train_x), 'train_x must be a float');
assert(nargin == 4 || nargin == 6, 'number of input arguments must be 4 or 6')

loss.train.e      = [];
loss.train.e_frac = [];
loss.val.e        = [];
loss.val.e_frac   = [];
opts.validation = 0;
if nargin == 6
    opts.validation = 1;
end

fhandle = [];
if isfield(opts,'plot') && opts.plot == 1
    fhandle = figure();
end

m = size(train_x, 1);

batchsize = opts.batchsize;
numepochs = opts.numepochs;

numbatches = m / batchsize;

assert(rem(numbatches, 1) == 0, 'numbatches must be an integer');

L = zeros(numepochs*numbatches,1);
n = 1;
for i = 1 : numepochs
    tic;

    kk = randperm(m);
    for l = 1 : numbatches
        batch_x = train_x(kk((l - 1) * batchsize + 1 : l * batchsize), :);

        %Add noise to input (for use in denoising autoencoder)
        if(nn.inputZeroMaskedFraction ~= 0)
            batch_x = batch_x.*(rand(size(batch_x))>nn.inputZeroMaskedFraction);
        end

        batch_y = train_y(kk((l - 1) * batchsize + 1 : l * batchsize), :);

        nn = nnff(nn, batch_x, batch_y);
        nn = nnbp(nn);
        nn = nnapplygrads(nn);

        L(n) = nn.L;

        n = n + 1;
    end

    t = toc;

    if opts.validation == 1
        loss = nneval(nn, loss, train_x, train_y, val_x, val_y);
        str_perf = sprintf('; Full-batch train mse = %f, val mse = %f', loss.train.e(end), loss.val.e(end));
    else
        loss = nneval(nn, loss, train_x, train_y);
        str_perf = sprintf('; Full-batch train err = %f', loss.train.e(end));
    end
    if ishandle(fhandle)
        nnupdatefigures(nn, fhandle, loss, opts, i);
    end

    disp(['epoch ' num2str(i) '/' num2str(opts.numepochs) '. Took ' num2str(t) ' seconds' '. Mini-batch mean squared error on training set is ' num2str(mean(L((n-numbatches):(n-1)))) str_perf]);
    nn.learningRate = nn.learningRate * nn.scaling_learningRate;
end
end
4. function nn = nnff(nn, x, y)
%NNFF performs a feedforward pass
% nn = nnff(nn, x, y) returns an neural network structure with updated
% layer activations, error and loss (nn.a, nn.e and nn.L)

    n = nn.n;
    m = size(x, 1);

    x = [ones(m,1) x];
    nn.a{1} = x;

    %feedforward pass
    for i = 2 : n-1
        switch nn.activation_function
            case 'sigm'
                % Calculate the unit's outputs (including the bias term)
                nn.a{i} = sigm(nn.a{i - 1} * nn.W{i - 1}');
            case 'tanh_opt'
                nn.a{i} = tanh_opt(nn.a{i - 1} * nn.W{i - 1}');
        end

        %dropout
        if(nn.dropoutFraction > 0)
            if(nn.testing)
                nn.a{i} = nn.a{i}.*(1 - nn.dropoutFraction);
            else
                nn.dropOutMask{i} = (rand(size(nn.a{i}))>nn.dropoutFraction);
                nn.a{i} = nn.a{i}.*nn.dropOutMask{i};
            end
        end

        %calculate running exponential activations for use with sparsity
        if(nn.nonSparsityPenalty>0)
            nn.p{i} = 0.99 * nn.p{i} + 0.01 * mean(nn.a{i}, 1);
        end

        %Add the bias term
        nn.a{i} = [ones(m,1) nn.a{i}];
    end
    switch nn.output
        case 'sigm'
            nn.a{n} = sigm(nn.a{n - 1} * nn.W{n - 1}');
        case 'linear'
            nn.a{n} = nn.a{n - 1} * nn.W{n - 1}';
        case 'softmax'
            nn.a{n} = nn.a{n - 1} * nn.W{n - 1}';
            nn.a{n} = exp(bsxfun(@minus, nn.a{n}, max(nn.a{n},[],2)));
            nn.a{n} = bsxfun(@rdivide, nn.a{n}, sum(nn.a{n}, 2));
    end

    %error and loss
    nn.e = y - nn.a{n};

    switch nn.output
        case {'sigm', 'linear'}
            nn.L = 1/2 * sum(sum(nn.e .^ 2)) / m;
        case 'softmax'
            nn.L = -sum(sum(y .* log(nn.a{n}))) / m;
    end
end
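Two details in nnff are worth calling out. The softmax branch subtracts the per-row maximum before exponentiating; this leaves the normalized result unchanged but prevents overflow in exp. And the loss recorded in nn.L is, transcribing the final switch (m is the batch size; this notation is an addition, not part of the original text):

for 'sigm' and 'linear' outputs:  L = \frac{1}{2m}\sum_{k}\lVert y_k - a^{(n)}_k\rVert^2

for the 'softmax' output (cross-entropy):  L = -\frac{1}{m}\sum_{k} y_k \cdot \log a^{(n)}_k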
5. function nn = nnbp(nn)
%NNBP performs backpropagation
% nn = nnbp(nn) returns an neural network structure with updated weights

    n = nn.n;
    sparsityError = 0;
    switch nn.output
        case 'sigm'
            d{n} = - nn.e .* (nn.a{n} .* (1 - nn.a{n}));
        case {'softmax','linear'}
            d{n} = - nn.e;
    end
    for i = (n - 1) : -1 : 2
        % Derivative of the activation function
        switch nn.activation_function
            case 'sigm'
                d_act = nn.a{i} .* (1 - nn.a{i});
            case 'tanh_opt'
                d_act = 1.7159 * 2/3 * (1 - 1/(1.7159)^2 * nn.a{i}.^2);
        end

        if(nn.nonSparsityPenalty>0)
            pi = repmat(nn.p{i}, size(nn.a{i}, 1), 1);
            sparsityError = [zeros(size(nn.a{i},1),1) nn.nonSparsityPenalty * (-nn.sparsityTarget ./ pi + (1 - nn.sparsityTarget) ./ (1 - pi))];
        end

        % Backpropagate first derivatives
        if i+1==n % in this case in d{n} there is not the bias term to be removed
            d{i} = (d{i + 1} * nn.W{i} + sparsityError) .* d_act; % Bishop (5.56)
        else % in this case in d{i} the bias term has to be removed
            d{i} = (d{i + 1}(:,2:end) * nn.W{i} + sparsityError) .* d_act;
        end

        if(nn.dropoutFraction>0)
            d{i} = d{i} .* [ones(size(d{i},1),1) nn.dropOutMask{i}];
        end

    end

    for i = 1 : (n - 1)
        if i+1==n
            nn.dW{i} = (d{i + 1}' * nn.a{i}) / size(d{i + 1}, 1);
        else
            nn.dW{i} = (d{i + 1}(:,2:end)' * nn.a{i}) / size(d{i + 1}, 1);
        end
    end
end
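The tanh_opt derivative above is easy to misread. Assuming the toolbox defines tanh_opt(z) = 1.7159 * tanh(2/3 * z) (LeCun's scaled tanh; this definition is an assumption, the function itself is not reproduced in this article), expressing the chain rule through the stored activation a = nn.a{i} gives

\frac{\partial a}{\partial z} = 1.7159 \cdot \frac{2}{3}\left(1 - \tanh^2\!\left(\tfrac{2}{3}z\right)\right) = 1.7159 \cdot \frac{2}{3}\left(1 - \frac{a^2}{1.7159^2}\right)

which is exactly the d_act line in the 'tanh_opt' case.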
6. function nn = nnapplygrads(nn)
%NNAPPLYGRADS updates weights and biases with calculated gradients
% nn = nnapplygrads(nn) returns an neural network structure with updated
% weights and biases

    for i = 1 : (nn.n - 1)
        if(nn.weightPenaltyL2>0)
            dW = nn.dW{i} + nn.weightPenaltyL2 * [zeros(size(nn.W{i},1),1) nn.W{i}(:,2:end)];
        else
            dW = nn.dW{i};
        end

        dW = nn.learningRate * dW;

        if(nn.momentum>0)
            nn.vW{i} = nn.momentum*nn.vW{i} + dW;
            dW = nn.vW{i};
        end

        nn.W{i} = nn.W{i} - dW;
    end
end
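As a worked example of this update (the numbers are purely illustrative, not from the original text): with nn.learningRate = 2, nn.momentum = 0.5, a previous velocity vW = 0.10 and a gradient dW = 0.03 with no L2 penalty, the step applied to a weight is

dW = 2 * 0.03 = 0.06;   vW = 0.5 * 0.10 + 0.06 = 0.11;   W = W - 0.11

Note also that the L2 term prepends a column of zeros, so the bias column is never regularized, and that the learning rate is multiplied in before the momentum accumulation, so nn.vW stores an already-scaled velocity.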
7. function [loss] = nneval(nn, loss, train_x, train_y, val_x, val_y)
%NNEVAL evaluates performance of neural network
% Returns a updated loss struct
assert(nargin == 4 || nargin == 6, 'Wrong number of arguments');

nn.testing = 1;
% training performance
nn                    = nnff(nn, train_x, train_y);
loss.train.e(end + 1) = nn.L;

% validation performance
if nargin == 6
    nn                  = nnff(nn, val_x, val_y);
    loss.val.e(end + 1) = nn.L;
end
nn.testing = 0;
%calc misclassification rate if softmax
if strcmp(nn.output,'softmax')
    [er_train, dummy]        = nntest(nn, train_x, train_y);
    loss.train.e_frac(end+1) = er_train;

    if nargin == 6
        [er_val, dummy]        = nntest(nn, val_x, val_y);
        loss.val.e_frac(end+1) = er_val;
    end
end

end
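One behaviour worth noting: nn.testing = 1 is set before the evaluation passes so that, when dropout is enabled, nnff scales the hidden activations by (1 - dropoutFraction) instead of sampling a random mask, making evaluation deterministic. The misclassification fractions are only computed for the 'softmax' output, via nntest.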
8. function nnupdatefigures(nn, fhandle, L, opts, i)
%NNUPDATEFIGURES updates figures during training
if i > 1 %dont plot first point, its only a point
    x_ax = 1:i;
    % create legend
    if opts.validation == 1
        M = {'Training','Validation'};
    else
        M = {'Training'};
    end

    %create data for plots
    if strcmp(nn.output,'softmax')
        plot_x     = x_ax';
        plot_ye    = L.train.e';
        plot_yfrac = L.train.e_frac';
    else
        plot_x  = x_ax';
        plot_ye = L.train.e';
    end

    %add error on validation data if present
    if opts.validation == 1
        plot_x  = [plot_x, x_ax'];
        plot_ye = [plot_ye, L.val.e'];
    end

    %add classification error on validation data if present
    if opts.validation == 1 && strcmp(nn.output,'softmax')
        plot_yfrac = [plot_yfrac, L.val.e_frac'];
    end

    % plotting
    figure(fhandle);
    if strcmp(nn.output,'softmax')  %also plot classification error
        p1 = subplot(1,2,1);
        plot(plot_x,plot_ye);
        xlabel('Number of epochs'); ylabel('Error'); title('Error');
        legend(p1, M,'Location','NorthEast');
        set(p1, 'Xlim',[0,opts.numepochs + 1])

        p2 = subplot(1,2,2);
        plot(plot_x,plot_yfrac);
        xlabel('Number of epochs'); ylabel('Misclassification rate');
        title('Misclassification rate')
        legend(p2, M,'Location','NorthEast');
        set(p2, 'Xlim',[0,opts.numepochs + 1])
    else
        p = plot(plot_x,plot_ye);
        xlabel('Number of epochs'); ylabel('Error'); title('Error');
        legend(p, M,'Location','NorthEast');
        set(gca, 'Xlim',[0,opts.numepochs + 1])
    end
    drawnow;
end
end
9. function [er, bad] = nntest(nn, x, y)
    labels = nnpredict(nn, x);

    [dummy, expected] = max(y,[],2);
    bad = find(labels ~= expected);

    er = numel(bad) / size(x, 1);
end
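nnpredict is not reproduced in this article; in the toolbox it essentially runs a feedforward pass in testing mode and takes the arg-max over the output layer, roughly as follows (a hedged sketch under that assumption, not the original source):

nn.testing = 1;
nn = nnff(nn, x, zeros(size(x, 1), nn.size(end)));   % dummy targets; only nn.a is needed
nn.testing = 0;
[dummy, labels] = max(nn.a{end}, [], 2);             % predicted class = index of largest output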
10. function nnchecknumgrad(nn, x, y)
    epsilon = 1e-6;
    er      = 1e-7;
    n       = nn.n;
    for l = 1 : (n - 1)
        for i = 1 : size(nn.W{l}, 1)
            for j = 1 : size(nn.W{l}, 2)
                nn_m = nn; nn_p = nn;
                nn_m.W{l}(i, j) = nn.W{l}(i, j) - epsilon;
                nn_p.W{l}(i, j) = nn.W{l}(i, j) + epsilon;
                rand('state',0)
                nn_m = nnff(nn_m, x, y);
                rand('state',0)
                nn_p = nnff(nn_p, x, y);
                dW = (nn_p.L - nn_m.L) / (2 * epsilon);
                e = abs(dW - nn.dW{l}(i, j));
                assert(e < er, 'numerical gradient checking failed');
            end
        end
    end
end
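The check above is the standard central-difference gradient test: for every weight it compares the analytic gradient nn.dW{l}(i, j) produced by nnbp against

\frac{\partial L}{\partial W_{ij}} \approx \frac{L(W_{ij} + \epsilon) - L(W_{ij} - \epsilon)}{2\epsilon}, \qquad \epsilon = 10^{-6}

and requires the absolute difference to stay below 1e-7. The two rand('state',0) calls reset the random stream so both perturbed forward passes see the same dropout masks.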
V. References
https://github.com/rasmusbergpalm/DeepLearnToolbox
Note: this deep learning toolbox is written for MATLAB and works at the source-code level, so for a graduate student its logic is clear and fairly easy to follow. In practical engineering work, however, Python is used far more often, and there are more mature frameworks released by large companies, such as TensorFlow and Theano. This article is therefore meant only as a companion for reading papers, understanding the basic ideas, and getting started with the fundamentals.