Andrew Ng Machine Learning, Exercise 4: Neural Networks Learning
Source: Internet  Editor: 程序博客网  Time: 2024/06/07 03:00
Introduction
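We will implement the backpropagation algorithm for neural networks and apply it to handwritten digit recognition.
1 Neural Networks
In the previous exercise, we implemented feedforward propagation for a neural network and used it, with the weights we provided, to predict handwritten digits. In this exercise, you will implement the backpropagation algorithm to learn the parameters of the neural network.
1.1 Visualizing the data
Each training example is a 20*20 pixel grayscale image of a digit. Each pixel is represented by a floating-point number giving its grayscale intensity. The 20*20 grid of pixels is unrolled into a 400*1 vector. Each training example occupies one row of the data matrix X, so X is a 5000*400 matrix.
The second part of the training set is a 5000-dimensional vector y that contains the labels of the training examples.
Octave and MATLAB indexing starts at 1, so there is no index 0. To stay compatible, the digit 0 is labeled as 10, while the digits 1 to 9 are labeled 1 to 9.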
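The label convention can be illustrated with a short sketch. The vector y_demo below is made up for illustration and is not taken from ex4data1.mat.

```octave
% Hypothetical labels in the exercise's convention (10 stands for the digit 0).
y_demo = [10; 1; 5; 9; 10];

% Recover the actual digits: mod maps 10 -> 0 and leaves 1..9 unchanged.
digits = mod(y_demo, 10);

% Going the other way: map the digit 0 back to the label 10.
labels = digits;
labels(labels == 0) = 10;

% Columns: label, digit, label recovered from the digit.
disp([y_demo digits labels]);
```

Keeping labels in the range 1..10 lets them be used directly as column indices later on.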
%% Initialization
clear ; close all; clc

%% Setup the parameters you will use for this exercise
input_layer_size  = 400;  % 20x20 Input Images of Digits
hidden_layer_size = 25;   % 25 hidden units
num_labels = 10;          % 10 labels, from 1 to 10
                          % (note that we have mapped "0" to label 10)

%% =========== Part 1: Loading and Visualizing Data =============
%  We start the exercise by first loading and visualizing the dataset.
%  You will be working with a dataset that contains handwritten digits.
%

% Load Training Data
fprintf('Loading and Visualizing Data ...\n')

load('ex4data1.mat');
m = size(X, 1);

% Randomly select 100 data points to display
sel = randperm(size(X, 1));
sel = sel(1:100);

displayData(X(sel, :));

fprintf('Program paused. Press enter to continue.\n');
pause;
1.2 Model representation
Our neural network is shown in Figure 2. It has 3 layers: an input layer, a hidden layer and an output layer. Recall that our inputs are pixel values of digit images. Since the images are of size 20 × 20, this gives us 400 input layer units (not counting the extra bias unit which always outputs +1). The training data will be loaded into the variables X and y by the ex4.m script.
You have been provided with a set of network parameters (Θ(1), Θ(2)) already trained by us. These are stored in ex4weights.mat and will be loaded by ex4.m into Theta1 and Theta2. The parameters have dimensions that are sized for a neural network with 25 units in the second layer and 10 output units (corresponding to the 10 digit classes).
%% ================ Part 2: Loading Parameters ================
% In this part of the exercise, we load some pre-initialized
% neural network parameters.

fprintf('\nLoading Saved Neural Network Parameters ...\n')

% Load the weights into variables Theta1 and Theta2
load('ex4weights.mat');
% The matrices Theta1 and Theta2 will now be in your workspace
% Theta1 has size 25 x 401
% Theta2 has size 10 x 26

% Unroll parameters
nn_params = [Theta1(:) ; Theta2(:)];
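The unrolling step above is undone later with reshape. A minimal round-trip sketch follows, using toy layer sizes and made-up weight values (both are assumptions for illustration, not the exercise's 400/25/10 network):

```octave
% Toy layer sizes (hypothetical, much smaller than the real network).
input_layer_size  = 3;
hidden_layer_size = 2;
num_labels        = 4;

% Stand-in weight matrices with recognizable entries.
Theta1 = reshape(1:hidden_layer_size * (input_layer_size + 1), ...
                 hidden_layer_size, input_layer_size + 1);   % 2 x 4
Theta2 = reshape(101:100 + num_labels * (hidden_layer_size + 1), ...
                 num_labels, hidden_layer_size + 1);         % 4 x 3

% Unroll both matrices, column by column, into one long vector.
nn_params = [Theta1(:) ; Theta2(:)];                         % 20 x 1

% Recover the matrices from the vector.
n1 = hidden_layer_size * (input_layer_size + 1);
T1 = reshape(nn_params(1:n1), hidden_layer_size, input_layer_size + 1);
T2 = reshape(nn_params(n1 + 1:end), num_labels, hidden_layer_size + 1);

fprintf('round trip ok: %d\n', isequal(T1, Theta1) && isequal(T2, Theta2));
```

Optimizers such as fmincg work on a single parameter vector, which is why the weights are passed around in unrolled form.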
1.3 Feedforward and cost function
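Now you will implement the cost function and gradient for the neural network. First, complete the code in nnCostFunction.m to return the cost.
Recall that the cost function for the neural network (without regularization) is
$$J(\Theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[-y^{(i)}_k \log\left(\left(h_\Theta(x^{(i)})\right)_k\right) - \left(1 - y^{(i)}_k\right)\log\left(1 - \left(h_\Theta(x^{(i)})\right)_k\right)\right]$$
where $h_\Theta(x^{(i)})$ is the output of the network for example i and $\left(h_\Theta(x^{(i)})\right)_k = a^{(3)}_k$ is the activation of the k-th output unit. The labels 1, 2, ..., 10 stored in y have to be recoded as 10-dimensional vectors that contain only the values 0 and 1.
For example, if $x^{(i)}$ is an image of the digit 5, then the corresponding $y^{(i)}$ is a 10-dimensional vector with $y_5 = 1$ and all other elements equal to 0.
You should implement the feedforward computation that computes $h_\Theta(x^{(i)})$ for every example i and sums the cost over all examples.
Here m = 5000 is the total number of training examples and K = 10 is the number of possible labels (the digits 0-9).
The hypothesis $h_\Theta(x^{(i)})$ and its k-th component $\left(h_\Theta(x^{(i)})\right)_k$
are computed as follows. Take $a^{(1)} = x^{(i)}$, the i-th row of X written as a column vector, with a bias unit prepended. Then
$$z^{(2)} = \Theta^{(1)} a^{(1)}, \qquad a^{(2)} = g(z^{(2)})$$
Similarly, after a bias unit is prepended to $a^{(2)}$,
$$z^{(3)} = \Theta^{(2)} a^{(2)}, \qquad a^{(3)} = g(z^{(3)}) = h_\Theta(x^{(i)})$$
It follows that the hypothesis $h_\Theta(x^{(i)})$ is a 10*1 column vector and $\left(h_\Theta(x^{(i)})\right)_k$ is its k-th element.
For example:
$$h_\Theta(x^{(6)}) = (0,\ 0,\ 0.03,\ 0,\ 0.97,\ 0,\ 0,\ 0,\ 0,\ 0)^T$$
This means that, for the 6th example in the training set, the network outputs 0.03 for the digit 3 and 0.97 for the digit 5, which can be read loosely as the probability it assigns to each digit.
(Note: index 10 of the vector stands for the digit 0.)
About the label vector y of the training set
Training a neural network is supervised learning, so the training set consists of pairs $(x^{(i)}, y^{(i)})$.
The values y in the training set are therefore the known correct answers. For example, an example whose image shows the digit 5 has $y^{(i)} = (0, 0, 0, 0, 1, 0, 0, 0, 0, 0)^T$.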
%% ================ Part 3: Compute Cost (Feedforward) ================
%  To the neural network, you should first start by implementing the
%  feedforward part of the neural network that returns the cost only. You
%  should complete the code in nnCostFunction.m to return cost. After
%  implementing the feedforward to compute the cost, you can verify that
%  your implementation is correct by verifying that you get the same cost
%  as us for the fixed debugging parameters.
%
%  We suggest implementing the feedforward cost *without* regularization
%  first so that it will be easier for you to debug. Later, in part 4, you
%  will get to implement the regularized cost.
%
fprintf('\nFeedforward Using Neural Network ...\n')

% Weight regularization parameter (we set this to 0 here).
lambda = 0;

J = nnCostFunction(nn_params, input_layer_size, hidden_layer_size, ...
                   num_labels, X, y, lambda);

fprintf(['Cost at parameters (loaded from ex4weights): %f '...
         '\n(this value should be about 0.287629)\n'], J);

fprintf('\nProgram paused. Press enter to continue.\n');
pause;
nnCostFunction.m
function [J grad] = nnCostFunction(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)
%NNCOSTFUNCTION Implements the neural network cost function for a two layer
%neural network which performs classification
%   [J grad] = NNCOSTFUNCTION(nn_params, hidden_layer_size, num_labels, ...
%   X, y, lambda) computes the cost and gradient of the neural network. The
%   parameters for the neural network are "unrolled" into the vector
%   nn_params and need to be converted back into the weight matrices.
%
%   The returned parameter grad should be a "unrolled" vector of the
%   partial derivatives of the neural network.
%

% Reshape nn_params back into the parameters Theta1 and Theta2, the weight matrices
% for our 2 layer neural network
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));

Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

% Setup some useful variables
m = size(X, 1);

% You need to return the following variables correctly
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));

% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the code by working through the
%               following parts.
%
% Part 1: Feedforward the neural network and return the cost in the
%         variable J. After implementing Part 1, you can verify that your
%         cost function computation is correct by verifying the cost
%         computed in ex4.m

a1 = [ones(m, 1) X];        % 5000*401, bias unit added to every example
z2 = a1 * Theta1';          % 5000*25
a2 = sigmoid(z2);           % 5000*25
a2 = [ones(m, 1) a2];       % 5000*26, bias unit added to every example
z3 = a2 * Theta2';          % 5000*10
h = sigmoid(z3);            % 5000*10

% y is a 5000*1 vector of labels
yk = zeros(m, num_labels);  % 5000*10
for i = 1:m
    yk(i, y(i)) = 1;        % spread y into yk: mark the column of the true label with 1
end

J = (1/m) * sum(sum((-yk) .* log(h) - (1 - yk) .* log(1 - h)));

% =========================================================================

% Unroll gradients
grad = [Theta1_grad(:) ; Theta2_grad(:)];

end
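The loop above that spreads y into yk can also be written without a loop by indexing rows of an identity matrix. This is only an alternative sketch; the labels in y_demo are made up for illustration.

```octave
num_labels = 10;
y_demo = [5; 10; 3];      % hypothetical labels; 10 stands for the digit 0

% Row i of yk is row y_demo(i) of the identity matrix,
% i.e. a vector with a single 1 in column y_demo(i).
I = eye(num_labels);
yk = I(y_demo, :);        % 3 x 10

disp(yk);
```

Both forms produce the same matrix; the indexing form avoids m separate assignments.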
1.4 Regularized cost function
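The cost function for the neural network with regularization is the unregularized cost plus a penalty on the non-bias weights:
$$J(\Theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[-y^{(i)}_k \log\left(\left(h_\Theta(x^{(i)})\right)_k\right) - \left(1 - y^{(i)}_k\right)\log\left(1 - \left(h_\Theta(x^{(i)})\right)_k\right)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(\Theta^{(l)}_{ji}\right)^2$$
For this exercise, with a 3-layer network and 2 parameter matrices, the penalty term is:
$$\frac{\lambda}{2m}\left[\sum_{j=1}^{25}\sum_{k=1}^{400}\left(\Theta^{(1)}_{j,k}\right)^2 + \sum_{j=1}^{10}\sum_{k=1}^{25}\left(\Theta^{(2)}_{j,k}\right)^2\right]$$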
%% =============== Part 4: Implement Regularization ===============
%  Once your cost function implementation is correct, you should now
%  continue to implement the regularization with the cost.
%

fprintf('\nChecking Cost Function (w/ Regularization) ... \n')

% Weight regularization parameter (we set this to 1 here).
lambda = 1;

J = nnCostFunction(nn_params, input_layer_size, hidden_layer_size, ...
                   num_labels, X, y, lambda);

fprintf(['Cost at parameters (loaded from ex4weights): %f '...
         '\n(this value should be about 0.383770)\n'], J);

fprintf('Program paused. Press enter to continue.\n');
pause;
nnCostFunction.m with regularization added:
% Add the regularization term (the bias columns are excluded)
r = (lambda / (2 * m)) * (sum(sum(Theta1(:, 2:end) .^ 2)) ...
    + sum(sum(Theta2(:, 2:end) .^ 2)));
J = J + r;
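To see what the regularization term does on a case small enough to check by hand, here is a sketch with made-up weights (the matrices and the value of m are assumptions for illustration):

```octave
lambda = 1;
m = 4;                         % hypothetical number of training examples

% The first column of each matrix holds the bias weights and is not penalized.
Theta1 = [0.5  1 -1;
          2.0  0  2];
Theta2 = [3.0  1  1];

% Sum of squares of the non-bias weights: (1 + 1 + 0 + 4) + (1 + 1) = 8.
reg = (lambda / (2 * m)) * (sum(sum(Theta1(:, 2:end) .^ 2)) + ...
                            sum(sum(Theta2(:, 2:end) .^ 2)));

fprintf('regularization term: %f\n', reg);   % 8 / (2 * 4) = 1
```

Changing the bias columns leaves reg unchanged, which is a quick way to confirm that the 2:end indexing is in place.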
2 Backpropagation
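The backpropagation (BP) algorithm computes the gradient of the neural network cost function.
Computing the gradient means computing partial derivatives. One way is to apply the definition of the partial derivative numerically, which is what "Gradient Checking" in Ng's course does. That approach needs two evaluations of the cost function for every single parameter, so when there are very many input features (millions, say) and the parameter matrices Θ are large, the amount of computation becomes enormous. This is why Ng stresses in the course that gradient checking must be turned off when actually training the network. It also echoes a view once held by the neural network pioneer Minsky: if the number of computing layers is increased to two, the computation becomes too expensive and no effective learning algorithm exists, so he considered research on deeper networks to be of no value. (See the blog post 神经网络浅讲.)
The BP algorithm solves this problem of excessive computation. It starts at the output layer and works back toward the input layer, propagating error terms from layer to layer, and the quantities it produces are exactly the gradient we need.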
Once you have computed the gradient, you will be able to train the neural network by minimizing the cost function J(Θ) using an advanced optimizer such as fmincg.
2.1 Sigmoid gradient
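The derivative of the sigmoid function has a convenient property: it can be expressed in terms of the sigmoid function itself:
$$g'(z) = \frac{d}{dz}g(z) = g(z)\left(1 - g(z)\right), \qquad g(z) = \frac{1}{1 + e^{-z}}$$
The proof is as follows:
$$g'(z) = \frac{e^{-z}}{\left(1 + e^{-z}\right)^2} = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} = g(z)\left(1 - g(z)\right), \quad \text{since } 1 - g(z) = \frac{e^{-z}}{1 + e^{-z}}$$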
%% ================ Part 5: Sigmoid Gradient  ================
%  Before you start implementing the neural network, you will first
%  implement the gradient for the sigmoid function. You should complete the
%  code in the sigmoidGradient.m file.
%

fprintf('\nEvaluating sigmoid gradient...\n')

g = sigmoidGradient([-1 -0.5 0 0.5 1]);
fprintf('Sigmoid gradient evaluated at [-1 -0.5 0 0.5 1]:\n  ');
fprintf('%f ', g);
fprintf('\n\n');

fprintf('Program paused. Press enter to continue.\n');
pause;
sigmoidGradient.m
function g = sigmoidGradient(z)
%SIGMOIDGRADIENT returns the gradient of the sigmoid function
%evaluated at z
%   g = SIGMOIDGRADIENT(z) computes the gradient of the sigmoid function
%   evaluated at z. This should work regardless if z is a matrix or a
%   vector. In particular, if z is a vector or matrix, you should return
%   the gradient for each element.

g = zeros(size(z));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the gradient of the sigmoid function evaluated at
%               each value of z (z can be a matrix, vector or scalar).

g = sigmoid(z) .* (1 - sigmoid(z));

% =============================================================

end
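As a sanity check on sigmoidGradient, the closed form can be compared against a central finite difference. The sketch below redefines sigmoid and sigmoidGradient as anonymous functions so that it runs on its own; the step size h is an arbitrary choice made here.

```octave
% Stand-alone versions of the two functions (anonymous, for this sketch only).
sigmoid = @(z) 1 ./ (1 + exp(-z));
sigmoidGradient = @(z) sigmoid(z) .* (1 - sigmoid(z));

z = [-1 -0.5 0 0.5 1];
analytic = sigmoidGradient(z);

% Central difference approximation of the derivative.
h = 1e-5;
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h);

fprintf('gradient at 0: %f\n', sigmoidGradient(0));
fprintf('max abs difference: %g\n', max(abs(analytic - numeric)));
```

The gradient peaks at z = 0, where it equals 0.25, and is symmetric around that point.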
2.2 Random initialization
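Random initialization is needed for symmetry breaking.
Suppose instead that every element of the parameter matrix Θ(l) were initialized to the same value, for example 0. Then all hidden units would compute the same activation and receive the same gradient, so they would remain identical to one another however long we train.
Random initialization assigns a random value to each element of Θ(l), usually drawn uniformly from the range [-ε_init, ε_init]. ε_init is chosen as follows:
$$\epsilon_{init} = \frac{\sqrt{6}}{\sqrt{L_{in} + L_{out}}}$$
where L_in and L_out are the numbers of units in the layers adjacent to Θ(l). For Θ(1) this gives √6 / √(400 + 25) ≈ 0.12, the value used in the code.
The benefit of random initialization is therefore that it makes learning effective: the hidden units can learn different features.
The Matlab implementation of random initialization follows. It first calls the function defined in randInitializeWeights.m to initialize each matrix, then unrolls initial_Theta1 and initial_Theta2 into a single column vector.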
%% ================ Part 6: Initializing Parameters ================
%  In this part of the exercise, you will be starting to implement a two
%  layer neural network that classifies digits. You will start by
%  implementing a function to initialize the weights of the neural network
%  (randInitializeWeights.m)

fprintf('\nInitializing Neural Network Parameters ...\n')

initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels);

% Unroll parameters
initial_nn_params = [initial_Theta1(:) ; initial_Theta2(:)];
randInitializeWeights.m
function W = randInitializeWeights(L_in, L_out)
%RANDINITIALIZEWEIGHTS Randomly initialize the weights of a layer with L_in
%incoming connections and L_out outgoing connections
%   W = RANDINITIALIZEWEIGHTS(L_in, L_out) randomly initializes the weights
%   of a layer with L_in incoming connections and L_out outgoing
%   connections.
%
%   Note that W should be set to a matrix of size(L_out, 1 + L_in) as
%   the first column of W handles the "bias" terms
%

% You need to return the following variables correctly
W = zeros(L_out, 1 + L_in);

% ====================== YOUR CODE HERE ======================
% Instructions: Initialize W randomly so that we break the symmetry while
%               training the neural network.
%
% Note: The first column of W corresponds to the parameters for the bias unit
%

epsilon_init = 0.12;
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;

% =========================================================================

end
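The hard-coded 0.12 above is close to what the heuristic epsilon_init = sqrt(6) / sqrt(L_in + L_out) gives for the first layer. The sketch below computes the value instead of hard-coding it; treating that heuristic as the source of 0.12 is an assumption made here, and the sizes are the first layer's.

```octave
L_in  = 400;   % units feeding into the layer
L_out = 25;    % units in the layer

% Heuristic range for the initial weights (assumed here, see the lead-in).
epsilon_init = sqrt(6) / sqrt(L_in + L_out);

% Uniform random weights in [-epsilon_init, epsilon_init];
% the extra column holds the weights of the bias unit.
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;

fprintf('epsilon_init = %.4f\n', epsilon_init);
fprintf('largest |W|  = %.4f\n', max(abs(W(:))));
```

Since rand returns values in (0, 1), scaling by 2 * epsilon_init and shifting by -epsilon_init keeps every weight inside the intended range.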
2.3 Backpropagation
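For each training example (x, y), first run forward propagation to compute the activations (a(2), a(3)), then compute an error term δ(l) for every layer except the input layer, which needs none.
The error terms for each layer are computed as follows. (The network in this post has only 3 layers, with a single hidden layer.)
For the output layer, where the input layer is layer 1, the hidden layer is layer 2 and the output layer is layer 3:
The error term of output unit k is
$$\delta^{(3)}_k = a^{(3)}_k - y_k$$
In vector form, this subtraction is:
$$\delta^{(3)} = a^{(3)} - y$$
For the hidden layer, the error term is (with ⊙ the elementwise product, .* in Octave, and the entry belonging to the bias unit dropped):
$$\delta^{(2)} = \left(\Theta^{(2)}\right)^T \delta^{(3)} \odot g'(z^{(2)})$$
Once the error terms of every layer have been computed, the Δ (delta) matrices can be updated. Each Δ matrix has the same dimensions as the corresponding parameter matrix, and all of its elements are initially 0.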
% nnCostFunction.m
Theta1_grad = zeros(size(Theta1));  % 25*401 matrix, same size as Θ(1); accumulates Δ(1)
Theta2_grad = zeros(size(Theta2));  % 10*26 matrix, same size as Θ(2); accumulates Δ(2)
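It is defined (computed) as follows:
$$\Delta^{(l)} = \Delta^{(l)} + \delta^{(l+1)} \left(a^{(l)}\right)^T$$
Here δ(l+1) is a column vector and (a(l))^T is a row vector, so their product is a matrix with the same dimensions as Θ(l).
Once the Δ (delta) matrices have been computed, they give the derivatives of the cost function (without regularization) as follows:
$$\frac{\partial}{\partial \Theta^{(l)}_{ij}} J(\Theta) = D^{(l)}_{ij} = \frac{1}{m}\Delta^{(l)}_{ij}$$
The Matlab code for one complete run of the BP algorithm is: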
for row = 1:m
    % Forward pass for one training example
    a1 = [1 X(row, :)]';           % 401*1, bias unit added
    z2 = Theta1 * a1;              % 25*1
    a2 = [1; sigmoid(z2)];         % 26*1, bias unit added
    z3 = Theta2 * a2;              % 10*1
    a3 = sigmoid(z3);              % 10*1

    % Backward pass: error terms of the output and hidden layers
    delta3 = a3 - yk(row, :)';
    delta2 = (Theta2' * delta3) .* sigmoidGradient([1; z2]);
    delta2 = delta2(2:end);        % drop the entry of the bias unit

    % Accumulate the Delta matrices
    Theta1_grad = Theta1_grad + delta2 * a1';
    Theta2_grad = Theta2_grad + delta3 * a2';
end

Theta1_grad = Theta1_grad ./ m;
Theta1_grad(:, 2:end) = Theta1_grad(:, 2:end) ...
    + (lambda / m) * Theta1(:, 2:end);   % add regularization
Theta2_grad = Theta2_grad ./ m;
Theta2_grad(:, 2:end) = Theta2_grad(:, 2:end) ...
    + (lambda / m) * Theta2(:, 2:end);   % add regularization
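The per-example loop above can also be written with matrix products over all examples at once. The following is a sketch of that alternative, checked against a loop on a tiny made-up network; the sizes, the sin-based weights and the data are all assumptions for illustration.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));
sigmoidGradient = @(z) sigmoid(z) .* (1 - sigmoid(z));

% Tiny deterministic problem (hypothetical sizes and values).
m = 5; n_in = 3; n_hid = 4; K = 3; lambda = 1;
Theta1 = reshape(sin(1:n_hid * (n_in + 1)), n_hid, n_in + 1) / 10;
Theta2 = reshape(sin(1:K * (n_hid + 1)), K, n_hid + 1) / 10;
X = reshape(sin(1:m * n_in), m, n_in);
y = 1 + mod((1:m)', K);
I = eye(K);
yk = I(y, :);                                   % one-hot labels, m x K

% Vectorized forward pass.
a1 = [ones(m, 1) X];                            % m x (n_in + 1)
z2 = a1 * Theta1';                              % m x n_hid
a2 = [ones(m, 1) sigmoid(z2)];                  % m x (n_hid + 1)
a3 = sigmoid(a2 * Theta2');                     % m x K

% Vectorized backward pass: one row of error terms per example.
delta3 = a3 - yk;                               % m x K
delta2 = (delta3 * Theta2(:, 2:end)) .* sigmoidGradient(z2);   % m x n_hid

Theta1_grad = (delta2' * a1) / m;
Theta2_grad = (delta3' * a2) / m;
Theta1_grad(:, 2:end) = Theta1_grad(:, 2:end) + (lambda / m) * Theta1(:, 2:end);
Theta2_grad(:, 2:end) = Theta2_grad(:, 2:end) + (lambda / m) * Theta2(:, 2:end);

% Reference: the same gradients accumulated example by example.
G1 = zeros(size(Theta1)); G2 = zeros(size(Theta2));
for i = 1:m
    a1i = [1 X(i, :)]';
    z2i = Theta1 * a1i;
    a2i = [1; sigmoid(z2i)];
    d3 = sigmoid(Theta2 * a2i) - yk(i, :)';
    d2 = (Theta2(:, 2:end)' * d3) .* sigmoidGradient(z2i);
    G1 = G1 + d2 * a1i';
    G2 = G2 + d3 * a2i';
end
G1 = G1 / m; G2 = G2 / m;
G1(:, 2:end) = G1(:, 2:end) + (lambda / m) * Theta1(:, 2:end);
G2(:, 2:end) = G2(:, 2:end) + (lambda / m) * Theta2(:, 2:end);

fprintf('max difference: %g\n', ...
        max([abs(Theta1_grad(:) - G1(:)); abs(Theta2_grad(:) - G2(:))]));
```

Multiplying by Theta2(:, 2:end) skips the bias column up front, which has the same effect as computing the full product and then dropping the first entry.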
2.4 Gradient checking
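Gradient checking works as follows. BP gives us the derivatives of the cost function in a clever, indirect way, but is the result actually correct? To find out, we compute the derivatives a second time from the definition of the derivative in calculus (the limit definition) and compare the two results.
The (informal) limit definition of the derivative, applied to the i-th parameter, is:
$$\frac{\partial}{\partial \theta_i} J(\theta) \approx \frac{J(\theta^{(i+)}) - J(\theta^{(i-)})}{2\epsilon}$$
where θ(i+) and θ(i-) are θ with its i-th element increased and decreased by a small ε, respectively.
Computing derivatives directly from the definition in this way is expensive, which is probably why the course videos say to turn gradient checking off for the real training run.
In the gradient checking output, the two sets of results are almost identical, so the BP implementation is working correctly.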
%% =============== Part 7: Implement Backpropagation ===============
%  Once your cost matches up with ours, you should proceed to implement the
%  backpropagation algorithm for the neural network. You should add to the
%  code you've written in nnCostFunction.m to return the partial
%  derivatives of the parameters.
%
fprintf('\nChecking Backpropagation... \n');

%  Check gradients by running checkNNGradients
checkNNGradients;

fprintf('\nProgram paused. Press enter to continue.\n');
pause;
checkNNGradients.m
function checkNNGradients(lambda)
%CHECKNNGRADIENTS Creates a small neural network to check the
%backpropagation gradients
%   CHECKNNGRADIENTS(lambda) Creates a small neural network to check the
%   backpropagation gradients, it will output the analytical gradients
%   produced by your backprop code and the numerical gradients (computed
%   using computeNumericalGradient). These two gradient computations should
%   result in very similar values.
%

if ~exist('lambda', 'var') || isempty(lambda)
    lambda = 0;
end

input_layer_size = 3;
hidden_layer_size = 5;
num_labels = 3;
m = 5;

% We generate some 'random' test data
Theta1 = debugInitializeWeights(hidden_layer_size, input_layer_size);
Theta2 = debugInitializeWeights(num_labels, hidden_layer_size);
% Reusing debugInitializeWeights to generate X
X  = debugInitializeWeights(m, input_layer_size - 1);
y  = 1 + mod(1:m, num_labels)';

% Unroll parameters
nn_params = [Theta1(:) ; Theta2(:)];

% Short hand for cost function
costFunc = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                               num_labels, X, y, lambda);

[cost, grad] = costFunc(nn_params);
numgrad = computeNumericalGradient(costFunc, nn_params);

% Visually examine the two gradient computations.  The two columns
% you get should be very similar.
disp([numgrad grad]);
fprintf(['The above two columns you get should be very similar.\n' ...
         '(Left-Your Numerical Gradient, Right-Analytical Gradient)\n\n']);

% Evaluate the norm of the difference between two solutions.
% If you have a correct implementation, and assuming you used EPSILON = 0.0001
% in computeNumericalGradient.m, then diff below should be less than 1e-9
diff = norm(numgrad-grad)/norm(numgrad+grad);

fprintf(['If your backpropagation implementation is correct, then \n' ...
         'the relative difference will be small (less than 1e-9). \n' ...
         '\nRelative Difference: %g\n'], diff);

end
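checkNNGradients calls computeNumericalGradient, which is not listed in this post. A minimal sketch of the central-difference idea behind it is shown below on a stand-in cost function whose gradient is known in closed form; the function J_demo and the point theta are made up for illustration and are not the exercise's code.

```octave
% Stand-in cost with a known gradient: dJ/dtheta = 2*theta + [3; 0; 0].
J_demo = @(theta) sum(theta .^ 2) + 3 * theta(1);
theta = [1; -2; 0.5];
analytic = 2 * theta + [3; 0; 0];

% Central difference: perturb one parameter at a time by +/- e.
e = 1e-4;
numgrad = zeros(size(theta));
perturb = zeros(size(theta));
for p = 1:numel(theta)
    perturb(p) = e;
    numgrad(p) = (J_demo(theta + perturb) - J_demo(theta - perturb)) / (2 * e);
    perturb(p) = 0;
end

% Same relative-difference measure that checkNNGradients reports.
rel_diff = norm(numgrad - analytic) / norm(numgrad + analytic);
fprintf('Relative Difference: %g\n', rel_diff);
```

The loop makes the cost of this method visible: two evaluations of the cost function per parameter, which is why it is used for checking only and not for training.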
2.5 Regularized Neural Networks
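Neural networks are highly expressive and therefore prone to overfitting, so regularization is usually needed. Regularizing simply means adding a penalty term to the cost and the corresponding term to the gradient. Note that the weights of the bias units are not regularized.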
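Written out, the regularized gradient that the backpropagation code in section 2.3 computes would take the following form. The symbol D for the gradient matrix is notation assumed here; j = 0 denotes the bias column, which Octave stores as column 1.

```latex
\frac{\partial}{\partial \Theta^{(l)}_{ij}} J(\Theta) = D^{(l)}_{ij} =
\begin{cases}
\dfrac{1}{m}\,\Delta^{(l)}_{ij} & \text{for } j = 0 \\[2ex]
\dfrac{1}{m}\,\Delta^{(l)}_{ij} + \dfrac{\lambda}{m}\,\Theta^{(l)}_{ij} & \text{for } j \ge 1
\end{cases}
```

This matches the code, where (lambda / m) * Theta is added only to columns 2:end of each gradient matrix.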
%% =============== Part 8: Implement Regularization ===============
%  Once your backpropagation implementation is correct, you should now
%  continue to implement the regularization with the cost and gradient.
%

fprintf('\nChecking Backpropagation (w/ Regularization) ... \n')

%  Check gradients by running checkNNGradients
lambda = 3;
checkNNGradients(lambda);

% Also output the costFunction debugging values
debug_J  = nnCostFunction(nn_params, input_layer_size, ...
                          hidden_layer_size, num_labels, X, y, lambda);

fprintf(['\n\nCost at (fixed) debugging parameters (w/ lambda = %f): %f ' ...
         '\n(for lambda = 3, this value should be about 0.576051)\n\n'], lambda, debug_J);

fprintf('Program paused. Press enter to continue.\n');
pause;
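With lambda == 1 and the number of training iterations MaxIter == 50, training gives the following result: the cost falls from about 3.29 at the start to about 0.52 at the end.
Accuracy on the training set: Training Set Accuracy: 94.820000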
Training Neural Network...
Iteration     1 | Cost: 3.295180e+00
Iteration     2 | Cost: 3.250966e+00
Iteration     3 | Cost: 3.216955e+00
Iteration     4 | Cost: 2.884544e+00
Iteration     5 | Cost: 2.746602e+00
Iteration     6 | Cost: 2.429900e+00
...............
Iteration    46 | Cost: 5.428769e-01
Iteration    47 | Cost: 5.363841e-01
Iteration    48 | Cost: 5.332370e-01
Iteration    49 | Cost: 5.302586e-01
Iteration    50 | Cost: 5.202410e-01
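Neural networks overfit easily. For example, with lambda set to 0.1 and MaxIter set to 200, training gives the result below: training accuracy reaches 99.94%, which very likely indicates overfitting. Confirming this would require measuring accuracy on data held out from training.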
Training Neural Network...
Iteration     1 | Cost: 3.303119e+00
Iteration     2 | Cost: 3.241696e+00
Iteration     3 | Cost: 3.220572e+00
Iteration     4 | Cost: 2.637648e+00
Iteration     5 | Cost: 2.182911e+00
...................
Iteration   197 | Cost: 8.177972e-02
Iteration   198 | Cost: 8.171843e-02
Iteration   199 | Cost: 8.169971e-02
Iteration   200 | Cost: 8.165209e-02
Training Set Accuracy: 99.940000
2.6 Learning parameters using fmincg
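The fmincg function (shipped with the exercise files; it is not a Matlab built-in) is used to obtain the final parameter matrices Θ.
In the code below, the call fmincg(costFunction, initial_nn_params, options) returns the learned network parameters nn_params. initial_nn_params is the randomly initialized parameter vector described earlier.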
%% =================== Part 8: Training NN ===================
%  You have now implemented all the code necessary to train a neural
%  network. To train your neural network, we will now use "fmincg", which
%  is a function which works similarly to "fminunc". Recall that these
%  advanced optimizers are able to train our cost functions efficiently as
%  long as we provide them with the gradient computations.
%
fprintf('\nTraining Neural Network... \n')

%  After you have completed the assignment, change the MaxIter to a larger
%  value to see how more training helps.
options = optimset('MaxIter', 50);

%  You should also try different values of lambda
lambda = 1;

% Create "short hand" for the cost function to be minimized
costFunction = @(p) nnCostFunction(p, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, X, y, lambda);

% Now, costFunction is a function that takes in only one argument (the
% neural network parameters)
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);

% Obtain Theta1 and Theta2 back from nn_params
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));

Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

fprintf('Program paused. Press enter to continue.\n');
pause;
2.7 Visualizing the hidden layer
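A multilayer neural network stacks several layers. The more layers it has, the deeper the feature representations it can learn and the greater its capacity to approximate functions. Each layer is a more "abstract" (deeper) representation of the layer before it.
Take the recognition of the digits 0-9 in this post. There is only one hidden layer here (in general the first layer is called the input layer, the last the output layer, and everything in between hidden layers). The hidden layer learns only some edge-like features of the digits, and the output layer, building on the hidden layer, is then largely able to recognize the digits.
The hidden layer is visualized as follows: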
%% ================= Part 9: Visualize Weights =================
%  You can now "visualize" what the neural network is learning by
%  displaying the hidden units to see what features they are capturing in
%  the data.

fprintf('\nVisualizing Neural Network... \n')

displayData(Theta1(:, 2:end));

fprintf('\nProgram paused. Press enter to continue.\n');
pause;

%% ================= Part 10: Implement Predict =================
%  After training the neural network, we would like to use it to predict
%  the labels. You will now implement the "predict" function to use the
%  neural network to predict the labels of the training set. This lets
%  you compute the training set accuracy.

pred = predict(Theta1, Theta2, X);

fprintf('\nTraining Set Accuracy: %f\n', mean(double(pred == y)) * 100);
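The predict function called above is not listed in this post. A minimal sketch of what such a function has to do is a feedforward pass followed by picking the output unit with the largest activation. The weights and inputs below are made up so that the result can be checked by hand; they are not the trained parameters.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothetical tiny network: 2 inputs, 2 hidden units, 2 output classes.
% The first column of each matrix multiplies the bias unit.
Theta1 = [0  10 -10;
          0 -10  10];
Theta2 = [0  10 -10;
          0 -10  10];

X_demo = [1 0;
          0 1];
y_demo = [1; 2];
m = size(X_demo, 1);

% Feedforward pass over all examples at once.
a2 = sigmoid([ones(m, 1) X_demo] * Theta1');   % hidden activations, m x 2
h  = sigmoid([ones(m, 1) a2] * Theta2');       % output activations, m x 2

% The predicted label is the index of the largest output in each row.
[~, pred] = max(h, [], 2);

fprintf('Accuracy on the demo data: %f\n', mean(double(pred == y_demo)) * 100);
```

Because labels run from 1 to 10 with 10 standing for the digit 0, the column index returned by max can be compared with y directly.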
Reference: http://www.cnblogs.com/hapjin/p/6106182.html