UFLDL Exercise: Learning color features with Sparse Autoencoders


This is the UFLDL Linear Decoders exercise.

Starting from the code of the previous sparse autoencoder exercise, modify sparseAutoencoderLinearCost.m.
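The only substantive change from the previous exercise is the output layer: the linear decoder uses the identity activation, so the output-layer error term drops the f'(z) factor, while the hidden layer keeps the sigmoid and the sparsity penalty. As a recap, in the notation of the UFLDL notes (and matching what the code below implements):

    \delta^{(3)} = -\bigl(y - a^{(3)}\bigr)
    \delta^{(2)} = \Bigl((W^{(2)})^{T}\delta^{(3)} + \beta\bigl(-\tfrac{\rho}{\hat{\rho}} + \tfrac{1-\rho}{1-\hat{\rho}}\bigr)\Bigr) \odot a^{(2)} \odot \bigl(1 - a^{(2)}\bigr)

where \hat{\rho} is the vector of average hidden activations and \odot denotes element-wise multiplication. In the sigmoid-output autoencoder of the earlier exercise, \delta^{(3)} carried an extra a^{(3)} \odot (1 - a^{(3)}) factor.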

sparseAutoencoderLinearCost.m

function [cost,grad] = sparseAutoencoderLinearCost(theta, visibleSize, hiddenSize, ...
                                                   lambda, sparsityParam, beta, data)

% visibleSize: the number of input units (probably 64)
% hiddenSize: the number of hidden units (probably 25)
% lambda: weight decay parameter
% sparsityParam: The desired average activation for the hidden units (denoted in the lecture
%                notes by the greek alphabet rho, which looks like a lower-case "p").
% beta: weight of sparsity penalty term
% data: Our 64x10000 matrix containing the training data.  So, data(:,i) is the i-th training example.

% The input theta is a vector (because minFunc expects the parameters to be a vector).
% We first convert theta to the (W1, W2, b1, b2) matrix/vector format, so that this
% follows the notation convention of the lecture notes.

% Unpack theta into the weight matrices W1, W2 and the bias vectors b1, b2.
W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
W2 = reshape(theta(hiddenSize*visibleSize+1:2*hiddenSize*visibleSize), visibleSize, hiddenSize);
b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
b2 = theta(2*hiddenSize*visibleSize+hiddenSize+1:end);

% Cost and gradient variables (your code needs to compute these values).
% Here, we initialize them to zeros.
W1grad = zeros(size(W1));
W2grad = zeros(size(W2));
b1grad = zeros(size(b1));
b2grad = zeros(size(b2));

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute the cost/optimization objective J_sparse(W,b) for the Sparse Autoencoder,
%                and the corresponding gradients W1grad, W2grad, b1grad, b2grad.
%
% W1grad, W2grad, b1grad and b2grad should be computed using backpropagation.
% Note that W1grad has the same dimensions as W1, b1grad has the same dimensions
% as b1, etc.  Your code should set W1grad to be the partial derivative of J_sparse(W,b) with
% respect to W1.  I.e., W1grad(i,j) should be the partial derivative of J_sparse(W,b)
% with respect to the input parameter W1(i,j).  Thus, W1grad should be equal to the term
% [(1/m) \Delta W^{(1)} + \lambda W^{(1)}] in the last block of pseudo-code in Section 2.2
% of the lecture notes (and similarly for W2grad, b1grad, b2grad).
%
% Stated differently, if we were using batch gradient descent to optimize the parameters,
% the gradient descent update to W1 would be W1 := W1 - alpha * W1grad, and similarly for W2, b1, b2.
n_data = size(data,2);

% 1. Forward propagate to get the activations.
hidden_activations = sigmoid(W1 * data + repmat(b1,1,n_data));
out_activations = W2 * hidden_activations + repmat(b2,1,n_data);   % linear decoder: no sigmoid on the output layer

% 2. Backward propagate to get the residuals (error terms).
out_residual = -(data - out_activations);   % linear output units, so no .*(out_activations.*(1-out_activations)) factor
avg_activations = sum(hidden_activations,2) ./ n_data;
KL = beta*(-sparsityParam./avg_activations + (1-sparsityParam)./(1-avg_activations));
KL = repmat(KL,1,n_data);
hidden_residual = (W2'*out_residual + KL) .* (hidden_activations.*(1-hidden_activations));

% 3. Partial derivatives: accumulate Delta W and Delta b, then average and add weight decay.
W2grad = W2grad + out_residual * hidden_activations';
b2grad = b2grad + sum(out_residual,2);
W1grad = W1grad + hidden_residual * data';
b1grad = b1grad + sum(hidden_residual,2);
W1grad = W1grad/n_data + lambda*W1;
W2grad = W2grad/n_data + lambda*W2;
b1grad = b1grad/n_data;
b2grad = b2grad/n_data;

% 4. Updating W1, W2, b1, b2 is not needed here; minFunc performs the update from the returned gradient.
% alpha = 0.01;
% W1 = W1 - alpha * W1grad;
% W2 = W2 - alpha * W2grad;
% b1 = b1 - alpha * b1grad;
% b2 = b2 - alpha * b2grad;

% 5. Calculate the cost: reconstruction error + weight decay + sparsity (KL divergence) penalty.
cost = out_activations - data;
cost = sum(cost(:).^2)/2/n_data + (lambda/2)*(sum(W1(:).^2) + sum(W2(:).^2)) + ...
    beta*sum(sparsityParam .* log(sparsityParam./avg_activations) + ...
    (1-sparsityParam) .* log((1-sparsityParam)./(1-avg_activations)));

%-------------------------------------------------------------------
% After computing the cost and gradient, we will convert the gradients back
% to a vector format (suitable for minFunc).  Specifically, we will unroll
% your gradient matrices into a vector.

grad = [W1grad(:) ; W2grad(:) ; b1grad(:) ; b2grad(:)];

end

%-------------------------------------------------------------------
% Here's an implementation of the sigmoid function, which you may find useful
% in your computation of the costs and the gradients.  This inputs a (row or
% column) vector (say (z1, z2, z3)) and returns (f(z1), f(z2), f(z3)).

function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end
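Before plugging the cost function into minFunc, it is worth verifying the analytic gradient numerically. Below is a minimal, self-contained sketch of such a check; the small layer sizes, the random test data, and the hand-rolled finite-difference loop are illustrative choices here, not the settings of the actual exercise (the UFLDL starter code ships its own gradient-checking helpers).

% Sketch of a numerical gradient check (illustrative sizes and random data).
visibleSize   = 8;      % small sizes keep the finite-difference loop fast
hiddenSize    = 5;
lambda        = 1e-4;
sparsityParam = 0.035;
beta          = 5;
data  = rand(visibleSize, 10);
theta = 0.01 * randn(2*hiddenSize*visibleSize + hiddenSize + visibleSize, 1);

% Analytic gradient from the implementation above.
[cost0, grad] = sparseAutoencoderLinearCost(theta, visibleSize, hiddenSize, ...
                                            lambda, sparsityParam, beta, data);

% Central finite differences: perturb one parameter at a time.
epsilon = 1e-4;
numgrad = zeros(size(theta));
for i = 1:numel(theta)
    e = zeros(size(theta));  e(i) = epsilon;
    cPlus  = sparseAutoencoderLinearCost(theta + e, visibleSize, hiddenSize, ...
                                         lambda, sparsityParam, beta, data);
    cMinus = sparseAutoencoderLinearCost(theta - e, visibleSize, hiddenSize, ...
                                         lambda, sparsityParam, beta, data);
    numgrad(i) = (cPlus - cMinus) / (2*epsilon);
end

% Relative difference between numerical and analytic gradients.
diff = norm(numgrad - grad) / norm(numgrad + grad);
disp(diff);

With a correct implementation, the printed relative difference should be tiny (on the order of 1e-9 in double precision). Since each parameter requires two cost evaluations, the check is only practical with a reduced network size like the one above.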

Results:

(The figure of the learned color features is not reproduced here.)