Deep Learning by Andrew Ng --- self-taught

The general workflow of this UFLDL exercise (a condensed sketch follows this list):

  • Perform self-taught feature extraction on the digit images labeled 5-9 (learning pen-stroke features) to obtain the feature parameters opttheta.
  • Use opttheta to compute a(2), the feature representation of the labeled input data.
  • Train the softmax classifier (with softmaxTrain.m, implemented in a previous exercise) on the training-set features (trainFeatures) and labels (trainLabels).
  • Classify the test set: complete the code that makes predictions on the test-set features (testFeatures).
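
A condensed MATLAB sketch of these four steps, assuming the parameters and data variables (hiddenSize, inputSize, lambda, sparsityParam, beta, options, unlabeledData, trainData, trainLabels, testData, testLabels) are set up as in the full stlExercise.m listed below:

% 1. Learn features from the unlabeled images (digits 5-9) with a sparse autoencoder.
theta = initializeParameters(hiddenSize, inputSize);
[opttheta, cost] = minFunc(@(p) sparseAutoencoderCost(p, inputSize, hiddenSize, ...
                                lambda, sparsityParam, beta, unlabeledData), ...
                           theta, options);

% 2. Feed the labeled images (digits 0-4) forward to get their hidden-layer representation a(2).
trainFeatures = feedForwardAutoencoder(opttheta, hiddenSize, inputSize, trainData);
testFeatures  = feedForwardAutoencoder(opttheta, hiddenSize, inputSize, testData);

% 3. Train a softmax classifier on the learned features (lambda = 1e-4 for softmax).
softmaxModel = softmaxTrain(hiddenSize, 5, 1e-4, trainFeatures, trainLabels, options);

% 4. Predict labels for the test features and measure accuracy.
pred = softmaxPredict(softmaxModel, testFeatures);
fprintf('Test Accuracy: %f%%\n', 100*mean(pred(:) == testLabels(:)));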

The difference between self-taught learning and semi-supervised learning:

  • What they share: both first extract features from a large amount of unlabeled data (for example, by training an autoencoder, which yields the weights W), then feed the labeled data through the learned network to obtain a representation, namely the activation a corresponding to each input (in this exercise, a(2) = sigmoid(W1*x + b1), computed by feedForwardAutoencoder.m). This a is then used as the input to a classifier (such as softmax regression).
  • Where they differ (the short sketch after this list shows how the exercise realizes the distinction):
    self-taught: Suppose your goal is a computer vision task where you'd like to distinguish between images of cars and images of motorcycles; so, each labeled example in your training set is either an image of a car or an image of a motorcycle. Where can we get lots of unlabeled data? The easiest way would be to obtain some random collection of images, perhaps downloaded off the internet. We could then train the autoencoder on this large collection of images, and obtain useful features from them. Because here the unlabeled data is drawn from a different distribution than the labeled data (i.e., perhaps some of our unlabeled images may contain cars/motorcycles, but not every image downloaded is either a car or a motorcycle), we call this self-taught learning.
    semi-supervised: In contrast, if we happen to have lots of unlabeled images lying around that are all images of either a car or a motorcycle, but where the data is just missing its label (so you don't know which ones are cars, and which ones are motorcycles), then we could use this form of unlabeled data to learn the features. This setting, where each unlabeled example is drawn from the same distribution as your labeled examples, is sometimes called the semi-supervised setting.
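
In the exercise, this distinction comes down to which images are fed to the autoencoder. A minimal sketch, reusing the variable names from stlExercise.m below (the semi-supervised variant is described only in a comment and is not part of the exercise):

% Self-taught setting (what the exercise actually does): the autoencoder is
% trained on digits 5-9, a different distribution from the labeled 0-4 task.
unlabeledSet  = find(mnistLabels >= 5);
unlabeledData = mnistData(:, unlabeledSet);

% Semi-supervised setting (for contrast, not used here): the unlabeled pool
% would itself consist of digits 0-4, i.e. the same distribution as the
% labeled examples, just with the labels withheld.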

Exercise solutions (you are encouraged to finish the exercise yourself before consulting these)

  • stlExercise.m
%% CS294A/CS294W Self-taught Learning Exercise

%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the
%  self-taught learning. You will need to complete code in feedForwardAutoencoder.m
%  You will also need to have implemented sparseAutoencoderCost.m and
%  softmaxCost.m from previous exercises.
%
%% ======================================================================
%  STEP 0: Here we provide the relevant parameters values that will
%  allow your sparse autoencoder to get good filters; you do not need to
%  change the parameters below.

inputSize  = 28 * 28;
numLabels  = 5;
hiddenSize = 200;
sparsityParam = 0.1; % desired average activation of the hidden units.
                     % (This was denoted by the Greek alphabet rho, which looks like a lower-case "p",
                     %  in the lecture notes).
lambda = 3e-3;       % weight decay parameter
beta = 3;            % weight of sparsity penalty term
maxIter = 400;

%% ======================================================================
%  STEP 1: Load data from the MNIST database
%
%  This loads our training and test data from the MNIST database files.
%  We have sorted the data for you in this so that you will not have to
%  change it.

% Load MNIST database files
mnistData   = loadMNISTImages('train-images-idx3-ubyte');
mnistLabels = loadMNISTLabels('train-labels-idx1-ubyte');

% Simulate a Labeled and Unlabeled set
labeledSet   = find(mnistLabels >= 0 & mnistLabels <= 4);
unlabeledSet = find(mnistLabels >= 5);

numTrain = round(numel(labeledSet)/2);
trainSet = labeledSet(1:numTrain);
testSet  = labeledSet(numTrain+1:end);

unlabeledData = mnistData(:, unlabeledSet);

trainData   = mnistData(:, trainSet);
trainLabels = mnistLabels(trainSet)' + 1; % Shift Labels to the Range 1-5

testData   = mnistData(:, testSet);
testLabels = mnistLabels(testSet)' + 1;   % Shift Labels to the Range 1-5

% Output Some Statistics
fprintf('# examples in unlabeled set: %d\n', size(unlabeledData, 2));
fprintf('# examples in supervised training set: %d\n\n', size(trainData, 2));
fprintf('# examples in supervised testing set: %d\n\n', size(testData, 2));

%% ======================================================================
%  STEP 2: Train the sparse autoencoder
%  This trains the sparse autoencoder on the unlabeled training
%  images.

%  Randomly initialize the parameters
theta = initializeParameters(hiddenSize, inputSize);

%% ----------------- YOUR CODE HERE ----------------------
%  Find opttheta by running the sparse autoencoder on
%  unlabeledTrainingImages

opttheta = theta;

addpath minFunc/
options.Method = 'lbfgs'; % Here, we use L-BFGS to optimize our cost
                          % function. Generally, for minFunc to work, you
                          % need a function pointer with two outputs: the
                          % function value and the gradient. In our problem,
                          % sparseAutoencoderCost.m satisfies this.
options.maxIter = 400;    % Maximum number of iterations of L-BFGS to run
options.display = 'on';

[opttheta, cost] = minFunc( @(p) sparseAutoencoderCost(p, ...
                                 inputSize, hiddenSize, ...
                                 lambda, sparsityParam, ...
                                 beta, unlabeledData), ...
                            theta, options);

%% -----------------------------------------------------

% Visualize weights
W1 = reshape(opttheta(1:hiddenSize * inputSize), hiddenSize, inputSize);
display_network(W1');

%% ======================================================================
%% STEP 3: Extract Features from the Supervised Dataset
%
%  You need to complete the code in feedForwardAutoencoder.m so that the
%  following command will extract features from the data.

trainFeatures = feedForwardAutoencoder(opttheta, hiddenSize, inputSize, ...
                                       trainData);

testFeatures = feedForwardAutoencoder(opttheta, hiddenSize, inputSize, ...
                                       testData);

%% ======================================================================
%% STEP 4: Train the softmax classifier

softmaxModel = struct;
%% ----------------- YOUR CODE HERE ----------------------
%  Use softmaxTrain.m from the previous exercise to train a multi-class
%  classifier.
%
%  Use lambda = 1e-4 for the weight regularization for softmax
%
% You need to compute softmaxModel using softmaxTrain on trainFeatures and
% trainLabels

lambda = 1e-4;
options.maxIter = 100;
softmaxModel = softmaxTrain(hiddenSize, 5, lambda, ...
                            trainFeatures, trainLabels, options);
                            % note: trainFeatures has hiddenSize rows, hence the first argument

%% -----------------------------------------------------

%% ======================================================================
%% STEP 5: Testing

%% ----------------- YOUR CODE HERE ----------------------
% Compute Predictions on the test set (testFeatures) using softmaxPredict
% and softmaxModel

[pred] = softmaxPredict(softmaxModel, testFeatures);

%% -----------------------------------------------------

% Classification Score
fprintf('Test Accuracy: %f%%\n', 100*mean(pred(:) == testLabels(:)));

% (note that we shift the labels by 1, so that digit 0 now corresponds to
%  label 1)
%
% Accuracy is the proportion of correctly classified images
% The result for our implementation was:
%
% Accuracy: 98.3%
%
  • feedForwardAutoencoder.m
function [activation] = feedForwardAutoencoder(theta, hiddenSize, visibleSize, data)

% theta: trained weights from the autoencoder
% visibleSize: the number of input units (probably 64)
% hiddenSize: the number of hidden units (probably 25)
% data: Our matrix containing the training data as columns.  So, data(:,i) is the i-th training example.

% We first convert theta to the (W1, W2, b1, b2) matrix/vector format, so that this
% follows the notation convention of the lecture notes.

W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute the activation of the hidden layer for the Sparse Autoencoder.

m = size(data, 2);
z2 = W1*data + repmat(b1, 1, m); % note: b1 must be replicated into an m-column matrix here
activation = sigmoid(z2);

%-------------------------------------------------------------------

end

%-------------------------------------------------------------------
% Here's an implementation of the sigmoid function, which you may find useful
% in your computation of the costs and the gradients.  This inputs a (row or
% column) vector (say (z1, z2, z3)) and returns (f(z1), f(z2), f(z3)).

function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end
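
As a quick sanity check (a hypothetical snippet using the variables from stlExercise.m, not part of the original exercise): the returned activation matrix has hiddenSize rows and one column per example, which is why softmaxTrain above is called with hiddenSize as its input dimension.

% Hypothetical check after training the autoencoder:
trainFeatures = feedForwardAutoencoder(opttheta, hiddenSize, inputSize, trainData);
assert(isequal(size(trainFeatures), [hiddenSize, size(trainData, 2)]));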