Machine Learning: Vectorization Exercise



In the previous exercise, we implemented a sparse autoencoder and trained it on natural images. In this exercise we will vectorize the implementation to make it run faster, and apply it to handwritten digits.

Data download

  • MNIST Dataset (Training Images)
  • MNIST Dataset (Training Labels)
  • Support functions for loading MNIST in Matlab

Step 1: Vectorize your sparse autoencoder

I have already completed this step; see my previous post for the details.
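For reference, the core of the vectorization is computing the forward pass for all training examples at once instead of looping over them one by one. Here is a sketch using the UFLDL starter-code naming (W1, W2, b1, b2 are the weight matrices and bias vectors, and data holds one training example per column; these names are assumptions carried over from the starter code):

% Vectorized forward pass over all m examples at once (no per-example loop).
m  = size(data, 2);
z2 = W1 * data + repmat(b1, 1, m);   % hidden-layer pre-activations, hiddenSize x m
a2 = 1 ./ (1 + exp(-z2));            % elementwise sigmoid
z3 = W2 * a2 + repmat(b2, 1, m);     % output-layer pre-activations, visibleSize x m
a3 = 1 ./ (1 + exp(-z3));            % reconstruction of the input
rhoHat = mean(a2, 2);                % average hidden activation, used by the sparsity penalty

The gradient computation is vectorized in the same spirit: every per-example loop becomes a single matrix product over the whole data matrix.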


Step 2: Learn features of handwritten digits

1. First, extract the two archives train-images-idx3-ubyte.gz and mnistHelper, and put the files in the same directory as the sparse autoencoder code from last time.
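If you prefer, the unpacking can also be done from inside MATLAB/Octave. A small convenience sketch, assuming the two downloaded archives sit in the current working directory:

% Unpack the MNIST training images and make the helper functions visible.
gunzip('train-images-idx3-ubyte.gz');   % produces train-images-idx3-ubyte
addpath('mnistHelper');                 % so loadMNISTImages.m is on the path

(If you extracted the mnistHelper files directly into the working directory instead, the addpath line is unnecessary.)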

2. Next, open the image-sampling file sampleIMAGES.m from last time and change the code to:

function patches = sampleIMAGES()
% sampleIMAGES
% Returns 10000 patches for training

%load IMAGES;    % the natural-image data is no longer used
%use mnist data

patchsize = 28;  % we'll use 28x28 patches (one whole MNIST digit each)
numpatches = 10000;

% Initialize patches with zeros.  One column per patch, 10000 columns.
patches = zeros(patchsize*patchsize, numpatches);

% loadMNISTImages (from the mnistHelper files) returns a 784 x 60000
% matrix with pixel values already scaled to [0,1]; we take its first
% 10000 columns as the training patches.
images = loadMNISTImages('train-images-idx3-ubyte');
patches = images(:, 1:numpatches);
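To sanity-check the loading step, you can display a few of the sampled digits. This assumes display_network.m from the UFLDL starter code is on your path:

% Visualize the first 100 loaded digits as 28x28 tiles.
patches = sampleIMAGES();
display_network(patches(:, 1:100));

You should see recognizable handwritten digits; if the tiles look like noise, the file was loaded or reshaped incorrectly.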

Next, open train.m and change the model parameters to:

visibleSize = 28*28;   % number of input units
hiddenSize = 196;      % number of hidden units
sparsityParam = 0.1;   % desired average activation of the hidden units
                       % (denoted by the Greek letter rho in the lecture notes)
lambda = 3e-3;         % weight decay parameter
beta = 3;              % weight of the sparsity penalty term

(The lines below are still part of sampleIMAGES.m, not train.m: the normalization call and the closing end of the function stay as they were.)

%%---------------------------------------------------------------
% For the autoencoder to work well we need to normalize the data.
% Specifically, since the output of the network is bounded between [0,1]
% (due to the sigmoid activation function), we have to make sure
% the range of pixel values is also bounded between [0,1].
patches = normalizeData(patches);

end


%% ---------------------------------------------------------------
function patches = normalizeData(patches)


% Squash data to [0.1, 0.9] since we use sigmoid as the activation
% function in the output layer


% Remove DC (mean of images). 
patches = bsxfun(@minus, patches, mean(patches));


% Truncate to +/-3 standard deviations and scale to -1 to 1
pstd = 3 * std(patches(:));
patches = max(min(patches, pstd), -pstd) / pstd;


% Rescale from [-1,1] to [0.1,0.9]
patches = (patches + 1) * 0.4 + 0.1;


end
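If you want to verify the normalization, you can check the range of the returned patches. (normalizeData is a subfunction of sampleIMAGES.m, so we test it through sampleIMAGES:)

% Every pixel value should lie in the squashed range [0.1, 0.9].
patches = sampleIMAGES();
assert(min(patches(:)) >= 0.1 && max(patches(:)) <= 0.9);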

Leave all other parameters unchanged.

Now we can run train.
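When training finishes, the learned features are the rows of W1, and the starter code's train.m renders them with display_network, each hidden unit's weight vector shown as a 28x28 tile (opttheta is the parameter vector returned by the optimizer in the starter code):

% Unpack the learned weights and visualize them.
W1 = reshape(opttheta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
display_network(W1', 12);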


Finally, after 400 iterations, your sparse autoencoder should have learned pen-stroke features. In other words, the program learns the strokes that make up the handwritten digits. When it finishes, you should see a figure like this:


If something is wrong with your autoencoder, you may instead get a figure like the following:


If your figure looks like that, go back and check your code and parameters.

