A First Read of "Reducing the Dimensionality of Data with Neural Networks" (Part 1)


I have recently been studying DBNs (Deep Belief Networks) and the code accompanying Hinton's paper "Reducing the Dimensionality of Data with Neural Networks". Using some spare time, I am summarizing what I have learned so far; it is not exhaustive, and it also draws on other bloggers' write-ups. Since I have only just started, there may be mistakes, and corrections are welcome. Note that the RBM as implemented in code differs in places from the model as described in theory. Its basic framework is as follows.

Training proceeds in mini-batches: the 60,000 training samples are split into 600 batches of 100 samples each. data holds one batch of training samples, and s denotes the sigmoid function. With this framework in mind, the code below should not be hard to follow.
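To make the batching concrete, here is a minimal sketch of how the batchdata array that rbm.m expects could be assembled. The starting matrix digitdata (60000x784, pixels scaled to [0,1]) is a hypothetical name of my own; the released package ships its own scripts for loading and batching the data.

% Minimal sketch: split 60000 samples into 600 batches of 100.
% Assumes digitdata is a hypothetical 60000x784 matrix, pixels in [0,1].
numcases   = 100;                           % samples per batch
numbatches = 600;                           % 600 * 100 = 60000
numdims    = size(digitdata, 2);            % 784 = 28*28 pixels
randomorder = randperm(size(digitdata, 1)); % shuffle before batching
batchdata = zeros(numcases, numdims, numbatches);
for b = 1:numbatches
  rows = randomorder((b-1)*numcases + 1 : b*numcases);
  batchdata(:, :, b) = digitdata(rows, :);  % one 100x784 slice per batch
end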

Reading through rbm.m

% Version 1.000
%
% Code provided by Geoff Hinton and Ruslan Salakhutdinov
%
% Permission is granted for anyone to copy, use, modify, or distribute this
% program and accompanying programs and documents for any purpose, provided
% this copyright notice is retained and prominently displayed, along with
% a note saying that the original programs are available from our
% web page.
% The programs and documents are distributed without any warranty, express or
% implied.  As the programs were written for research purposes only, they have
% not been tested to the degree that would be advisable in any important
% application.  All use of these programs is entirely at the user's own risk.

 

% This program trains a Restricted Boltzmann Machine in which
% visible, binary, stochastic pixels are connected to
% hidden, binary, stochastic feature detectors using symmetrically
% weighted connections. Learning is done with 1-step Contrastive Divergence.
% The program assumes that the following variables are set externally:
% maxepoch  -- maximum number of epochs
% numhid    -- number of hidden units
% batchdata -- the data that is divided into batches (numcases numdims numbatches)
% restart   -- set to 1 if learning starts from beginning
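Because rbm.m is a plain script rather than a function, these four variables must already exist in the workspace when it runs. A minimal, hypothetical driver (assuming batchdata has been built as sketched above):

% Hypothetical driver: set the externals, then run the script.
maxepoch = 10;    % sweeps through all 600 batches
numhid   = 1000;  % hidden units in the first RBM of the paper's 784-1000-500-250-30 stack
restart  = 1;     % start learning from scratch
rbm;              % executes rbm.m in the current workspace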

 

epsilonw      = 0.1;   % Learning rate for weights
epsilonvb     = 0.1;   % Learning rate for biases of visible units
epsilonhb     = 0.1;   % Learning rate for biases of hidden units
weightcost  = 0.0002;  % Coefficient of the L2 weight-decay term
initialmomentum  = 0.5;
finalmomentum    = 0.9;  %% momentum: each weight update retains a fraction of the previous update's increment
[numcases numdims numbatches]=size(batchdata);  %% 100, 784, 600

 

if restart ==1,
 restart=0;
 epoch=1;

% Initializing symmetric weights and biases.
 vishid     = 0.1*randn(numdims, numhid);  %% 784*1000, random initialization of the visible-to-hidden weight matrix
 hidbiases  = zeros(1,numhid);   %% hidden-unit biases, initialized to zero
 visbiases  = zeros(1,numdims);  %% visible-unit biases, initialized to zero

 poshidprobs = zeros(numcases,numhid);  %% 100*1000, hidden-unit probabilities for one batch in the positive (data-driven) phase
 neghidprobs = zeros(numcases,numhid);  %% hidden-unit probabilities in the negative (reconstruction) phase
 posprods    = zeros(numdims,numhid);   %% 784*1000, positive-phase products of visible values and hidden probabilities
 negprods    = zeros(numdims,numhid);   %% negative-phase products, computed from the reconstruction
 vishidinc  = zeros(numdims,numhid);    %% increment for the visible-to-hidden weights
 hidbiasinc = zeros(1,numhid);          %% increment for the hidden biases
 visbiasinc = zeros(1,numdims);         %% increment for the visible biases
 batchposhidprobs=zeros(numcases,numhid,numbatches);  %% 100*1000*600, stores the hidden probabilities computed for every batch; these become the input to the next RBM (see the stacking sketch at the end)
end

 

for epoch = epoch:maxepoch,  %% pre-training loop over the epochs (10 in total here)
 fprintf(1,'epoch %d\r',epoch);
 errsum=0;  %% reset the accumulated reconstruction error
 for batch = 1:numbatches,  %% process the 600 batches one at a time
 fprintf(1,'epoch %d batch %d\r',epoch,batch);

%%%%%%%%% START POSITIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 data = batchdata(:,:,batch);  %% 100*784, one batch of training data; each row is one 28*28 sample, stored as doubles and not binarized
 poshidprobs = 1./(1 + exp(-data*vishid - repmat(hidbiases,numcases,1)));  %% hidden-unit output probabilities via the sigmoid, for the whole batch at once
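This line is the vectorized form of the RBM's hidden conditional; for one sample $\mathbf{v}$ and hidden unit $j$:

$$p(h_j = 1 \mid \mathbf{v}) = \sigma\Big(b_j + \sum_i v_i w_{ij}\Big), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}$$

data*vishid computes the weighted sums for all 100 samples and all 1000 hidden units in a single matrix product, and repmat replicates the bias row across the batch.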

 batchposhidprobs(:,:,batch)=poshidprobs;  %% store the hidden probabilities; they will serve as the visible layer of the next RBM in the stack
 posprods    = data' * poshidprobs;  %% positive-phase statistics: each entry is the product of a visible value and a hidden probability, summed over the batch (an estimate of <v_i h_j> under the data)
 poshidact   = sum(poshidprobs);  %% column sums of the hidden probabilities over the 100 samples (for the hidden-bias gradient)
 posvisact = sum(data);  %% column sums of the visible data over the 100 samples (for the visible-bias gradient)

%%%%%%%%% END OF POSITIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 poshidstates = poshidprobs > rand(numcases,numhid);  %% sample binary hidden states: a unit is set to 1 when its probability exceeds a uniform random number, 0 otherwise; rand(m,n) yields an m*n matrix of values uniform on (0,1)
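The comparison against rand is simply a per-unit Bernoulli draw:

$$h_j = \mathbb{1}\big[\,p(h_j = 1 \mid \mathbf{v}) > u_j\,\big], \qquad u_j \sim U(0,1)$$

so each hidden unit switches on with exactly its conditional probability.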

 

%%%%%%%%% START NEGATIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 negdata = 1./(1 + exp(-poshidstates*vishid' - repmat(visbiases,numcases,1)));  %% reconstruct the visible layer from the sampled hidden states
 neghidprobs = 1./(1 + exp(-negdata*vishid - repmat(hidbiases,numcases,1)));  %% drive the hidden layer once more from the reconstruction
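The reconstruction uses the symmetric conditional for the visible units:

$$p(v_i = 1 \mid \mathbf{h}) = \sigma\Big(a_i + \sum_j h_j w_{ij}\Big)$$

Note the asymmetry: the hidden layer was driven by the sampled binary states poshidstates, but negdata keeps the probabilities themselves rather than sampling them, a standard CD-1 shortcut that reduces sampling noise in the reconstruction.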

 negprods  = negdata'*neghidprobs;  %% negative-phase statistics: the same outer-product sums, now computed from the reconstruction (an estimate of <v_i h_j> under the model)
 neghidact = sum(neghidprobs);  %% column sums of the hidden probabilities produced from the reconstruction
 negvisact = sum(negdata);  %% column sums of the reconstructed visible values

 

%%%%%%%%% END OF NEGATIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 err= sum(sum( (data-negdata).^2 ));  %% squared reconstruction error for this batch
 errsum = err + errsum;  %% accumulate the error over the whole epoch

  if epoch>5,  %% switch the momentum as training progresses
    momentum=finalmomentum;  %% after the first 5 epochs, momentum is 0.9
  else
    momentum=initialmomentum;  %% during the first 5 epochs, momentum is 0.5
  end;

 

%%%%%%%%% UPDATE WEIGHTS AND BIASES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
   vishidinc = momentum*vishidinc + ...
                epsilonw*( (posprods-negprods)/numcases - weightcost*vishid);  %% weight increment: batch-averaged CD gradient plus weight decay, smoothed by momentum
   visbiasinc = momentum*visbiasinc + (epsilonvb/numcases)*(posvisact-negvisact);  %% visible-bias increment
   hidbiasinc = momentum*hidbiasinc + (epsilonhb/numcases)*(poshidact-neghidact);  %% hidden-bias increment
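Written out, the weight update implemented by these lines is the momentum-smoothed CD-1 rule, with $m$ the momentum, $\epsilon$ = epsilonw, $\lambda$ = weightcost, and $n$ = numcases = 100:

$$\Delta w_{ij} \leftarrow m\,\Delta w_{ij} + \epsilon\Big(\tfrac{1}{n}\big(\langle v_i h_j\rangle_{\text{data}} - \langle v_i h_j\rangle_{\text{recon}}\big) - \lambda\, w_{ij}\Big)$$

where the angle brackets are the batch sums posprods and negprods, so dividing by $n$ turns them into averages. The bias updates have the same form without the weight-decay term.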

 

   vishid = vishid + vishidinc;  %% apply the updates: visible-to-hidden weights
   visbiases = visbiases + visbiasinc;  %% visible-unit biases
   hidbiases = hidbiases + hidbiasinc;  %% hidden-unit biases

 

%%%%%%%%%%%%%%%% END OF UPDATES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 end
 fprintf(1, 'epoch %4i error %6.1f \n', epoch, errsum);
end;
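One detail worth emphasizing: after the script finishes, batchposhidprobs holds the hidden probabilities for every batch, and this is what makes greedy layer-wise pre-training of the full DBN possible. A minimal sketch of the idea follows (in Hinton's released package a wrapper script plays this role; vishid1 below is my own name for the saved layer-1 weights):

% Greedy layer-wise pre-training: the hidden probabilities of one
% trained RBM become the "visible" data for the next RBM.
maxepoch = 10; numhid = 1000; restart = 1;
rbm;                           % train layer 1; fills batchposhidprobs
vishid1 = vishid;              % save layer-1 weights before the script overwrites them
batchdata = batchposhidprobs;  % 100*1000*600: layer-1 features as new input
numhid = 500; restart = 1;
rbm;                           % train layer 2 on layer-1 features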
