A First Read of "Reducing the Dimensionality of Data with Neural Networks" (Part 1)


I have recently been studying DBNs (Deep Belief Networks) and the code accompanying Hinton's paper "Reducing the Dimensionality of Data with Neural Networks". Using some spare time, I am summarizing what I have learned so far; it is not exhaustive, and it also draws on other bloggers' write-ups. Since I have only just started, there may be mistakes, and corrections are welcome. Note that the RBM as implemented in code differs in places from the model as described in theory. Its basic framework is as follows.

Training proceeds in mini-batches: the 60,000 training samples are split into 600 batches of 100 samples each. data holds one batch of training samples, and s denotes the sigmoid function. With this framework in mind, the code below should not be hard to follow.
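To make the batching concrete, here is a minimal sketch of how the batchdata array that rbm.m expects could be assembled. The starting matrix digitdata (60000x784, pixels scaled to [0,1]) is a hypothetical name of my own; the released package ships its own scripts for loading and batching the data.

% Minimal sketch: split 60000 samples into 600 batches of 100.
% Assumes digitdata is a hypothetical 60000x784 matrix, pixels in [0,1].
numcases   = 100;                           % samples per batch
numbatches = 600;                           % 600 * 100 = 60000
numdims    = size(digitdata, 2);            % 784 = 28*28 pixels
randomorder = randperm(size(digitdata, 1)); % shuffle before batching
batchdata = zeros(numcases, numdims, numbatches);
for b = 1:numbatches
  rows = randomorder((b-1)*numcases + 1 : b*numcases);
  batchdata(:, :, b) = digitdata(rows, :);  % one 100x784 slice per batch
end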

Reading through rbm.m

% Version 1.000
%
% Code provided by Geoff Hinton and Ruslan Salakhutdinov
%
% Permission is granted for anyone to copy, use, modify, or distribute this
% program and accompanying programs and documents for any purpose, provided
% this copyright notice is retained and prominently displayed, along with
% a note saying that the original programs are available from our
% web page.
% The programs and documents are distributed without any warranty, express or
% implied.  As the programs were written for research purposes only, they have
% not been tested to the degree that would be advisable in any important
% application.  All use of these programs is entirely at the user's own risk.

 

% This program trains a Restricted Boltzmann Machine in which
% visible, binary, stochastic pixels are connected to
% hidden, binary, stochastic feature detectors using symmetrically
% weighted connections. Learning is done with 1-step Contrastive Divergence.
% The program assumes that the following variables are set externally:
% maxepoch  -- maximum number of epochs
% numhid    -- number of hidden units
% batchdata -- the data that is divided into batches (numcases numdims numbatches)
% restart   -- set to 1 if learning starts from beginning
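Because rbm.m is a plain script rather than a function, these four variables must already exist in the workspace when it runs. A minimal, hypothetical driver (assuming batchdata has been built as sketched above):

% Hypothetical driver: set the externals, then run the script.
maxepoch = 10;    % sweeps through all 600 batches
numhid   = 1000;  % hidden units in the first RBM of the paper's 784-1000-500-250-30 stack
restart  = 1;     % start learning from scratch
rbm;              % executes rbm.m in the current workspace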

 

epsilonw      = 0.1;   % Learning rate for weights
epsilonvb     = 0.1;   % Learning rate for biases of visible units
epsilonhb     = 0.1;   % Learning rate for biases of hidden units
weightcost  = 0.0002;  % Coefficient of the L2 weight-decay term
initialmomentum  = 0.5;
finalmomentum    = 0.9;  %% momentum: each weight update retains a fraction of the previous update's increment
[numcases numdims numbatches]=size(batchdata);  %% 100, 784, 600

 

if restart ==1,
 restart=0;
 epoch=1;

% Initializing symmetric weights and biases.
 vishid     = 0.1*randn(numdims, numhid);  %% 784*1000, random initialization of the visible-to-hidden weight matrix
 hidbiases  = zeros(1,numhid);   %% hidden-unit biases, initialized to zero
 visbiases  = zeros(1,numdims);  %% visible-unit biases, initialized to zero

 poshidprobs = zeros(numcases,numhid);  %% 100*1000, hidden-unit probabilities for one batch in the positive (data-driven) phase
 neghidprobs = zeros(numcases,numhid);  %% hidden-unit probabilities in the negative (reconstruction) phase
 posprods    = zeros(numdims,numhid);   %% 784*1000, positive-phase products of visible values and hidden probabilities
 negprods    = zeros(numdims,numhid);   %% negative-phase products, computed from the reconstruction
 vishidinc  = zeros(numdims,numhid);    %% increment for the visible-to-hidden weights
 hidbiasinc = zeros(1,numhid);          %% increment for the hidden biases
 visbiasinc = zeros(1,numdims);         %% increment for the visible biases
 batchposhidprobs=zeros(numcases,numhid,numbatches);  %% 100*1000*600, stores the hidden probabilities computed for every batch; these become the input to the next RBM (see the stacking sketch at the end)
end

 

for epoch = epoch:maxepoch,  %% pre-training loop over the epochs (10 in total here)
 fprintf(1,'epoch %d\r',epoch);
 errsum=0;  %% reset the accumulated reconstruction error
 for batch = 1:numbatches,  %% process the 600 batches one at a time
 fprintf(1,'epoch %d batch %d\r',epoch,batch);

%%%%%%%%% START POSITIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 data = batchdata(:,:,batch);  %% 100*784, one batch of training data; each row is one 28*28 sample, stored as doubles and not binarized
 poshidprobs = 1./(1 + exp(-data*vishid - repmat(hidbiases,numcases,1)));  %% hidden-unit output probabilities via the sigmoid, for the whole batch at once
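This line is the vectorized form of the RBM's hidden conditional; for one sample $\mathbf{v}$ and hidden unit $j$:

$$p(h_j = 1 \mid \mathbf{v}) = \sigma\Big(b_j + \sum_i v_i w_{ij}\Big), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}$$

data*vishid computes the weighted sums for all 100 samples and all 1000 hidden units in a single matrix product, and repmat replicates the bias row across the batch.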

 batchposhidprobs(:,:,batch)=poshidprobs;  %% store the hidden probabilities; they will serve as the visible layer of the next RBM in the stack
 posprods    = data' * poshidprobs;  %% positive-phase statistics: each entry is the product of a visible value and a hidden probability, summed over the batch (an estimate of <v_i h_j> under the data)
 poshidact   = sum(poshidprobs);  %% column sums of the hidden probabilities over the 100 samples (for the hidden-bias gradient)
 posvisact = sum(data);  %% column sums of the visible data over the 100 samples (for the visible-bias gradient)

%%%%%%%%% END OF POSITIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 poshidstates = poshidprobs > rand(numcases,numhid);  %% sample binary hidden states: a unit is set to 1 when its probability exceeds a uniform random number, 0 otherwise; rand(m,n) yields an m*n matrix of values uniform on (0,1)
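The comparison against rand is simply a per-unit Bernoulli draw:

$$h_j = \mathbb{1}\big[\,p(h_j = 1 \mid \mathbf{v}) > u_j\,\big], \qquad u_j \sim U(0,1)$$

so each hidden unit switches on with exactly its conditional probability.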

 

%%%%%%%%% START NEGATIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 negdata = 1./(1 + exp(-poshidstates*vishid' - repmat(visbiases,numcases,1)));  %% reconstruct the visible layer from the sampled hidden states
 neghidprobs = 1./(1 + exp(-negdata*vishid - repmat(hidbiases,numcases,1)));  %% drive the hidden layer once more from the reconstruction
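The reconstruction uses the symmetric conditional for the visible units:

$$p(v_i = 1 \mid \mathbf{h}) = \sigma\Big(a_i + \sum_j h_j w_{ij}\Big)$$

Note the asymmetry: the hidden layer was driven by the sampled binary states poshidstates, but negdata keeps the probabilities themselves rather than sampling them, a standard CD-1 shortcut that reduces sampling noise in the reconstruction.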

 negprods  = negdata'*neghidprobs;  %% negative-phase statistics: the same outer-product sums, now computed from the reconstruction (an estimate of <v_i h_j> under the model)
 neghidact = sum(neghidprobs);  %% column sums of the hidden probabilities produced from the reconstruction
 negvisact = sum(negdata);  %% column sums of the reconstructed visible values

 

%%%%%%%%% END OF NEGATIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 err= sum(sum( (data-negdata).^2 ));  %% squared reconstruction error for this batch
 errsum = err + errsum;  %% accumulate the error over the whole epoch

  if epoch>5,  %% switch the momentum as training progresses
    momentum=finalmomentum;  %% after the first 5 epochs, momentum is 0.9
  else
    momentum=initialmomentum;  %% during the first 5 epochs, momentum is 0.5
  end;

 

%%%%%%%%% UPDATE WEIGHTS AND BIASES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
   vishidinc = momentum*vishidinc + ...
                epsilonw*( (posprods-negprods)/numcases - weightcost*vishid);  %% weight increment: batch-averaged CD gradient plus weight decay, smoothed by momentum
   visbiasinc = momentum*visbiasinc + (epsilonvb/numcases)*(posvisact-negvisact);  %% visible-bias increment
   hidbiasinc = momentum*hidbiasinc + (epsilonhb/numcases)*(poshidact-neghidact);  %% hidden-bias increment
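Written out, the weight update implemented by these lines is the momentum-smoothed CD-1 rule, with $m$ the momentum, $\epsilon$ = epsilonw, $\lambda$ = weightcost, and $n$ = numcases = 100:

$$\Delta w_{ij} \leftarrow m\,\Delta w_{ij} + \epsilon\Big(\tfrac{1}{n}\big(\langle v_i h_j\rangle_{\text{data}} - \langle v_i h_j\rangle_{\text{recon}}\big) - \lambda\, w_{ij}\Big)$$

where the angle brackets are the batch sums posprods and negprods, so dividing by $n$ turns them into averages. The bias updates have the same form without the weight-decay term.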

 

   vishid = vishid + vishidinc;  %% apply the updates: visible-to-hidden weights
   visbiases = visbiases + visbiasinc;  %% visible-unit biases
   hidbiases = hidbiases + hidbiasinc;  %% hidden-unit biases

 

%%%%%%%%%%%%%%%% END OF UPDATES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 end
 fprintf(1, 'epoch %4i error %6.1f \n', epoch, errsum);
end;
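One detail worth emphasizing: after the script finishes, batchposhidprobs holds the hidden probabilities for every batch, and this is what makes greedy layer-wise pre-training of the full DBN possible. A minimal sketch of the idea follows (in Hinton's released package a wrapper script plays this role; vishid1 below is my own name for the saved layer-1 weights):

% Greedy layer-wise pre-training: the hidden probabilities of one
% trained RBM become the "visible" data for the next RBM.
maxepoch = 10; numhid = 1000; restart = 1;
rbm;                           % train layer 1; fills batchposhidprobs
vishid1 = vishid;              % save layer-1 weights before the script overwrites them
batchdata = batchposhidprobs;  % 100*1000*600: layer-1 features as new input
numhid = 500; restart = 1;
rbm;                           % train layer 2 on layer-1 features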
