Multilayer Neural Networks


Category: Image and Audio, 2013-07-15 12:51

Tags: neural networks

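This post is a brief summary of Chapter 6 of Pattern Classification (2nd edition). It starts with a figure that describes the basic concepts of a three-layer neural network. (If the figure is hard to read, right-click it and open it in a new tab.)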

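The theory behind multilayer neural networks is covered in Chapter 6 of Pattern Classification and is not discussed here. What follows is a short analysis of a MATLAB implementation of stochastic backpropagation.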

function [test_targets, Wh, Wo, J] = Backpropagation_Stochastic(train_patterns, train_targets, test_patterns, params)  
  
% Classify using a backpropagation network with a stochastic learning algorithm
% Inputs:
%   train_patterns      - Train patterns (one pattern per column)
%   train_targets       - Train targets
%   test_patterns       - Test  patterns
%   params              - Number of hidden units (Nh), convergence criterion (Theta), learning rate (eta)
%  
% Outputs  
%   test_targets        - Predicted targets  
%   Wh                  - Hidden unit weights  
%   Wo                  - Output unit weights  
%   J                   - Error throughout the training  
  
[Nh, Theta, eta] = process_params(params);  
iter             = 1;  
  
[Ni, M]          = size(train_patterns);  
No               = 1;  
  
Uc               = length(unique(train_targets));  
%If there are only two classes, remap to {-1,1}  
if (Uc == 2)  
    train_targets    = (train_targets>0)*2-1;  
end  
  
%Initialize the net: In this implementation there is only one output unit, so there  
%will be a weight vector from the hidden units to the output units, and a weight matrix  
%from the input units to the hidden units.  
%The matrices are defined with one more weight so that there will be a bias  
w0      = max(abs(std(train_patterns')'));  
Wh      = rand(Nh, Ni+1).*w0*2-w0; %Hidden weights  
Wo      = rand(No, Nh+1).*w0*2-w0; %Output weights  
  
Wo    = Wo/mean(std(Wo'))*(Nh+1)^(-0.5);  
Wh    = Wh/mean(std(Wh'))*(Ni+1)^(-0.5);  
  
rate    = 10*Theta;  
J(1)    = 1e3;  
  
while (rate > Theta),  
    %Randomly choose an example
    m   = randi(M);
    Xm = train_patterns(:,m);  
    tk = train_targets(m);  
      
    %Forward propagate the input:  
    %First to the hidden units  
    gh              = Wh*[Xm; 1];  
    [y, dfh]        = activation(gh);  
    %Now to the output unit  
    go              = Wo*[y; 1];  
    [zk, dfo]   = activation(go);  
      
    %Now, evaluate delta_k at the output: delta_k = (tk-zk)*f'(net)  
    delta_k     = (tk - zk).*dfo;  
      
    %...and delta_j: delta_j = f'(net)*w_j*delta_k  
    delta_j     = dfh'.*Wo(1:end-1).*delta_k;  
      
    %w_kj <- w_kj + eta*delta_k*y_j  
    Wo              = Wo + eta*delta_k*[y;1]';  
      
    %w_ji <- w_ji + eta*delta_j*[Xm;1]  
    Wh              = Wh + eta*delta_j'*[Xm;1]';  
      
    iter            = iter + 1;  
  
    %Calculate total error  
    J(iter)    = 0;  
    for i = 1:M,  
        J(iter) = J(iter) + (train_targets(i) - activation(Wo*[activation(Wh*[train_patterns(:,i); 1]); 1])).^2;  
    end  
    J(iter) = J(iter)/M;   
    rate  = abs(J(iter) - J(iter-1))/J(iter-1)*100;  
      
    if (mod(iter, 100) == 0),
        disp(['Iteration ' num2str(iter) ': Total error is ' num2str(J(iter))])  
    end  
      
end  
  
disp(['Backpropagation converged after ' num2str(iter) ' iterations.'])  
  
%Classify the test patterns  
test_targets = zeros(1, size(test_patterns,2));  
for i = 1:size(test_patterns,2),  
    test_targets(i) = activation(Wo*[activation(Wh*[test_patterns(:,i); 1]); 1]);  
end  
  
if (Uc == 2)  
    test_targets  = test_targets >0;  
end  
  
  
  
function [f, df] = activation(x)  
  
a = 1.716;  
b = 2/3;  
f   = a*tanh(b*x);  
df  = a*b*sech(b*x).^2;  

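The algorithm itself is an extension of gradient descent. It iteratively updates the weights w according to a fixed rule until it reaches a local optimum. The update rule is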

w(m+1) = w(m) + Δw(m)
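To make the update rule concrete, here is a minimal sketch of plain gradient descent, where Δw(m) = -η ∂J/∂w. The criterion J(w) = (w - 3)^2, the starting point, and the iteration count are assumptions chosen for illustration and are not part of the original listing.

```matlab
% Gradient descent on the toy criterion J(w) = (w - 3)^2 (an assumed example).
eta   = 0.1;              % learning rate
w     = 0;                % initial weight w(1)
gradJ = @(w) 2*(w - 3);   % dJ/dw for the toy criterion

for m = 1:100
    dw = -eta * gradJ(w); % delta w(m) = -eta * dJ/dw
    w  = w + dw;          % w(m+1) = w(m) + delta w(m)
end

disp(w)                   % close to the minimizer w = 3
```

Each step shrinks the distance to the minimizer by a constant factor here, because the toy criterion is quadratic. The network's criterion is not, which is why backpropagation only promises a local optimum.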

Because this is a three-layer network, the hidden-to-output weights Wkj and the input-to-hidden weights Wji are updated separately, which is what these two lines do:

    Wo              = Wo + eta*delta_k*[y;1]';
    Wh              = Wh + eta*delta_j'*[Xm;1]';
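The two updates can be checked on a tiny network. The sketch below runs one stochastic step on a 2-3-1 network and compares the output-weight update against a finite-difference gradient of the per-pattern error J = 0.5*(tk - zk)^2. The network size, weight values, and training pattern are assumptions made up for this check.

```matlab
% One stochastic backpropagation step on a tiny 2-3-1 network
% (sizes and values are assumptions made for this sketch).
a = 1.716;  b = 2/3;
f  = @(x) a*tanh(b*x);            % activation, as in the listing
df = @(x) a*b*sech(b*x).^2;       % its derivative

Wh  = [0.1 -0.2 0.05; 0.3 0.1 -0.1; -0.2 0.2 0.1];  % 3 hidden x (2 inputs + bias)
Wo  = [0.2 -0.1 0.3 0.05];                          % 1 output x (3 hidden + bias)
Xm  = [0.5; -1.0];                                  % one training pattern
tk  = 1;                                            % its target
eta = 0.1;

% Forward pass
gh = Wh*[Xm; 1];   y  = f(gh);
go = Wo*[y; 1];    zk = f(go);

% Sensitivities, as in the listing
delta_k = (tk - zk).*df(go);
delta_j = df(gh)'.*Wo(1:end-1).*delta_k;   % 1 x 3 row vector

% Error on this pattern before and after the update
J_before = 0.5*(tk - zk)^2;
Wo_new   = Wo + eta*delta_k*[y; 1]';
Wh_new   = Wh + eta*delta_j'*[Xm; 1]';
J_after  = 0.5*(tk - f(Wo_new*[f(Wh_new*[Xm; 1]); 1]))^2;

% Finite-difference check: dJ/dWo(1) should equal -delta_k*y(1)
h        = 1e-6;
Wo_p     = Wo;  Wo_p(1) = Wo_p(1) + h;
num_grad = (0.5*(tk - f(Wo_p*[y; 1]))^2 - J_before)/h;
```

If the update is a true gradient step, J_after is smaller than J_before for a small enough eta, and num_grad agrees with -delta_k*y(1) up to the finite-difference error.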

In the code,

[f, df] = activation(x)

implements the activation function mentioned in the figure above: f is the value at the node's output, and df is the derivative of f with respect to net, i.e. f'(net).
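As a quick sanity check that df really is the derivative of f, the sketch below compares it with a central finite difference. The sample points and step size h are arbitrary choices for this check; the constants a and b are taken from the listing.

```matlab
% Compare the analytic derivative of a*tanh(b*x) with a central difference.
a = 1.716;  b = 2/3;
f  = @(x) a*tanh(b*x);
df = @(x) a*b*sech(b*x).^2;

x          = linspace(-3, 3, 13);              % sample points (arbitrary)
h          = 1e-5;                             % finite-difference step
df_numeric = (f(x + h) - f(x - h)) / (2*h);    % central difference
max_err    = max(abs(df(x) - df_numeric));

disp(max_err)   % should be tiny
disp(f(1))      % with these constants, f(1) is very close to 1
```

With a = 1.716 and b = 2/3 the function maps ±1 to approximately ±1, which fits the {-1, 1} target remapping used for two-class problems.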

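We did not tune the three parameters

Nh, Theta, eta

and left them at their defaults of 5, 0.1, and 0.1: the network has 5 hidden units, the loop ends once the relative change in J between consecutive iterations (measured in percent in the code) is no longer above 0.1, and the learning rate η is 0.1. The classification result with these settings is shown in the figure below, and it is not very good.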
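The stopping rule is easier to see in isolation. The sketch below applies the same relative-change formula as the listing to a made-up error history; only J(1) = 1e3 and Theta = 0.1 come from the code, while the remaining J values are assumptions.

```matlab
% Stopping rule of the training loop, applied to an assumed error history.
Theta = 0.1;                        % convergence criterion, in percent
J     = [1e3 0.90 0.80 0.7995];     % J(1) = 1e3 as in the listing; the rest is made up

% Relative change between consecutive iterations, in percent
rate    = abs(diff(J)) ./ J(1:end-1) * 100;

% The while loop runs while rate > Theta, so it exits at the first
% iteration whose rate is no longer above Theta.
stop_at = find(rate <= Theta, 1) + 1;

disp(rate)
disp(stop_at)
```

Since J is re-evaluated after every single-sample update, one update that barely changes the total error is enough to end training. That may be part of the reason the default settings stop early and classify poorly.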

For comparison, the result of an SVM is shown below. The SVM classifies this data well.

What is shown above is only one of the simplest ways to train a neural network. Getting good results takes many further improvements.

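SVMs in their modern form appeared in the first half of the 1990s, a few years after backpropagation made multilayer networks popular in the mid-1980s, and the two approaches have competed ever since. In 2006 the neural network community answered with deep learning, which has become extremely popular recently. Anyone serious about machine learning should study it carefully.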

References:

[1] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd edition.