Multilayer Neural Networks

Source: Internet | Publisher: 上海网络电视台 (Shanghai Network TV) | Editor: 程序博客网 | Date: 2024/04/28 21:54

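This post is a brief summary of Chapter 6 of Pattern Classification (2nd edition). It begins with a figure that illustrates the basic concepts of a three-layer neural network.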

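The theory behind multilayer neural networks is covered in Chapter 6 of Pattern Classification and is not discussed here. Below we briefly walk through a MATLAB implementation of stochastic backpropagation.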

    function [test_targets, Wh, Wo, J] = Backpropagation_Stochastic(train_patterns, train_targets, test_patterns, params)
    % Classify using a backpropagation network with stochastic learning algorithm
    % Inputs:
    %   train_patterns  - Train patterns (Ni x M, one column per sample)
    %   train_targets   - Train targets
    %   test_patterns   - Test patterns
    %   params          - Number of hidden units, convergence criterion, learning rate
    %
    % Outputs:
    %   test_targets    - Predicted targets
    %   Wh              - Hidden unit weights
    %   Wo              - Output unit weights
    %   J               - Error throughout the training

    % process_params (not shown in this post) splits params into its three fields
    [Nh, Theta, eta] = process_params(params);
    iter    = 1;
    [Ni, M] = size(train_patterns);
    No      = 1;
    Uc      = length(unique(train_targets));

    % If there are only two classes, remap to {-1,1}
    if (Uc == 2)
        train_targets = (train_targets > 0)*2 - 1;
    end

    % Initialize the net: in this implementation there is only one output unit, so there
    % will be a weight vector from the hidden units to the output unit, and a weight matrix
    % from the input units to the hidden units.
    % The matrices are defined with one more weight so that there will be a bias.
    w0 = max(abs(std(train_patterns')'));
    Wh = rand(Nh, Ni+1).*w0*2 - w0;   % Hidden weights
    Wo = rand(No, Nh+1).*w0*2 - w0;   % Output weights
    Wo = Wo/mean(std(Wo'))*(Nh+1)^(-0.5);
    Wh = Wh/mean(std(Wh'))*(Ni+1)^(-0.5);

    rate = 10*Theta;
    J(1) = 1e3;

    while (rate > Theta)
        % Randomly choose an example (same as taking the first element of randperm(M))
        m  = randi(M);
        Xm = train_patterns(:, m);
        tk = train_targets(m);

        % Forward propagate the input:
        % first to the hidden units
        gh        = Wh*[Xm; 1];
        [y, dfh]  = activation(gh);
        % now to the output unit
        go        = Wo*[y; 1];
        [zk, dfo] = activation(go);

        % Evaluate delta_k at the output: delta_k = (tk-zk)*f'(net)
        delta_k = (tk - zk).*dfo;

        % ...and delta_j: delta_j = f'(net)*w_j*delta_k (uses Wo before its update)
        delta_j = dfh'.*Wo(1:end-1).*delta_k;

        % w_kj <- w_kj + eta*delta_k*y_j
        Wo = Wo + eta*delta_k*[y; 1]';

        % w_ji <- w_ji + eta*delta_j*[Xm;1]
        Wh = Wh + eta*delta_j'*[Xm; 1]';

        iter = iter + 1;

        % Calculate total error (mean squared error over the training set)
        J(iter) = 0;
        for i = 1:M
            J(iter) = J(iter) + (train_targets(i) - activation(Wo*[activation(Wh*[train_patterns(:,i); 1]); 1])).^2;
        end
        J(iter) = J(iter)/M;
        % Relative change of the error, in percent
        rate = abs(J(iter) - J(iter-1))/J(iter-1)*100;

        if (mod(iter, 100) == 0)
            disp(['Iteration ' num2str(iter) ': Total error is ' num2str(J(iter))])
        end
    end

    disp(['Backpropagation converged after ' num2str(iter) ' iterations.'])

    % Classify the test patterns
    test_targets = zeros(1, size(test_patterns, 2));
    for i = 1:size(test_patterns, 2)
        test_targets(i) = activation(Wo*[activation(Wh*[test_patterns(:,i); 1]); 1]);
    end

    if (Uc == 2)
        test_targets = test_targets > 0;
    end
    end

    function [f, df] = activation(x)
    % f(x) = a*tanh(b*x) and its derivative f'(x) = a*b*sech(b*x)^2
    a  = 1.716;
    b  = 2/3;
    f  = a*tanh(b*x);
    df = a*b*sech(b*x).^2;
    end
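The weight initialization in the code packs two steps into four lines. As a minimal pure-Python sketch (the helper name init_layer is hypothetical, not part of the original code), it draws each unit's weights, including the bias column, uniformly from [-w0, w0] and then rescales the matrix so that the mean per-row standard deviation equals (fan_in)^(-1/2), where fan_in counts the bias input:

```python
import random
import statistics


def init_layer(n_out, n_in, w0, rng):
    """Hypothetical helper mirroring the MATLAB initialization of Wh / Wo.

    Each row holds the weights of one unit; the extra (last) column is the bias weight.
    """
    fan_in = n_in + 1  # +1 for the constant bias input
    # rand(n_out, fan_in).*w0*2 - w0  ->  uniform samples in [-w0, w0]
    W = [[rng.uniform(-w0, w0) for _ in range(fan_in)] for _ in range(n_out)]
    # MATLAB's std() normalizes by N-1, and so does statistics.stdev
    mean_std = statistics.mean(statistics.stdev(row) for row in W)
    # W / mean(std(W')) * fan_in^(-0.5)
    scale = fan_in ** -0.5 / mean_std
    return [[w * scale for w in row] for row in W]


rng = random.Random(0)
Wh = init_layer(n_out=5, n_in=2, w0=3.0, rng=rng)  # Nh = 5 hidden units, Ni = 2 inputs
Wo = init_layer(n_out=1, n_in=5, w0=3.0, rng=rng)  # No = 1 output unit
print(statistics.mean(statistics.stdev(r) for r in Wh))  # ≈ 0.5774, i.e. 3 ** -0.5
```

Because the rescaling divides by the spread of the freshly drawn weights, the value of w0 cancels out: it only fixes the range before normalization, and the final scale is set by the fan-in.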

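The algorithm is an extension of gradient descent: it updates the weights w iteratively according to a fixed rule until it reaches a local optimum. The update rule is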

w(m+1) = w(m) + Δw(m)
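The rule above leaves Δw(m) abstract. For plain gradient descent it is Δw(m) = -η ∂J/∂w evaluated at w(m), and the backpropagation updates that follow in the text are this rule applied to the squared error. A minimal Python sketch on a one-dimensional toy criterion J(w) = (w - 3)^2 (chosen here purely for illustration) shows the iteration settling at the minimum:

```python
def gradient_descent(grad, w, eta, steps):
    """Repeat w(m+1) = w(m) + dw(m) with dw(m) = -eta * dJ/dw at w(m)."""
    for _ in range(steps):
        delta_w = -eta * grad(w)
        w = w + delta_w
    return w


# Toy criterion J(w) = (w - 3)^2: gradient 2(w - 3), minimum at w = 3.
w_final = gradient_descent(lambda w: 2 * (w - 3), w=0.0, eta=0.1, steps=100)
print(round(w_final, 6))  # 3.0
```

With η = 0.1 the distance to the minimum shrinks by a factor of 0.8 per step; on a non-convex J the same iteration only reaches a local optimum, as the text notes.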

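Because this is a three-layer network, the hidden-to-output weights Wkj and the input-to-hidden weights Wji (Wo and Wh in the code) are updated separately: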

    Wo = Wo + eta*delta_k*[y; 1]';
    Wh = Wh + eta*delta_j'*[Xm; 1]';
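To make these two updates concrete, here is a pure-Python sketch of one stochastic backpropagation step for a network with a single output unit, following the MATLAB code: a forward pass, δ_k = (t_k - z_k) f'(net_k), δ_j = f'(net_j) w_kj δ_k computed with the output weights from before the update, then both weight updates. The helper names (matvec, forward, backprop_step) and the small weight values are hypothetical; lists of lists stand in for MATLAB matrices, and the bias is handled by appending a constant 1 as in [Xm; 1].

```python
import math

A, B = 1.716, 2.0 / 3.0  # activation constants taken from the MATLAB code


def activation(x):
    """f = a*tanh(b*x) and f' = a*b*sech^2(b*x)."""
    return A * math.tanh(B * x), A * B / math.cosh(B * x) ** 2


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def forward(Wh, Wo, x):
    """Return hidden outputs y, their derivatives, output zk and its derivative."""
    hidden = [activation(g) for g in matvec(Wh, x + [1.0])]  # Wh*[Xm; 1]
    y = [f for f, _ in hidden]
    dfh = [d for _, d in hidden]
    zk, dfo = activation(matvec(Wo, y + [1.0])[0])           # Wo*[y; 1]
    return y, dfh, zk, dfo


def backprop_step(Wh, Wo, x, t, eta):
    y, dfh, zk, dfo = forward(Wh, Wo, x)
    delta_k = (t - zk) * dfo
    # delta_j uses the old output weights, excluding the bias weight Wo(end)
    delta_j = [dfh[j] * Wo[0][j] * delta_k for j in range(len(y))]
    Wo_new = [[w + eta * delta_k * yj for w, yj in zip(Wo[0], y + [1.0])]]
    Wh_new = [[w + eta * delta_j[j] * xi for w, xi in zip(Wh[j], x + [1.0])]
              for j in range(len(Wh))]
    return Wh_new, Wo_new


Wh = [[0.2, -0.1, 0.05], [-0.3, 0.4, 0.1]]  # 2 hidden units, 2 inputs + bias
Wo = [[0.5, -0.4, 0.1]]                     # 1 output unit, 2 hidden + bias
x, t = [1.0, 2.0], 1.0
err_before = (t - forward(Wh, Wo, x)[2]) ** 2
Wh, Wo = backprop_step(Wh, Wo, x, t, eta=0.01)
err_after = (t - forward(Wh, Wo, x)[2]) ** 2
print(err_after < err_before)  # True
```

With a small η, one step lowers the squared error on the chosen sample, since both updates move along the negative gradient of ½(t_k - z_k)².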

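In the code,

[f, df] = activation(x)

implements the activation function shown in the figure above: f is the value at the node's output and df is the derivative of f(net), i.e. f'(net). Here f(net) = a·tanh(b·net) with a = 1.716 and b = 2/3, so f'(net) = a·b·sech²(b·net).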
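A quick way to trust the analytic derivative is to compare it with a finite difference. The sketch below reimplements activation in Python with the same constants and checks df against a central difference at a few arbitrarily chosen points; it also shows that with these constants f(1) is very close to 1.

```python
import math

A = 1.716
B = 2.0 / 3.0


def activation(x):
    """f(net) = a*tanh(b*net) and its derivative f'(net) = a*b*sech^2(b*net)."""
    f = A * math.tanh(B * x)
    df = A * B / math.cosh(B * x) ** 2  # sech = 1/cosh
    return f, df


# Compare the analytic derivative with a central finite difference.
h = 1e-6
for x in (-2.0, 0.0, 0.5, 3.0):
    numeric = (activation(x + h)[0] - activation(x - h)[0]) / (2 * h)
    assert abs(numeric - activation(x)[1]) < 1e-6

print(round(activation(1.0)[0], 3))  # 1.0
```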

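We did not tune the three parameters

Nh, Theta, eta

and used their defaults of 5, 0.1 and 0.1: 5 hidden units, a loop that stops once the relative change of the training error between two iterations (measured in percent by the code) falls to 0.1 or below, and a learning rate η of 0.1. The classification result with these settings is shown in the figure below; it is not very good.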
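The stopping test compares a relative change in percent, not an absolute drop in J. The following Python sketch (function names are hypothetical) replays the MATLAB criterion `rate = abs(J(iter) - J(iter-1))/J(iter-1)*100` on a made-up error history, where J(1) = 1e3 is the placeholder the code starts from:

```python
def relative_change_percent(j_prev, j_curr):
    """rate = abs(J(iter) - J(iter-1)) / J(iter-1) * 100, as in the MATLAB loop."""
    return abs(j_curr - j_prev) / j_prev * 100


def stop_iteration(errors, theta=0.1):
    """Given errors[0] = J(1), errors[1] = J(2), ..., return the MATLAB iteration
    count iter at which `while (rate > Theta)` would exit, or None if it never does."""
    for k in range(1, len(errors)):
        if relative_change_percent(errors[k - 1], errors[k]) <= theta:
            return k + 1  # errors[k] is J(k + 1) in MATLAB's 1-based indexing
    return None


# Made-up error history for illustration only.
history = [1e3, 0.9, 0.5, 0.45, 0.4499, 0.44]
print(stop_iteration(history))  # 5
```

The step from 0.5 to 0.45 lowers J by only 0.05 in absolute terms, yet the loop continues because that is a 10% relative change; it stops at the next iteration, where the change is about 0.02%.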

For comparison, the result of an SVM is shown below; the SVM classifies the data very well.

The above is only the simplest way to train a neural network; getting good results requires substantial further improvement.

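SVMs appeared 3 to 4 years after neural networks and were developed to compete with them. In 2006, the neural-network camp answered SVMs by proposing deep learning, which has recently become very popular; anyone with an interest in machine learning should study it carefully.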

References:

[1] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. Wiley, 2001.