The Perceptron Convergence Theorem


Statement of the fixed-increment convergence theorem

Let the subsets of training vectors $X_1$ and $X_2$ be linearly separable. The $(p+1)$-by-$1$ input vector is

$$x(n) = [1, x_1(n), x_2(n), \dots, x_p(n)]^T$$

Correspondingly, we define the $(p+1)$-by-$1$ weight vector:

$$w(n) = [\theta(n), w_1(n), w_2(n), \dots, w_p(n)]^T$$

The output is written in the compact form

$$v(n) = w^T(n)\,x(n)$$
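As a concrete illustration of this notation, here is a minimal NumPy sketch (the numerical values and variable names are invented for the example, not taken from the text): the input is augmented with a leading 1 so that the bias $\theta(n)$ rides along as the first weight component, and the output is the inner product $v(n) = w^T(n)\,x(n)$.

```python
import numpy as np

# Illustrative values only: augment the input with a leading 1 so that
# the bias theta becomes the first component of the weight vector.
p = 3                                   # number of raw input components
x_raw = np.array([0.5, -1.2, 2.0])      # [x_1(n), ..., x_p(n)]
x = np.concatenate(([1.0], x_raw))      # x(n) = [1, x_1(n), ..., x_p(n)]^T

theta = 0.1                             # bias theta(n)
w_rest = np.array([0.4, -0.3, 0.2])     # [w_1(n), ..., w_p(n)]
w = np.concatenate(([theta], w_rest))   # w(n) = [theta(n), w_1(n), ..., w_p(n)]^T

v = w @ x                               # v(n) = w^T(n) x(n)
print(v)                                # the sign of v decides the class
```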

For fixed $n$, the equation $w^T x = 0$ defines a decision surface that divides the inputs into two classes. Let $X_1$ be the subset of training vectors belonging to class $\xi_1$, and let $X_2$ be the subset of training vectors belonging to class $\xi_2$. Since $X_1$ and $X_2$ are linearly separable, there exists a weight vector $w$ such that we may state:

1. $w^T x > 0$ for every input vector $x$ belonging to class $\xi_1$,
and
2. $w^T x \le 0$ for every input vector $x$ belonging to class $\xi_2$.

The algorithm for adapting the weight vector of the elementary perceptron may now be formulated as follows:
If the $n$th member of the training set, $x(n)$, is correctly classified by the weight vector $w(n)$, no correction is made:

1. $w(n+1) = w(n)$ if $w^T(n)\,x(n) > 0$ and $x(n)$ belongs to class $\xi_1$
2. $w(n+1) = w(n)$ if $w^T(n)\,x(n) \le 0$ and $x(n)$ belongs to class $\xi_2$

Otherwise, the weight vector is updated according to the rule:

1. $w(n+1) = w(n) - \eta(n)\,x(n)$ if $w^T(n)\,x(n) > 0$ and $x(n)$ belongs to class $\xi_2$
2. $w(n+1) = w(n) + \eta(n)\,x(n)$ if $w^T(n)\,x(n) \le 0$ and $x(n)$ belongs to class $\xi_1$
where the learning-rate parameter $\eta(n)$ controls the adjustment applied to the weight vector at iteration $n$.
If $\eta(n) = \eta > 0$, where $\eta$ is a constant independent of the iteration number $n$, we have a fixed-increment adaptation rule for the perceptron.
In the sequel, we first prove the convergence of the fixed-increment adaptation rule for $\eta = 1$. Clearly, the value of $\eta$ is unimportant, so long as it is positive.
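The rule above can be written as a single error-correction step. The sketch below is one possible encoding, assuming $\eta = 1$ and labels $+1$ for class $\xi_1$ and $-1$ for class $\xi_2$; the function and variable names are illustrative, not from the text.

```python
import numpy as np

def perceptron_step(w, x, label):
    """One fixed-increment update (eta = 1).

    w     : current weight vector w(n), NumPy array with the bias as w[0]
    x     : augmented input vector x(n), NumPy array with x[0] = 1
    label : +1 if x belongs to class xi_1, -1 if it belongs to class xi_2
    Returns the updated weight vector w(n+1).
    """
    v = w @ x                        # v(n) = w^T(n) x(n)
    if label == +1 and v <= 0:       # x in xi_1 but classified as xi_2: add
        return w + x
    if label == -1 and v > 0:        # x in xi_2 but classified as xi_1: subtract
        return w - x
    return w                         # correctly classified: no correction

# Example use (invented input):
w = np.zeros(4)
x = np.array([1.0, 0.5, -1.2, 2.0])  # augmented input from class xi_1
w = perceptron_step(w, x, +1)        # v = 0 <= 0, so w becomes w + x
```

The two misclassification branches correspond to the two "otherwise" cases above; for a constant $\eta \ne 1$ one would simply scale $x$ by $\eta$ in both branches.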

Proof:

We start from the initial condition $w(0) = 0$. Suppose that $w^T(n)\,x(n) < 0$ for $n = 1, 2, \dots$, with the input vector $x(n)$ belonging to the subset $X_1$; that is, the perceptron misclassifies every $x(n)$, so with $\eta(n) = 1$ the correction rule gives

$$w(n+1) = w(n) + x(n) \qquad (1)$$

for $x(n)$ belonging to class $\xi_1$.
Given the initial condition $w(0) = 0$, we may iteratively solve this equation for $w(n+1)$, obtaining the result

$$w(n+1) = x(1) + x(2) + \dots + x(n) \qquad (2)$$

As there exists a solution $w_0$ (for which $w_0^T x(n) > 0$ for every $x(n) \in X_1$), we may define a positive number $\alpha$ by the relation

$$\alpha = \min_{x(n) \in X_1} w_0^T x(n) \qquad (3)$$
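To make Eq. (3) concrete, here is a tiny numerical example (the vectors and the choice of $w_0$ are invented for illustration): $\alpha$ is simply the smallest of the inner products $w_0^T x(n)$ over the subset $X_1$.

```python
import numpy as np

# Illustrative only: three augmented vectors from X_1 and a separating w_0.
X1 = np.array([[1.0, 2.0, 1.0],
               [1.0, 1.5, 0.5],
               [1.0, 3.0, 2.0]])
w0 = np.array([0.0, 1.0, 1.0])

alpha = np.min(X1 @ w0)   # Eq. (3): smallest margin of w_0 over X_1
print(alpha)              # 2.0 here; positive because w_0^T x > 0 for all x in X_1
```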

Hence, multiplying both sides of Eq. (2) by the row vector $w_0^T$, we get

$$w_0^T w(n+1) = w_0^T x(1) + w_0^T x(2) + \dots + w_0^T x(n) \ge n\alpha$$

since each term on the right is at least $\alpha$ by Eq. (3).

Next, the Cauchy-Schwarz inequality states that

$$\|w_0\|^2 \, \|w(n+1)\|^2 \ge \left[ w_0^T w(n+1) \right]^2 \ge n^2 \alpha^2 \qquad (4)$$

or, equivalently,

$$\|w(n+1)\|^2 \ge \frac{n^2 \alpha^2}{\|w_0\|^2} \qquad (5)$$

Next, we follow another development route. Rewriting Eq. (1) with the running index $k$, we have $w(k+1) = w(k) + x(k)$ for $x(k)$ belonging to $X_1$ and $k = 1, \dots, n$. Taking the squared Euclidean norm of both sides of this equation, we get

$$\|w(k+1)\|^2 = \|w(k)\|^2 + \|x(k)\|^2 + 2\,w^T(k)\,x(k) \qquad (6)$$

But, under the assumption that the perceptron incorrectly classifies an input vector $x(k)$ belonging to the subset $X_1$, we have $w^T(k)\,x(k) < 0$, so

$$\|w(k+1)\|^2 \le \|w(k)\|^2 + \|x(k)\|^2 \qquad (7)$$

or, equivalently,

$$\|w(k+1)\|^2 - \|w(k)\|^2 \le \|x(k)\|^2, \qquad k = 1, \dots, n \qquad (8)$$

Adding these inequalities for $k = 1, \dots, n$, and assuming the initial condition $w(0) = 0$, we get the following bound:

$$\|w(n+1)\|^2 \le \sum_{k=1}^{n} \|x(k)\|^2 \le n\beta \qquad (9)$$

where $\beta$ is a positive number defined by

$$\beta = \max_{x(k) \in X_1} \|x(k)\|^2 \qquad (10)$$

Equation (9) says that $\|w(n+1)\|^2$ grows at most linearly with $n$, whereas Eq. (5) says it grows at least quadratically; for sufficiently large $n$ these two requirements conflict. We can therefore state that $n$ cannot be larger than some value $n_{\max}$ for which Eqs. (5) and (9) are both satisfied with the equality sign. That is, $n_{\max}$ solves

$$\frac{n_{\max}^2 \, \alpha^2}{\|w_0\|^2} = n_{\max}\,\beta$$

which gives

$$n_{\max} = \frac{\beta \, \|w_0\|^2}{\alpha^2} \qquad (11)$$

We have thus proved that for $\eta(n) = 1$ for all $n$, and $w(0) = 0$, and given that a solution vector $w_0$ exists, the rule for adapting the synaptic weights connecting the associator units to the response unit of the perceptron must terminate after at most $n_{\max}$ iterations.
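The bound (11) can be checked numerically. The sketch below is a small experiment under stated assumptions: the dataset, the choice of $w_0$, and all names are invented for the example, and each vector from $X_2$ is replaced by its negative (a standard reduction, not part of the proof above) so that the one-subset argument applies to both classes. It trains the $\eta = 1$ perceptron from $w(0) = 0$ and verifies that the number of corrective updates never exceeds $n_{\max} = \beta \, \|w_0\|^2 / \alpha^2$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data (invented for this example).
X1 = rng.uniform(0.5, 2.0, size=(20, 2)) + np.array([0.5, 0.5])   # class xi_1
X2 = -rng.uniform(0.5, 2.0, size=(20, 2)) - np.array([0.5, 0.5])  # class xi_2

def augment(X):
    # Prepend the constant 1 so the bias is carried in w[0].
    return np.hstack([np.ones((len(X), 1)), X])

# Reduce to the single-subset form used in the proof: keep x for xi_1,
# replace x by -x for xi_2, so every vector should satisfy w^T x > 0.
Z = np.vstack([augment(X1), -augment(X2)])

# Any separating vector w_0 works for the bound; this one is chosen by hand.
w0 = np.array([0.0, 1.0, 1.0])
assert np.all(Z @ w0 > 0), "w0 must separate the data"

alpha = np.min(Z @ w0)                     # Eq. (3)
beta = np.max(np.sum(Z**2, axis=1))        # Eq. (10)
n_max = beta * np.dot(w0, w0) / alpha**2   # Eq. (11)

# Fixed-increment training (eta = 1) from w(0) = 0, counting corrective updates.
w = np.zeros(3)
updates = 0
changed = True
while changed:
    changed = False
    for z in Z:
        if w @ z <= 0:                     # misclassified: apply Eq. (1)
            w = w + z
            updates += 1
            changed = True

print(f"updates = {updates}, n_max = {n_max:.1f}")
assert updates <= n_max
```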
