The Perceptron Convergence Theorem


Statement of the fixed-increment convergence theorem

Let the subsets of training vectors $X_1$ and $X_2$ be linearly separable. The $(p+1)$-by-$1$ input vector is

$$x(n) = [1, x_1(n), x_2(n), \dots, x_p(n)]^T$$

Correspondingly, we define the $(p+1)$-by-$1$ weight vector:

$$w(n) = [\theta(n), w_1(n), w_2(n), \dots, w_p(n)]^T$$

The output is written in the compact form

$$v(n) = w^T(n)\,x(n)$$
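As a concrete illustration of this notation, here is a minimal NumPy sketch (the numerical values and variable names are invented for the example, not taken from the text): the input is augmented with a leading 1 so that the bias $\theta(n)$ rides along as the first weight component, and the output is the inner product $v(n) = w^T(n)\,x(n)$.

```python
import numpy as np

# Illustrative values only: augment the input with a leading 1 so that
# the bias theta becomes the first component of the weight vector.
p = 3                                   # number of raw input components
x_raw = np.array([0.5, -1.2, 2.0])      # [x_1(n), ..., x_p(n)]
x = np.concatenate(([1.0], x_raw))      # x(n) = [1, x_1(n), ..., x_p(n)]^T

theta = 0.1                             # bias theta(n)
w_rest = np.array([0.4, -0.3, 0.2])     # [w_1(n), ..., w_p(n)]
w = np.concatenate(([theta], w_rest))   # w(n) = [theta(n), w_1(n), ..., w_p(n)]^T

v = w @ x                               # v(n) = w^T(n) x(n)
print(v)                                # the sign of v decides the class
```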

For fixed $n$, the equation $w^T x = 0$ defines a decision surface that divides the inputs into two classes. Let $X_1$ be the subset of training vectors belonging to class $\xi_1$, and let $X_2$ be the subset of training vectors belonging to class $\xi_2$. Since $X_1$ and $X_2$ are linearly separable, there exists a weight vector $w$ such that we may state:

1. $w^T x > 0$ for every input vector $x$ belonging to class $\xi_1$,
and
2. $w^T x \le 0$ for every input vector $x$ belonging to class $\xi_2$.

The algorithm for adapting the weight vector of the elementary perceptron may now be formulated as follows:
If the $n$th member of the training set, $x(n)$, is correctly classified by the weight vector $w(n)$, no correction is made:

1. $w(n+1) = w(n)$ if $w^T(n)\,x(n) > 0$ and $x(n)$ belongs to class $\xi_1$
2. $w(n+1) = w(n)$ if $w^T(n)\,x(n) \le 0$ and $x(n)$ belongs to class $\xi_2$

Otherwise, the weight vector is updated according to the rule:

1. $w(n+1) = w(n) - \eta(n)\,x(n)$ if $w^T(n)\,x(n) > 0$ and $x(n)$ belongs to class $\xi_2$
2. $w(n+1) = w(n) + \eta(n)\,x(n)$ if $w^T(n)\,x(n) \le 0$ and $x(n)$ belongs to class $\xi_1$
where the learning-rate parameter $\eta(n)$ controls the adjustment applied to the weight vector at iteration $n$.
If $\eta(n) = \eta > 0$, where $\eta$ is a constant independent of the iteration number $n$, we have a fixed-increment adaptation rule for the perceptron.
In the sequel, we first prove the convergence of the fixed-increment adaptation rule for $\eta = 1$. Clearly, the value of $\eta$ is unimportant, so long as it is positive.
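The rule above can be written as a single error-correction step. The sketch below is one possible encoding, assuming $\eta = 1$ and labels $+1$ for class $\xi_1$ and $-1$ for class $\xi_2$; the function and variable names are illustrative, not from the text.

```python
import numpy as np

def perceptron_step(w, x, label):
    """One fixed-increment update (eta = 1).

    w     : current weight vector w(n), NumPy array with the bias as w[0]
    x     : augmented input vector x(n), NumPy array with x[0] = 1
    label : +1 if x belongs to class xi_1, -1 if it belongs to class xi_2
    Returns the updated weight vector w(n+1).
    """
    v = w @ x                        # v(n) = w^T(n) x(n)
    if label == +1 and v <= 0:       # x in xi_1 but classified as xi_2: add
        return w + x
    if label == -1 and v > 0:        # x in xi_2 but classified as xi_1: subtract
        return w - x
    return w                         # correctly classified: no correction

# Example use (invented input):
w = np.zeros(4)
x = np.array([1.0, 0.5, -1.2, 2.0])  # augmented input from class xi_1
w = perceptron_step(w, x, +1)        # v = 0 <= 0, so w becomes w + x
```

The two misclassification branches correspond to the two "otherwise" cases above; for a constant $\eta \ne 1$ one would simply scale $x$ by $\eta$ in both branches.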

Proof:

We start from the initial condition $w(0) = 0$. Suppose that $w^T(n)\,x(n) < 0$ for $n = 1, 2, \dots$, with the input vector $x(n)$ belonging to the subset $X_1$; that is, the perceptron misclassifies every $x(n)$, so with $\eta(n) = 1$ the correction rule gives

$$w(n+1) = w(n) + x(n) \qquad (1)$$

for $x(n)$ belonging to class $\xi_1$.
Given the initial condition $w(0) = 0$, we may iteratively solve this equation for $w(n+1)$, obtaining the result

$$w(n+1) = x(1) + x(2) + \dots + x(n) \qquad (2)$$

As there exists a solution $w_0$ (for which $w_0^T x(n) > 0$ for every $x(n) \in X_1$), we may define a positive number $\alpha$ by the relation

$$\alpha = \min_{x(n) \in X_1} w_0^T x(n) \qquad (3)$$
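To make Eq. (3) concrete, here is a tiny numerical example (the vectors and the choice of $w_0$ are invented for illustration): $\alpha$ is simply the smallest of the inner products $w_0^T x(n)$ over the subset $X_1$.

```python
import numpy as np

# Illustrative only: three augmented vectors from X_1 and a separating w_0.
X1 = np.array([[1.0, 2.0, 1.0],
               [1.0, 1.5, 0.5],
               [1.0, 3.0, 2.0]])
w0 = np.array([0.0, 1.0, 1.0])

alpha = np.min(X1 @ w0)   # Eq. (3): smallest margin of w_0 over X_1
print(alpha)              # 2.0 here; positive because w_0^T x > 0 for all x in X_1
```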

Hence, multiplying both sides of Eq. (2) by the row vector $w_0^T$, we get

$$w_0^T w(n+1) = w_0^T x(1) + w_0^T x(2) + \dots + w_0^T x(n) \ge n\alpha$$

since each term on the right is at least $\alpha$ by Eq. (3).

Next, the Cauchy-Schwarz inequality states that

$$\|w_0\|^2 \, \|w(n+1)\|^2 \ge \left[ w_0^T w(n+1) \right]^2 \ge n^2 \alpha^2 \qquad (4)$$

or, equivalently,

$$\|w(n+1)\|^2 \ge \frac{n^2 \alpha^2}{\|w_0\|^2} \qquad (5)$$

Next, we follow another development route. Rewriting Eq. (1) with the running index $k$, we have $w(k+1) = w(k) + x(k)$ for $x(k)$ belonging to $X_1$ and $k = 1, \dots, n$. Taking the squared Euclidean norm of both sides of this equation, we get

$$\|w(k+1)\|^2 = \|w(k)\|^2 + \|x(k)\|^2 + 2\,w^T(k)\,x(k) \qquad (6)$$

But, under the assumption that the perceptron incorrectly classifies an input vector $x(k)$ belonging to the subset $X_1$, we have $w^T(k)\,x(k) < 0$, so

$$\|w(k+1)\|^2 \le \|w(k)\|^2 + \|x(k)\|^2 \qquad (7)$$

or, equivalently,

$$\|w(k+1)\|^2 - \|w(k)\|^2 \le \|x(k)\|^2, \qquad k = 1, \dots, n \qquad (8)$$

Adding these inequalities for $k = 1, \dots, n$, and assuming the initial condition $w(0) = 0$, we get the following bound:

$$\|w(n+1)\|^2 \le \sum_{k=1}^{n} \|x(k)\|^2 \le n\beta \qquad (9)$$

where $\beta$ is a positive number defined by

$$\beta = \max_{x(k) \in X_1} \|x(k)\|^2 \qquad (10)$$

Equation (9) says that $\|w(n+1)\|^2$ grows at most linearly with $n$, whereas Eq. (5) says it grows at least quadratically; for sufficiently large $n$ these two requirements conflict. We can therefore state that $n$ cannot be larger than some value $n_{\max}$ for which Eqs. (5) and (9) are both satisfied with the equality sign. That is, $n_{\max}$ solves

$$\frac{n_{\max}^2 \, \alpha^2}{\|w_0\|^2} = n_{\max}\,\beta$$

which gives

$$n_{\max} = \frac{\beta \, \|w_0\|^2}{\alpha^2} \qquad (11)$$

We have thus proved that for $\eta(n) = 1$ for all $n$, and $w(0) = 0$, and given that a solution vector $w_0$ exists, the rule for adapting the synaptic weights connecting the associator units to the response unit of the perceptron must terminate after at most $n_{\max}$ iterations.
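The bound (11) can be checked numerically. The sketch below is a small experiment under stated assumptions: the dataset, the choice of $w_0$, and all names are invented for the example, and each vector from $X_2$ is replaced by its negative (a standard reduction, not part of the proof above) so that the one-subset argument applies to both classes. It trains the $\eta = 1$ perceptron from $w(0) = 0$ and verifies that the number of corrective updates never exceeds $n_{\max} = \beta \, \|w_0\|^2 / \alpha^2$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data (invented for this example).
X1 = rng.uniform(0.5, 2.0, size=(20, 2)) + np.array([0.5, 0.5])   # class xi_1
X2 = -rng.uniform(0.5, 2.0, size=(20, 2)) - np.array([0.5, 0.5])  # class xi_2

def augment(X):
    # Prepend the constant 1 so the bias is carried in w[0].
    return np.hstack([np.ones((len(X), 1)), X])

# Reduce to the single-subset form used in the proof: keep x for xi_1,
# replace x by -x for xi_2, so every vector should satisfy w^T x > 0.
Z = np.vstack([augment(X1), -augment(X2)])

# Any separating vector w_0 works for the bound; this one is chosen by hand.
w0 = np.array([0.0, 1.0, 1.0])
assert np.all(Z @ w0 > 0), "w0 must separate the data"

alpha = np.min(Z @ w0)                     # Eq. (3)
beta = np.max(np.sum(Z**2, axis=1))        # Eq. (10)
n_max = beta * np.dot(w0, w0) / alpha**2   # Eq. (11)

# Fixed-increment training (eta = 1) from w(0) = 0, counting corrective updates.
w = np.zeros(3)
updates = 0
changed = True
while changed:
    changed = False
    for z in Z:
        if w @ z <= 0:                     # misclassified: apply Eq. (1)
            w = w + z
            updates += 1
            changed = True

print(f"updates = {updates}, n_max = {n_max:.1f}")
assert updates <= n_max
```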
