How many kinds of Kohonen networks exist?
(And what is k-means?)
======================
Teuvo Kohonen is one of the most famous and prolific researchers in
neurocomputing, and he has invented a variety of networks. But many people
refer to "Kohonen networks" without specifying which kind of Kohonen
network, and this lack of precision can lead to confusion. The phrase
"Kohonen network" most often refers to one of the following three types of
networks:
o VQ: Vector Quantization--competitive networks that can be viewed as
unsupervised density estimators or autoassociators (Kohonen, 1995/1997;
Hecht-Nielsen 1990), closely related to k-means cluster analysis
(MacQueen, 1967; Anderberg, 1973). Each competitive unit corresponds to a
cluster, the center of which is called a "codebook vector". Kohonen's
learning law is an on-line algorithm that finds the codebook vector
closest to each training case and moves the "winning" codebook vector
closer to the training case. The codebook vector is moved a certain
proportion of the distance between it and the training case, the
proportion being specified by the learning rate, that is:
   new_codebook = old_codebook * (1 - learning_rate) + data * learning_rate
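In code, one step of this winner-take-all update might look like the following (a minimal sketch assuming NumPy arrays; the function name is illustrative, not Kohonen's):

```python
import numpy as np

def kohonen_vq_step(codebooks, x, learning_rate):
    """One on-line Kohonen VQ update: find the codebook vector
    nearest to x and move it toward x by a fixed learning rate."""
    # winner = index of the codebook vector closest to x (Euclidean)
    winner = int(np.argmin(np.sum((codebooks - x) ** 2, axis=1)))
    # convex combination of the old codebook vector and the data point
    codebooks[winner] = (codebooks[winner] * (1 - learning_rate)
                         + x * learning_rate)
    return winner
```

Only the winning codebook vector moves; all others are left unchanged, which is what makes the learning "competitive".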
Numerous similar algorithms have been developed in the neural net and
machine learning literature; see Hecht-Nielsen (1990) for a brief
historical overview, and Kosko (1992) for a more technical overview of
competitive learning.
MacQueen's on-line k-means algorithm is essentially the same as Kohonen's
learning law except that the learning rate is the reciprocal of the
number of cases that have been assigned to the winning cluster. Suppose
that when processing a given training case, N cases have been previously
assigned to the winning codebook vector. Then the codebook vector is
updated as:
   new_codebook = old_codebook * N/(N+1) + data * 1/(N+1)
This reduction of the learning rate makes each codebook vector the mean
of all cases assigned to its cluster and guarantees convergence of the
algorithm to an optimum value of the error function (the sum of squared
Euclidean distances between cases and codebook vectors) as the number of
training cases goes to infinity. Kohonen's learning law with a fixed
learning rate does not converge. As is well known from stochastic
approximation theory, convergence requires the sum of the infinite
sequence of learning rates to be infinite, while the sum of squared
learning rates must be finite (Kohonen, 1995, p. 34). These requirements
are satisfied by MacQueen's k-means algorithm.
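MacQueen's update can be sketched as follows (a minimal illustration assuming NumPy arrays; the function name is mine, not MacQueen's):

```python
import numpy as np

def macqueen_step(codebooks, counts, x):
    """One on-line k-means (MacQueen) update: the winning codebook
    vector moves toward x with learning rate 1/(N+1), where N is the
    number of cases previously assigned to that cluster.  The rate
    sequence 1/(N+1) satisfies both stochastic-approximation
    conditions: its sum diverges while the sum of its squares is
    finite."""
    winner = int(np.argmin(np.sum((codebooks - x) ** 2, axis=1)))
    n = counts[winner]
    # after this step the codebook vector equals the mean of its n+1 cases
    codebooks[winner] = codebooks[winner] * (n / (n + 1)) + x / (n + 1)
    counts[winner] = n + 1
    return winner
```

Because the rate shrinks as a cluster accumulates cases, each codebook vector remains exactly the running mean of the cases assigned to it.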
Kohonen VQ is often used for off-line learning, in which case the
training data are stored and Kohonen's learning law is applied to each
case in turn, cycling over the data set many times (incremental
training). Convergence to a local optimum can be obtained as the training
time goes to infinity if the learning rate is reduced in a suitable
manner as described above. However, there are off-line k-means
algorithms, both batch and incremental, that converge in a finite number
of iterations (Anderberg, 1973; Hartigan, 1975; Hartigan and Wong, 1979).
The batch algorithms such as Forgy's (1965; Anderberg, 1973) have the
advantage for large data sets, since the incremental methods require you
either to store the cluster membership of each case or to do two
nearest-cluster computations as each case is processed. Forgy's algorithm
is a simple alternating least-squares algorithm consisting of the
following steps:
1. Initialize the codebook vectors.
2. Repeat the following two steps until convergence:
A. Read the data, assigning each case to the nearest (using Euclidean
distance) codebook vector.
B. Replace each codebook vector with the mean of the cases that were
assigned to it.
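The two alternating steps above can be sketched as follows (a minimal batch k-means in the spirit of Forgy's algorithm; assumes NumPy arrays and pre-initialized codebook vectors):

```python
import numpy as np

def forgy_kmeans(data, codebooks, max_iter=100):
    """Forgy's batch k-means: alternate nearest-codebook assignment
    (step A) and mean replacement (step B) until assignments stop
    changing."""
    assignments = None
    for _ in range(max_iter):
        # step A: assign each case to the nearest codebook vector
        dists = np.sum((data[:, None, :] - codebooks[None, :, :]) ** 2,
                       axis=2)
        new_assignments = np.argmin(dists, axis=1)
        if assignments is not None and np.array_equal(assignments,
                                                      new_assignments):
            break  # converged: no case changed cluster
        assignments = new_assignments
        # step B: replace each codebook vector with its cluster mean
        for k in range(len(codebooks)):
            members = data[assignments == k]
            if len(members) > 0:  # leave empty clusters unchanged
                codebooks[k] = members.mean(axis=0)
    return codebooks, assignments
```

Since each pass needs only one nearest-cluster computation per case and no stored memberships between passes, this batch formulation scales well to large data sets, as noted above.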
Fastest training is usually obtained if MacQueen's on-line algorithm is
used for the first pass and off-line k-means algorithms are applied on
subsequent passes (Bottou and Bengio, 1995). However, these training
methods do not necessarily converge to a global optimum of the error
function. The chance of finding a global optimum can be improved by using
rational initialization (SAS Institute, 1989, pp. 824-825), multiple
random initializations, or various time-consuming training methods
intended for global optimization (Ismail and Kamel, 1989; Zeger, Vaisey,
and Gersho, 1992).
VQ has been a popular topic in the signal processing literature, which
has been largely separate from the literature on Kohonen networks and
from the cluster analysis literature in statistics and taxonomy. In
signal processing, on-line methods such as Kohonen's and MacQueen's are
called "adaptive vector quantization" (AVQ), while off-line k-means
methods go by the names of "Lloyd" or "Lloyd I" (Lloyd, 1982) and "LBG"
(Linde, Buzo, and Gray, 1980). There is a recent textbook on VQ by Gersho
and Gray (1992) that summarizes these algorithms as information
compression methods.
Kohonen's work emphasized VQ as density estimation and hence the
desirability of equiprobable clusters (Kohonen 1984; Hecht-Nielsen 1990).
However, Kohonen's learning law does not produce equiprobable
clusters--that is, the proportions of training cases assigned to each
cluster are not usually equal. If there are I inputs and the number of
clusters is large, the density of the codebook vectors approximates the
I/(I+2) power of the density of the training data (Kohonen, 1995, p.
35; Ripley, 1996, p. 202; Zador, 1982), so the clusters are approximately
equiprobable only if the data density is uniform or the number of inputs
is large. The most popular method for obtaining equiprobability is
DeSieno's (1988) algorithm, which adds a "conscience" value to each
distance prior to the competition. The conscience value for each cluster
is adjusted during training so that clusters that win more often have
larger conscience values and are thus handicapped to even out the
probabilities of winning in later iterations.
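The conscience mechanism might look roughly like this (a sketch only: the parameter names and default values below, such as bias_factor and freq_rate, are illustrative and not DeSieno's exact formulation):

```python
import numpy as np

def conscience_step(codebooks, win_freq, x,
                    learning_rate=0.1, bias_factor=10.0, freq_rate=0.001):
    """One competitive step with a DeSieno-style conscience term:
    clusters that have been winning more often than their fair share
    1/k are handicapped before the competition."""
    k = len(codebooks)
    dists = np.sum((codebooks - x) ** 2, axis=1)
    # positive bias helps under-used clusters; negative bias
    # handicaps clusters winning more often than 1/k
    bias = bias_factor * (1.0 / k - win_freq)
    winner = int(np.argmin(dists - bias))
    # update the running win frequencies toward this outcome
    win_freq += freq_rate * ((np.arange(k) == winner) - win_freq)
    # standard Kohonen update of the winning codebook vector
    codebooks[winner] += learning_rate * (x - codebooks[winner])
    return winner
```

Over many iterations the handicap pushes the win frequencies toward 1/k, which is what makes the resulting clusters approximately equiprobable.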
Kohonen's learning law is an approximation to the k-means model, which is
an approximation to normal mixture estimation by maximum likelihood
assuming that the mixture components (clusters) all have spherical
covariance matrices and equal sampling probabilities. Hence if the
population contains clusters that are not equiprobable, k-means will tend
to produce sample clusters that are more nearly equiprobable than the
population clusters. Corrections for this bias can be obtained by
maximizing the likelihood without the assumption of equal sampling
probabilities (Symons, 1981). Such corrections are similar to conscience
but have the opposite effect.
In cluster analysis, the purpose is not to compress information but to
recover the true cluster memberships. K-means differs from mixture models
in that, for k-means, the cluster membership for each case is considered
a separate parameter to be estimated, while mixture models estimate a
posterior probability of membership in each cluster for each case.