1.1 Formulation of Pattern Recognition Problems (1)

Source: Internet · Editor: 程序博客网 · Posted: 2024/06/04 19:19
Many important applications of pattern recognition can be characterized as either waveform classification or classification of geometric figures. For example, consider the problem of testing a machine for normal or abnormal operation by observing the output voltage of a microphone over a period of time. This problem reduces to discrimination of waveforms from good and bad machines. On the other hand, recognition of printed English characters corresponds to classification of geometric figures. In order to perform this type of classification, we must first measure the observable characteristics of the sample. The most primitive but assured way to extract all information contained in the sample is to measure the time-sampled values for a waveform, x(t_1), ..., x(t_n), and the grey levels of pixels for a figure, x(1), ..., x(n), as shown in Fig. 1-1. These n measurements form a vector X. Even under the normal machine condition, the observed waveforms are different each time the observation is made. Therefore, x(t_i) is a random variable and will be expressed, using boldface, as x(t_i). Likewise, X is called a random vector if its components are random variables and is expressed as X. Similar arguments hold for characters: the observation, x(i), varies from one A to another and therefore x(i) is a random variable, and X is a random vector.
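The time-sampling step described above can be sketched in a few lines of Python; the waveform and the sampling times here are invented purely for illustration:

```python
import math

def sample_waveform(signal, times):
    """Form the feature vector X = (x(t_1), ..., x(t_n)) by time-sampling."""
    return [signal(t) for t in times]

# Hypothetical waveform from a "normal" machine: a 50 Hz sinusoid.
normal = lambda t: math.sin(2 * math.pi * 50 * t)

# n = 8 equally spaced sampling times over 20 ms.
times = [k * 0.02 / 8 for k in range(8)]
X = sample_waveform(normal, times)

print(len(X))  # the n measurements form one vector in n-dimensional space
```

Each observation of the machine would yield a different X, which is why X is treated as a random vector.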

... (part of the passage omitted)

Thus, each waveform or character is expressed by a vector (or a sample) in an n-dimensional space, and many waveforms or characters form a distribution of X in the n-dimensional space. Figure 1-2 shows a simple two-dimensional example of two distributions corresponding to normal and abnormal machine conditions, where points depict the locations of samples and solid lines are the contour lines of the probability density functions. If we know these two distributions of X from past experience, we can set up a boundary between these two distributions, g(x_1, x_2) = 0, which divides the two-dimensional space into two regions. Once the boundary is selected, we can classify a sample without a class label to a normal or abnormal machine, depending on g(x_1, x_2) < 0 or g(x_1, x_2) > 0. We call g(x_1, x_2) a discriminant function, and a network which detects the sign of g(x_1, x_2) is called a pattern recognition network, a categorizer, or a classifier. Figure 1-3 shows a block diagram of a classifier in a general n-dimensional space. Thus, in order to design a classifier, we must study the characteristics of the distribution of X for each category and find a proper discriminant function. This process is called learning or training, and samples used to design a classifier are called learning or training samples. The discussion can be easily extended to multi-category cases.
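The sign-detection rule above can be sketched with a simple linear discriminant; the boundary coefficients below are invented for the example, not taken from the text:

```python
def g(x1, x2):
    """A hypothetical linear discriminant whose zero set g(x1, x2) = 0
    separates the two-dimensional feature space into two regions."""
    return x1 + x2 - 1.0

def classify(x1, x2):
    # Detecting the sign of g is exactly what the categorizer does:
    # g < 0 -> one class, g > 0 -> the other.
    return "normal" if g(x1, x2) < 0 else "abnormal"

print(classify(0.2, 0.3))   # g = -0.5 < 0
print(classify(0.9, 0.8))   # g = +0.7 > 0
```

In practice the form and coefficients of g would be chosen from the two class distributions during training.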

Thus, pattern recognition, or decision-making in a broader sense, may be considered as a problem of estimating density functions in a high-dimensional space and dividing the space into the regions of categories or classes. Because of this view, mathematical statistics forms the foundation of the subject. Also, since vectors and matrices are used to represent samples and linear operators, respectively, a basic knowledge of linear algebra is required to read this book. Chapter 2 presents a brief review of these two subjects.

The first question we ask is what is the theoretically best classifier,
assuming that the distributions of the random vectors are given. This problem
is statistical hypothesis testing, and the Bayes classifier is the best classifier
which minimizes the probability of classification error. Various hypothesis
tests are discussed in Chapter 3.
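For two classes with known densities, the Bayes classifier assigns a sample to the class with the larger posterior. A one-dimensional Gaussian sketch, with means, variance, and priors invented for illustration:

```python
import math

def gauss(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def bayes_classify(x, p1=0.5, p2=0.5, mu1=0.0, mu2=2.0, sigma=1.0):
    # Choose the class that maximizes prior * likelihood, i.e. the posterior.
    return 1 if p1 * gauss(x, mu1, sigma) >= p2 * gauss(x, mu2, sigma) else 2

print(bayes_classify(0.3))  # nearer to mu1
print(bayes_classify(1.8))  # nearer to mu2
```

With equal priors and equal variances the decision boundary falls at the midpoint of the two means; unequal priors shift it toward the less probable class.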
The probability of error is the key parameter in pattern recognition. The
error due to the Bayes classifier (the Bayes error) gives the smallest error we
can achieve from given distributions. In Chapter 3, we discuss how to calculate
the Bayes error. We also consider a simpler problem of finding an upper
bound of the Bayes error.
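In the equal-covariance Gaussian case the Bayes error has a closed form, and an upper bound of the Bhattacharyya type can be checked against it numerically; the class parameters below are illustrative only:

```python
import math

def phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

mu1, mu2, sigma = 0.0, 2.0, 1.0        # two hypothetical one-dimensional classes
delta = abs(mu2 - mu1) / sigma         # normalized distance between the means

# Exact Bayes error for equal priors and equal variances: Phi(-delta/2).
bayes_error = phi(-delta / 2.0)

# Bhattacharyya upper bound: sqrt(P1*P2) * exp(-mu_B), with mu_B = delta^2 / 8
# for equal covariances.
bound = math.sqrt(0.5 * 0.5) * math.exp(-delta ** 2 / 8.0)

print(round(bayes_error, 4))  # 0.1587
print(round(bound, 4))        # 0.3033
```

The bound is loose here but far easier to evaluate than the exact error, which is the motivation for studying such bounds.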

Although the Bayes classifier is optimal, its implementation is often
difficult in practice because of its complexity, particularly when the dimensionality
is high. Therefore, we are often led to consider a simpler, parametric
classifier. Parametric classifiers are based on assumed mathematical forms for
either the density functions or the discriminant functions. Linear, quadratic, or
piecewise classifiers are the simplest and most common choices. Various
design procedures for these classifiers are discussed in Chapter 4.
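Assuming Gaussian densities with a shared covariance leads to a linear discriminant; the sketch below uses the simplest such design, a nearest-mean rule with parameters estimated from invented training samples:

```python
import random

random.seed(2)

# Hypothetical one-dimensional training samples for two classes.
class1 = [random.gauss(0.0, 1.0) for _ in range(50)]
class2 = [random.gauss(3.0, 1.0) for _ in range(50)]

m1 = sum(class1) / len(class1)
m2 = sum(class2) / len(class2)

def g(x):
    """Linear discriminant built from the estimated means: negative on the
    class-1 side of the midpoint, positive on the class-2 side."""
    return (m2 - m1) * (x - (m1 + m2) / 2.0)

label = 1 if g(0.5) < 0 else 2
print(label)
```

A quadratic classifier arises in the same way when each class keeps its own covariance estimate.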
Even when the mathematical forms can be assumed, the values of the
parameters are not given in practice and must be estimated from available samples.
With a finite number of samples, the estimates of the parameters and
subsequently of the classifiers based on these estimates become random variables.
The resulting classification error also becomes a random variable and is
biased with a variance. Therefore, it is important to understand how the
number of samples affects classifier design and its performance. Chapter 5
discusses this subject.
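The finite-sample effect can be illustrated by repeatedly estimating a class parameter from small training sets; the distribution, sample sizes, and trial count here are invented for the sketch:

```python
import random
import statistics

random.seed(0)

def estimate_mean(n, mu=0.0, sigma=1.0):
    """A parameter estimate from n training samples is itself a random variable."""
    return statistics.fmean(random.gauss(mu, sigma) for _ in range(n))

# Repeat the "design" many times to see the spread of the estimate.
spread_small = statistics.stdev(estimate_mean(10) for _ in range(1000))
spread_large = statistics.stdev(estimate_mean(100) for _ in range(1000))

# More training samples -> a less variable estimate (roughly sigma / sqrt(n)).
print(spread_small > spread_large)
```

A classifier built on such estimates inherits this variability, which is why its error is a random variable with both bias and variance.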

When no parametric structure can be assumed for the density functions,
we must use nonparametric techniques such as the Parzen and k-nearest neighbor
approaches for estimating density functions. In Chapter 6, we develop the
basic statistical properties of these estimates.
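A Parzen estimate places a kernel at every sample and averages; a minimal one-dimensional sketch with a Gaussian kernel, whose bandwidth h is one of the control parameters the text warns must be chosen carefully (the value below is arbitrary):

```python
import math
import random

def parzen(x, samples, h):
    """Parzen density estimate: average of Gaussian kernels of width h."""
    k = lambda u: math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)
    return sum(k((x - xi) / h) for xi in samples) / (len(samples) * h)

random.seed(1)
samples = [random.gauss(0.0, 1.0) for _ in range(2000)]

est = parzen(0.0, samples, h=0.3)
true = 1.0 / math.sqrt(2.0 * math.pi)   # N(0, 1) density at 0

print(round(est, 3), round(true, 3))
```

Shrinking h reduces the bias of the estimate but inflates its variance, which is the trade-off behind the parameter-selection discussion of Chapter 7.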
Then, in Chapter 7, the nonparametric density estimates are applied to
classification problems. The main topic in Chapter 7 is the estimation of the
Bayes error without assuming any mathematical form for the density functions.
In general, nonparametric techniques are very sensitive to the number of control
parameters, and tend to give heavily biased results unless the values of
these parameters are carefully chosen. Chapter 7 presents an extensive discussion
of how to select these parameter values. (p. 23 in the original book)