Machine Learning Study Notes: PRML Chapter 2.0 - Prerequisite 2 - Singular Value Decomposition (SVD)


Chapter 2.0: Prerequisite 2 - Singular Value Decomposition (SVD)

PRML, Oxford University Deep Learning Course, Machine Learning, Pattern Recognition
Christopher M. Bishop, PRML, Chapter 2 Probability Distributions

  • Chapter 2.0 Prerequisite 2 - Singular Value Decomposition (SVD)
    • Vector Terminology
    • Matrix Terminology
      • 1 Orthogonal Matrix
      • 2 Eigenvectors and Eigenvalues
        • THE KEY IDEAS [see Ref-7]
      • 3 Understanding eigenvectors and eigenvalues in terms of transformation and the corresponding matrix [see Ref-9]
    • Singular Value Decomposition
      • 1 Understanding of SVD
      • 2 Statement of the SVD Theorem
        • Some Conclusion and Simple Proof
      • 3 An example of SVD
      • 4 Intuitive Interpretations of SVD [see Ref-5]
        • 1 Points in d-dimension Space
        • 2 The Best Least Squares Fit Problem
        • 3 Singular Vectors and Singular Values
        • 4 The Frobenius norm of A
      • 5 Intuitive Interpretations of SVD [see Ref-6]
        • 1 The image shows
        • 2 Singular values as semiaxes of an ellipse or ellipsoid
        • 3 The columns of U and V are orthonormal bases
    • Expansion of eigenvalues and eigenvectors [see Ref-8]
        • Problem - PRML Exercise 2.19
        • Solution
          • 1 Lemma 4-1: A real symmetric matrix is orthogonally similar to a diagonal matrix
          • 2 Lemma 4-2: Matrices A and B are identical if and only if Av = Bv for all vectors v
          • 3 Proof
    • Best Rank k Approximation using SVD [see Ref-5]
      • Theorem 5.1
      • Theorem 5.2
      • Theorem 5.3
      • Theorem 5.4
    • The Geometry of Linear Transformations [see Ref-3]
      • 1 Matrix and Transformation
        • Conclusion
      • 2 The Geometry of Eigenvectors and Eigenvalues
        • How to calculate this angle of roughly 58.28°
        • Solution
      • 3 The singular value decomposition
      • 4 How do we find the singular value decomposition
      • 5 Another example
      • 6 SVD Application 1 – Data compression
      • 7 SVD Application 2 – Noise reduction
      • 8 SVD Application 3 – Data analysis
    • Summary
        • The logic relationship of those concepts is shown in the following figure
    • Reference

1. Vector Terminology

  • Orthogonality
    Two vectors $\vec{u}$ and $\vec{v}$ are said to be orthogonal to each other if their inner product equals zero, i.e., $\vec{u}\cdot\vec{v} = 0$.
  • Normal Vector
    A normal vector (or unit vector) $\vec{u}$ is a vector of length 1, i.e., $\vec{u}\cdot\vec{u} = 1$.

  • Orthonormal Vectors
    Vectors of unit length that are orthogonal to each other are said to be orthonormal.

2. Matrix Terminology

2.1 Orthogonal Matrix

A matrix $A$ is orthogonal if $A A^T = A^T A = I$, where $I$ is the identity matrix.

2.2 Eigenvectors and Eigenvalues

An eigenvector is a nonzero vector that satisfies the equation
$$A\vec{v} = \lambda\vec{v},$$
where $A$ is a square matrix,
- the scalar $\lambda$ is an eigenvalue, and
- $\vec{v}$ is the eigenvector.

Eigenvalues and eigenvectors are also known as, respectively, characteristic roots and characteristic vectors, or latent roots and latent vectors.

THE KEY IDEAS [see Ref-7]:

  • $A\vec{x} = \lambda\vec{x}$ says that eigenvectors $\vec{x}$ keep the same direction when multiplied by $A$.
  • $A\vec{x} = \lambda\vec{x}$ also says that $\det(A - \lambda I) = 0$. This determines $n$ eigenvalues.
  • The eigenvalues of $A^2$ and $A^{-1}$ are $\lambda^2$ and $\lambda^{-1}$, respectively, with the same eigenvectors.
  • The sum of the $\lambda$'s equals the sum down the main diagonal of $A$ (the trace), i.e., $\lambda_1 + \lambda_2 + \cdots + \lambda_n = a_{11} + a_{22} + \cdots + a_{nn}$.
  • The product of the $\lambda$'s equals the determinant, i.e., $\lambda_1\lambda_2\cdots\lambda_n = \det(A)$. (A quick numerical check of the last two properties follows below.)
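A minimal NumPy sketch of that check (not part of the original notes; the matrix below is an arbitrary example, any square matrix works):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))          # arbitrary square matrix

eigvals, eigvecs = np.linalg.eig(A)      # columns of eigvecs satisfy A v_i = lambda_i v_i

# Sum of the eigenvalues equals the trace of A
print(np.allclose(eigvals.sum(), np.trace(A)))         # True
# Product of the eigenvalues equals the determinant of A
print(np.allclose(eigvals.prod(), np.linalg.det(A)))   # True
# Each eigenpair satisfies A v = lambda v
print(np.allclose(A @ eigvecs, eigvecs * eigvals))     # True
```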

2.3 Understanding eigenvectors and eigenvalues in terms of transformation and the corresponding matrix [see Ref-9]

In linear algebra, an eigenvector or characteristic vector of a linear transformation $T$ from a vector space $V$ over a field $F$ into itself is a non-zero vector that does not change its direction when that linear transformation is applied to it. In other words, if $\vec{v}$ is a vector that is not the zero vector, then it is an eigenvector of a linear transformation $T$ if $T(\vec{v})$ is a scalar multiple of $\vec{v}$. This condition can be written as the mapping
$$T(\vec{v}) = \lambda\vec{v},$$
where $\lambda$ is a scalar in the field $F$, known as the eigenvalue or characteristic value associated with the eigenvector $\vec{v}$.

If the vector space $V$ is finite-dimensional, then the linear transformation $T$ can be represented as a square matrix $A$, and the vector $\vec{v}$ by a column vector, rendering the above mapping as a matrix multiplication on the left hand side and a scaling of the column vector on the right hand side in the equation
$$A\vec{v} = \lambda\vec{v}.$$

There is a correspondence between $n \times n$ square matrices and linear transformations from an $n$-dimensional vector space to itself. For this reason, it is equivalent to define eigenvalues and eigenvectors using either the language of matrices or the language of linear transformations.

Geometrically, an eigenvector corresponding to a real, nonzero eigenvalue points in a direction that is stretched by the transformation, and the eigenvalue is the factor by which it is stretched. If the eigenvalue is negative, the direction is reversed.

This can be seen in the following figure, where the matrix $A$ acts by stretching the vector $\vec{x}$ without changing its direction, so $\vec{x}$ is an eigenvector of $A$.

[Figure]

3. Singular Value Decomposition

3.1 Understanding of SVD

Singular value decomposition (SVD) can be looked at from three mutually compatible points of view.
- 1) a method for transforming correlated variables into a set of uncorrelated ones that better expose the various relationships among the original data items.
- 2) a method for identifying and ordering the dimensions along which data points exhibit the most variation.
- 3) a method for data reduction, since once we have identified where the most variation is, it’s possible to find the best approximation of the original data points using fewer dimensions.

3.2 Statement of the SVD Theorem

SVD is based on a theorem from linear algebra which says that a rectangular matrix $A$ can be broken down into the product of three matrices:
- an orthogonal matrix $U$ (i.e., $U^TU = I$);
- a diagonal matrix $S$;
- the transpose of an orthogonal matrix $V$ (i.e., $V^TV = I$).

The theorem is usually presented something like this:
$$A_{m\times n} = U_{m\times m}\, S_{m\times n}\, V_{n\times n}^T.$$

  • assuming [Math Processing Error] [see Ref-4 for this figure]:

    [Figure]

  • assuming [Math Processing Error] [see Ref-4 for this figure]:

    [Figure]

  • The columns of $U$ and the columns of $V$ are called the left-singular vectors and right-singular vectors of $A$, respectively.

  • The columns of $U$ are orthonormal eigenvectors of $AA^T$.
    There is a brief proof. Write $A = USV^T$ with $U^TU = UU^T = I$ and $V^TV = VV^T = I$, and note that $SS^T = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \dots)$. Then
    $$AA^T = USV^T\,(USV^T)^T = USV^TVS^TU^T = U\,(SS^T)\,U^T,$$
    and hence
    $$AA^T\,U = U\,(SS^T), \qquad \text{i.e.,} \qquad AA^T\,\vec{u}_i = \sigma_i^2\,\vec{u}_i$$
    for each column $\vec{u}_i$ of $U$. That is, the columns of $U$ are orthonormal eigenvectors of $AA^T$, with eigenvalues $\sigma_i^2$.

  • Similarly, we can prove that the columns of $V$ are orthonormal eigenvectors of $A^TA$:
    $$A^TA\,V = V\,(S^TS), \qquad \text{i.e.,} \qquad A^TA\,\vec{v}_i = \sigma_i^2\,\vec{v}_i.$$

  • $S$ is a diagonal matrix containing the square roots of the non-zero eigenvalues of both $AA^T$ and $A^TA$. A common convention is to list the singular values in descending order. In this case, the diagonal matrix $S$ is uniquely determined by $A$ (though not the matrices $U$ and $V$):
    $$S = \mathrm{diag}(\sigma_1, \sigma_2, \dots, \sigma_r, 0, \dots, 0),$$
    assuming $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$, with $\sigma_i = \sqrt{\lambda_i}$, where $\sigma_1, \dots, \sigma_r$ are called the singular values of the matrix $A$.

  • $r$ is the rank of matrix $A$, i.e., $r = \mathrm{rank}(A) = \dim\big(\mathrm{range}(A)\big)$, where $\mathrm{range}(A) = \{A\vec{x}\}$ means the range of $A$, that is, the set of possible linear combinations of the columns of $A$.

Some Conclusion and Simple Proof:

Let $U = [\vec{u}_1, \vec{u}_2, \dots, \vec{u}_m]$, where $\vec{u}_i \in \mathbb{R}^m$ for $i = 1, \dots, m$; and $V = [\vec{v}_1, \vec{v}_2, \dots, \vec{v}_n]$, where $\vec{v}_j \in \mathbb{R}^n$ for $j = 1, \dots, n$.
From $U^TU = I$ we get
$$\vec{u}_i^T\vec{u}_j = \begin{cases} 1, & i = j \\ 0, & i \ne j, \end{cases}$$
where $i, j \in \{1, \dots, m\}$.
Similarly, we have $\vec{v}_i^T\vec{v}_j = \delta_{ij}$, where $i, j \in \{1, \dots, n\}$.
That is, the columns of $U$ and $V$ are orthonormal vectors, respectively.

3.3 An example of SVD:

Given a matrix $A$, the decomposition $A = USV^T$ is obtained as follows:
- Calculate $U$ by finding the eigenvalues and the corresponding orthonormal eigenvectors of $AA^T$.
- Calculate $V$ by finding the eigenvalues and the corresponding orthonormal eigenvectors of $A^TA$.
- $S$ is the diagonal matrix whose entries are the square roots of the non-zero eigenvalues (the singular values), listed in descending order.
- The SVD result is $A = USV^T$ (a worked numerical sketch follows below).
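Since the original example's numbers were lost to a rendering error, here is a small illustrative computation in NumPy. The matrix below is a hypothetical stand-in (not necessarily the article's original example), the simple sign-alignment step assumes distinct non-zero singular values, and the result is checked against `numpy.linalg.svd`.

```python
import numpy as np

A = np.array([[3., 1., 1.],
              [-1., 3., 1.]])                     # hypothetical 2x3 example matrix

# Step 1: columns of U are orthonormal eigenvectors of A A^T (symmetric, 2x2)
lam_u, U = np.linalg.eigh(A @ A.T)                # eigh returns eigenvalues in ascending order
U = U[:, ::-1]                                    # reorder so eigenvalues are descending

# Step 2: columns of V are orthonormal eigenvectors of A^T A (symmetric, 3x3)
lam_v, V = np.linalg.eigh(A.T @ A)
V = V[:, ::-1]

# Step 3: singular values are the square roots of the (non-negative) eigenvalues
sigma = np.sqrt(np.clip(lam_u[::-1], 0.0, None))  # descending

# Eigenvectors are only defined up to sign; flip V columns so that A v_i = sigma_i u_i
for i, s in enumerate(sigma):
    if s > 1e-12 and (A @ V[:, i]) @ U[:, i] < 0:
        V[:, i] = -V[:, i]

# Assemble S with the same shape as A and check the reconstruction A = U S V^T
S = np.zeros_like(A)
np.fill_diagonal(S, sigma)
print(np.allclose(U @ S @ V.T, A))                               # True
print(np.allclose(np.linalg.svd(A, compute_uv=False), sigma))    # True
```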

3.4 Intuitive Interpretations of SVD [see Ref-5]

1) Points in d-dimension Space:

To gain insight into the SVD, treat the rows of an $n \times d$ matrix $A$ as $n$ points in a $d$-dimensional space (here we use $n \times d$ instead of $m \times n$, since it is common to represent $n$ points of dimension $d$ this way).

The product $A\vec{v}$ is equivalent to
$$A\vec{v} = \begin{pmatrix} \vec{a}_1\cdot\vec{v} \\ \vec{a}_2\cdot\vec{v} \\ \vdots \\ \vec{a}_n\cdot\vec{v} \end{pmatrix},$$
where the inner product $\vec{a}_i\cdot\vec{v}$ is the length of the projection of point $i$ (represented by the vector $\vec{a}_i$, i.e., the $i$-th row of matrix $A$) onto the line along which $\vec{v}$ is a unit vector.

2) The Best Least Squares Fit Problem:

Consider the problem of finding the best $k$-dimensional subspace with respect to a set of points. Here "best" means minimizing the sum of the squares of the perpendicular distances of the points to the subspace. We begin with a special case of the problem where the subspace is 1-dimensional, a line through the origin. We will see later that the best-fitting $k$-dimensional subspace can be found by $k$ applications of the best-fitting-line algorithm (i.e., applying the 1-dimensional line fitting $k$ times yields the best-fitting $k$-dimensional subspace). Finding the best-fitting line through the origin with respect to a set of points in the plane means minimizing the sum of the squared distances of the points to the line. Here distance is measured perpendicular to the line (the corresponding problem is called the best least squares fit); it is more often measured vertically, in the $y$ direction (the corresponding problem is then the ordinary least squares fit).

Returning to the best least squares fit problem, consider projecting a point $\vec{a}_i$ onto a line through the origin. Then, based on the following figure,

[Figure]

we can get
$$(\text{length of projection of } \vec{a}_i)^2 + (\text{distance of } \vec{a}_i \text{ to the line})^2 = \|\vec{a}_i\|^2. \tag{3.9}$$

From (3.9) and the observation that $\sum_{i=1}^{n}\|\vec{a}_i\|^2$ is a constant (i.e., independent of the line), we get the equivalence

$$\min \sum_{i=1}^{n}(\text{distance of } \vec{a}_i \text{ to the line})^2 \iff \max \sum_{i=1}^{n}(\text{length of projection of } \vec{a}_i)^2.$$
So minimizing the sum of the squares of the distances is equivalent to maximizing the sum of the squares of the lengths of the projections onto the line. This conclusion helps to introduce the subsequent definition of singular vectors.

3) Singular Vectors and Singular Values:

  • Singular Vectors: Consider the rows of $A$ as $n$ points in a $d$-dimensional space. Consider the best fit line through the origin. Let $\vec{v}$ be a unit vector along this line. The length of the projection of $\vec{a}_i$ (i.e., the $i$-th row of $A$) onto $\vec{v}$ is $|\vec{a}_i\cdot\vec{v}|$. From this we see that the sum of the squared lengths of the projections is $|A\vec{v}|^2$. The best fit line is the one maximizing $|A\vec{v}|^2$ and hence minimizing the sum of the squared distances of the points to the line.
  • The First Singular Vector: With this in mind, define the first singular vector $\vec{v}_1$ of $A$, which is a column vector, as the direction of the best fit line through the origin for the $n$ points in $d$-space that are the rows of $A$. Thus
    $$\vec{v}_1 = \arg\max_{|\vec{v}|=1} |A\vec{v}|.$$

  • The First Singular Value: The value $\sigma_1(A) = |A\vec{v}_1|$ is called the first singular value of $A$. Note that $\sigma_1^2$ is the sum of the squares of the projections of the points onto the line determined by $\vec{v}_1$.

  • The Second Singular Vector: The second singular vector $\vec{v}_2$ is defined by the best fit line perpendicular to $\vec{v}_1$:
    $$\vec{v}_2 = \arg\max_{\vec{v}\perp\vec{v}_1,\ |\vec{v}|=1} |A\vec{v}|.$$

  • The Second Singular Value: The value $\sigma_2(A) = |A\vec{v}_2|$ is called the second singular value of $A$. Note that $\sigma_2^2$ is the sum of the squares of the projections of the points onto the line determined by $\vec{v}_2$.

  • The Third Singular Vector: The third singular vector $\vec{v}_3$ is defined similarly by
    $$\vec{v}_3 = \arg\max_{\vec{v}\perp\vec{v}_1,\vec{v}_2,\ |\vec{v}|=1} |A\vec{v}|.$$
  • The process stops when we have found $\vec{v}_1, \dots, \vec{v}_r$ as singular vectors and
    $$\max_{\vec{v}\perp\vec{v}_1,\dots,\vec{v}_r,\ |\vec{v}|=1} |A\vec{v}| = 0,$$
    where $r = \mathrm{rank}(A)$, i.e., there exist at most $r$ such linearly independent directions. (A small numerical check of this greedy characterization follows below.)
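A minimal check of this variational characterization (an addition, not from Ref-5): estimate $\max_{|\vec{v}|=1}|A\vec{v}|$ by sampling random unit vectors and compare it with the largest singular value returned by `numpy.linalg.svd`.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 3))            # 50 hypothetical points in 3-dimensional space

sigma = np.linalg.svd(A, compute_uv=False)  # singular values, descending

# Estimate max_{|v|=1} |A v| by sampling many random unit vectors v
V = rng.standard_normal((3, 100000))
V /= np.linalg.norm(V, axis=0)              # normalize columns to unit length
best = np.linalg.norm(A @ V, axis=0).max()

print(sigma[0], best)                       # the estimate approaches sigma_1 from below
print(best <= sigma[0] + 1e-12)             # True: no unit vector beats the first singular vector
```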

4) The Frobenius norm of A:

Consider one row, say $\vec{a}_i$, of matrix $A$. Since $\vec{v}_1, \dots, \vec{v}_r$ span the space of all rows of $A$, $\vec{a}_i\cdot\vec{v} = 0$ for all $\vec{v}$ perpendicular to $\vec{v}_1, \dots, \vec{v}_r$. Thus, for each row $\vec{a}_i$, $\sum_{j=1}^{r}(\vec{a}_i\cdot\vec{v}_j)^2 = \|\vec{a}_i\|^2$. Summing over all rows,
$$\sum_{i=1}^{n}\|\vec{a}_i\|^2 = \sum_{i=1}^{n}\sum_{j=1}^{r}(\vec{a}_i\cdot\vec{v}_j)^2 = \sum_{j=1}^{r}\sum_{i=1}^{n}(\vec{a}_i\cdot\vec{v}_j)^2 = \sum_{j=1}^{r}|A\vec{v}_j|^2 = \sum_{j=1}^{r}\sigma_j^2(A).$$
But $\sum_{i=1}^{n}\|\vec{a}_i\|^2 = \sum_{i,k} a_{ik}^2$, that is, the sum of squares of all the entries of $A$. Thus, the sum of squares of the singular values of $A$ is indeed the square of the "whole content of $A$", i.e., the sum of squares of all the entries. There is an important norm associated with this quantity, the Frobenius norm of $A$, denoted by $\|A\|_F$, defined as
$$\|A\|_F = \sqrt{\sum_{i,k} a_{ik}^2}. \tag{3.17}$$
This is summarized in the following lemma:

Lemma: For any matrix $A$, the sum of the squares of the singular values equals the square of the Frobenius norm; that is,
$$\sum_{j=1}^{r}\sigma_j^2(A) = \|A\|_F^2.$$
(A short numerical check follows below.)
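A quick NumPy check of this lemma (an addition, not from Ref-5):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))                      # arbitrary matrix

sigma = np.linalg.svd(A, compute_uv=False)
fro = np.linalg.norm(A, 'fro')                       # sqrt of the sum of squared entries

print(np.allclose((sigma ** 2).sum(), fro ** 2))     # True
```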

3.5 Intuitive Interpretations of SVD [see Ref-6]

[Figure]

1) The image shows:

  • Upper Left: The unit disc with the two canonical unit vectors.
  • Upper Right: The unit disc transformed with $M$, with the singular values $\sigma_1$ and $\sigma_2$ indicated.
  • Lower Left: The action of $V^*$ on the unit disc. This is just a rotation. Here $V^*$ means the conjugate transpose of $V$.
  • Lower Right: The action of $\Sigma V^*$ on the unit disc. $\Sigma$ scales the disc vertically and horizontally.
    In this special case, the singular values are $\varphi$ and $1/\varphi$, where $\varphi$ is the golden ratio, i.e., $\varphi = \frac{1+\sqrt{5}}{2}$.
    $V^*$ is a (counter clockwise) rotation by an angle $\alpha$, and $U$ is a rotation by an angle $\beta$; combined with the scaling $\Sigma$, these reproduce the action of $M$.

2) Singular values as semiaxes of an ellipse or ellipsoid:

As shown in the figure, the singular values can be interpreted as the semiaxes of an ellipse in 2D. This concept can be generalized to $n$-dimensional Euclidean space, with the singular values of any $n \times n$ square matrix being viewed as the semiaxes of an $n$-dimensional ellipsoid. See below for further details.

3) The columns of U and V are orthonormal bases:

Since $U$ and $V^*$ are unitary, the columns of each of them form a set of orthonormal vectors, which can be regarded as basis vectors. The matrix $M$ maps the basis vector $V_i$ to the stretched unit vector $\sigma_i U_i$. By the definition of a unitary matrix, the same is true for their conjugate transposes $U^*$ and $V$, except the geometric interpretation of the singular values as stretches is lost. In short, the columns of $U$, $U^*$, $V$, and $V^*$ are orthonormal bases.

4. Expansion of eigenvalues and eigenvectors [see Ref-8]

Problem - PRML Exercise 2.19:

Show that a real, symmetric matrix $\Sigma$ satisfying the eigenvector equation $\Sigma\vec{u}_i = \lambda_i\vec{u}_i$ can be expressed as an expansion in its eigenvalues and eigenvectors of the following form
$$\Sigma = \sum_{i=1}^{D}\lambda_i\,\vec{u}_i\vec{u}_i^T, \tag{4.1}$$
and similarly, the inverse $\Sigma^{-1}$ can be expressed as
$$\Sigma^{-1} = \sum_{i=1}^{D}\frac{1}{\lambda_i}\,\vec{u}_i\vec{u}_i^T. \tag{4.2}$$

Solution:

1) Lemma 4-1: A real symmetric matrix is orthogonally similar to a diagonal matrix. That is, if $\Sigma$ is a real symmetric matrix, then there exists an orthogonal matrix $U$ (with $U^TU = UU^T = I$) such that

$$U^T\Sigma U = \Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_D), \tag{4.3}$$
or, due to $U^TU = UU^T = I$, equivalent equations include
$$\Sigma U = U\Lambda \tag{4.4}$$
and
$$\Sigma = U\Lambda U^T. \tag{4.5}$$

2) Lemma 4-2: Matrices $A$ and $B$ are identical if and only if $A\vec{v} = B\vec{v}$ for all vectors $\vec{v}$. That is,

$$A = B \iff A\vec{v} = B\vec{v} \quad \text{for all } \vec{v}. \tag{4.6}$$

3) Proof:

The proof of (4.1) and (4.2) uses (4.5) and (4.6). For any column vector $\vec{v}$,
we have
$$\Sigma\vec{v} = U\Lambda U^T\vec{v} = \sum_{i=1}^{D}\lambda_i\,\vec{u}_i\,(\vec{u}_i^T\vec{v}). \tag{4.7}$$

Since the inner product $\vec{u}_i^T\vec{v}$ in (4.7) is a scalar, and $\lambda_i$ is also a scalar, we can change the order of the terms,

$$\Sigma\vec{v} = \left(\sum_{i=1}^{D}\lambda_i\,\vec{u}_i\vec{u}_i^T\right)\vec{v}. \tag{4.8}$$
Thus, applying Lemma 4-2 shown in (4.6) to (4.8), we can prove (4.1).

Since $\Sigma = U\Lambda U^T$, inverting both sides gives $\Sigma^{-1} = U\Lambda^{-1}U^T$, and hence $\Sigma^{-1}\vec{v} = U\Lambda^{-1}U^T\vec{v}$. Applying the above argument to $\Sigma^{-1}$, and noting that $\Lambda^{-1}$ is just the diagonal matrix of the inverses of the diagonal elements of $\Lambda$, we have proved (4.2). (A numerical check of (4.1) and (4.2) follows below.)
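The following NumPy sketch (an addition, not from PRML) checks the expansions (4.1) and (4.2) on a randomly generated symmetric positive-definite matrix, using `numpy.linalg.eigh` for the orthogonal eigen-decomposition.

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
Sigma = B @ B.T + 4 * np.eye(4)          # hypothetical symmetric positive-definite matrix

lam, U = np.linalg.eigh(Sigma)           # Sigma u_i = lambda_i u_i, columns of U orthonormal

# (4.1): Sigma = sum_i lambda_i u_i u_i^T
expansion = sum(lam[i] * np.outer(U[:, i], U[:, i]) for i in range(4))
print(np.allclose(expansion, Sigma))                     # True

# (4.2): Sigma^{-1} = sum_i (1/lambda_i) u_i u_i^T
inv_expansion = sum((1.0 / lam[i]) * np.outer(U[:, i], U[:, i]) for i in range(4))
print(np.allclose(inv_expansion, np.linalg.inv(Sigma)))  # True
```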

5. Best Rank k Approximation using SVD [see Ref-5]

Let $A$ be an $n \times d$ matrix and think of the rows of $A$ as $n$ points in $d$-dimensional space. There are two important matrix norms, the Frobenius norm, denoted $\|A\|_F$, and the 2-norm, denoted $\|A\|_2$.
- The 2-norm of the matrix $A$ is given by $\|A\|_2 = \max_{|\vec{v}|=1}|A\vec{v}|$ and thus equals the largest singular value of the matrix. That is, the 2-norm is the square root of the sum of squared distances to the origin along the direction that maximizes this quantity.
- The Frobenius norm of $A$ is the square root of the sum of the squared distances of the points to the origin, shown in (3.17).

Let $A$ be an $n \times d$ matrix and let $A = USV^T = \sum_{i=1}^{r}\sigma_i\,\vec{u}_i\vec{v}_i^T$ be the SVD of $A$. For $k \in \{1, \dots, r\}$, let
$$A_k = \sum_{i=1}^{k}\sigma_i\,\vec{u}_i\vec{v}_i^T \tag{5.2}$$
be the sum truncated after $k$ terms. It is clear that $A_k$ has rank $k$. Furthermore, $A_k$ is the best rank-$k$ approximation to $A$ when the error is measured in either the 2-norm or the Frobenius norm (see Theorem 5.2 and Theorem 5.3).
Without proof, we give the following theorems (if interested, please check Lemma 1.6, Theorem 1.7, Theorem 1.8, and Theorem 1.9 on pages 9-10 of Ref-5).

Theorem 5.1:

The rows of matrix $A_k$ are the projections of the rows of $A$ onto the subspace $V_k$ spanned by the first $k$ singular vectors of $A$.

Theorem 5.2:

Let $A$ be an $n \times d$ matrix. For any matrix $B$ of rank at most $k$, it holds that
$$\|A - A_k\|_F \le \|A - B\|_F.$$

Theorem 5.3:

Let $A$ be an $n \times d$ matrix. For any matrix $B$ of rank at most $k$, it holds that
$$\|A - A_k\|_2 \le \|A - B\|_2.$$

Theorem 5.4:

Let $A$ be an $n \times d$ matrix. For $A_k$ in (5.2) it holds that
$$\|A - A_k\|_2^2 = \sigma_{k+1}^2.$$

A small numerical illustration of these statements is given below.
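A short NumPy illustration (an addition; the matrix is arbitrary) of the truncation (5.2), the 2-norm error of Theorem 5.4, and the corresponding Frobenius-norm error:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 5))            # hypothetical n x d matrix
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A_k: keep only the k largest singular values (equation (5.2))
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

print(np.linalg.matrix_rank(A_k))                          # k
# Theorem 5.4: ||A - A_k||_2 equals sigma_{k+1}
print(np.allclose(np.linalg.norm(A - A_k, 2), s[k]))       # True
# Frobenius error: sqrt of the sum of the discarded squared singular values
print(np.allclose(np.linalg.norm(A - A_k, 'fro'), np.sqrt((s[k:] ** 2).sum())))  # True
```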

6. The Geometry of Linear Transformations [see Ref-3]

6.1 Matrix and Transformation

Let us begin by looking at some simple matrices, namely those with two rows and two columns. Our first example is a diagonal matrix $M$.

Geometrically, we may think of a matrix like this as taking a point $(x, y)$ in the plane and transforming it into another point using matrix multiplication:
$$\begin{pmatrix}x\\y\end{pmatrix} \mapsto M\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}m_{11}\,x\\ m_{22}\,y\end{pmatrix}.$$

The effect of this transformation is shown below: the plane is horizontally stretched by a constant factor, while there is no vertical change.

[Figure]

Now let's look at a second example, a symmetric (but not diagonal) matrix $M$.
The four vertices of the red square shown in the following figure are transformed into the vertices of a parallelogram, respectively, which produces this effect:

[Figure]

It is not so clear how to describe simply the geometric effect of this transformation. However, let's rotate our grid through a suitable angle and see what happens. The four vertices of the rotated red square are now transformed into the vertices of a rectangle, respectively, which produces this effect:

[Figure]

We see now that this new grid is transformed in the same way that the original grid was transformed by the diagonal matrix: the grid is simply stretched in one direction.

This is a very special situation due to the fact that the matrix $M$ is symmetric, i.e., $M^T = M$. If we have a symmetric $2\times 2$ matrix, it turns out that

we may always rotate the grid in the domain so that the matrix acts by stretching and perhaps reflecting in the two directions. In other words, symmetric matrices behave like diagonal matrices.

Conclusion:

The figures above address the following question: given a $2\times 2$ symmetric matrix $M$,
- How should we place the coordinate grid (in other words, how should we choose the position and orientation of a unit square in the coordinate system, keeping in mind that this square can be represented by two mutually perpendicular unit vectors $\vec{v}_1$ and $\vec{v}_2$), so that when the transformation represented by the symmetric matrix $M$ is applied, the square deforms only by pure stretching or compression along the directions $\vec{v}_1$ and $\vec{v}_2$? This is exactly where the eigenvectors and eigenvalues of the matrix, discussed next, come in. That is,
$$M\vec{v}_i = \lambda_i\vec{v}_i, \tag{6.1}$$
which says that after the eigenvector $\vec{v}_i$ is transformed by the matrix $M$, the new vector is parallel to the original one (pointing in the same or the opposite direction); only its length has changed.
- How do we find such $\vec{v}_1$ and $\vec{v}_2$? The answer is that when $M$ is a symmetric matrix (symmetric matrices are, of course, a special case; more general matrices are discussed next), such $\vec{v}_1$ and $\vec{v}_2$ are exactly the two eigenvectors of the symmetric matrix $M$. That is, from $M\vec{v} = \lambda\vec{v}$ we obtain the eigenvectors and eigenvalues, which accords with the rotation of the red square shown above.
- For such a special symmetric matrix $M$, its SVD reduces to Lemma 4-1: a real symmetric matrix is orthogonally similar to a diagonal matrix, as shown in (4.5). This can be viewed as a special case of the SVD. More precisely:
- For a general matrix $A$, there exist orthogonal matrices $U$ and $V$ (i.e., $U^TU = I$ and $V^TV = I$) such that
$$A = U S V^T. \tag{6.2}$$
In (6.2), $U$ consists of the eigenvectors of $AA^T$, $V$ consists of the eigenvectors of $A^TA$, and the diagonal matrix $S$ consists of the positive square roots of the eigenvalues of $A^TA$ (or, equivalently, of $AA^T$).
- When $A$ is a real symmetric matrix, there exists an orthogonal matrix $U$ (i.e., $U^TU = I$) such that
$$A = U\Lambda U^T. \tag{6.3}$$
In (6.3), $U$ consists of the eigenvectors of the symmetric matrix $A$, and the diagonal matrix $\Lambda$ consists of the eigenvalues of the symmetric matrix $A$. Of course, one can also use the method just described, in which $U$ consists of the eigenvectors of $AA^T = A^2$ and the diagonal matrix consists of the positive square roots of the eigenvalues of $A^2$. When $A$ is also positive semi-definite (so that its eigenvalues are non-negative), the two approaches are equivalent and consistent. (A small numerical sketch follows after this list.)
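To make the last point concrete, here is a small NumPy sketch using a hypothetical symmetric positive-definite matrix (an assumption for illustration; for symmetric matrices with negative eigenvalues the singular values are the absolute values of the eigenvalues, and the two factorizations differ by signs):

```python
import numpy as np

M = np.array([[2., 1.],
              [1., 2.]])                 # hypothetical symmetric positive-definite matrix

# Eigen-decomposition: M = Q diag(lam) Q^T
lam, Q = np.linalg.eigh(M)               # ascending eigenvalues
lam, Q = lam[::-1], Q[:, ::-1]           # reorder to descending, to match SVD conventions

# SVD: M = U diag(s) V^T
U, s, Vt = np.linalg.svd(M)

print(np.allclose(lam, s))                      # True: eigenvalues equal singular values here
print(np.allclose(Q @ np.diag(lam) @ Q.T, M))   # True
print(np.allclose(U @ np.diag(s) @ Vt, M))      # True
```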

6.2 The Geometry of Eigenvectors and Eigenvalues

Said with more mathematical precision, given a symmetric matrix $M$, we may find a set of orthogonal vectors $\vec{v}_i$ so that $M\vec{v}_i$ is a scalar multiple of $\vec{v}_i$; that is,
$$M\vec{v}_i = \lambda_i\vec{v}_i,$$
where $\lambda_i$ is a scalar.

Geometrically, this means that the vectors $\vec{v}_i$ are simply stretched and/or reflected (i.e., their direction is reversed by 180°) when multiplied by $M$. Because of this property, we call
- Eigenvectors: the vectors $\vec{v}_i$ the eigenvectors of $M$;
- Eigenvalues: the scalars $\lambda_i$ the eigenvalues.

An important fact, which is easily verified, is that eigenvectors of a symmetric matrix corresponding to different eigenvalues are orthogonal. If we use the eigenvectors of a symmetric matrix to align the grid, the matrix stretches and/or reflects the grid in the same way that it does the eigenvectors.

The geometric description we gave for this linear transformation is a simple one: the grid is simply stretched in one direction. For more general matrices, we will ask if we can find an orthogonal grid that is transformed into another orthogonal grid. Let's consider a final example using a matrix $M$ that is not symmetric.

This matrix produces the geometric effect known as a shear, shown as

[Figure]

It’s easy to find one family of eigenvectors along the horizontal axis. However, our figure above shows that these eigenvectors cannot be used to create an orthogonal grid that is transformed into another orthogonal grid.
- Nonetheless, let's see what happens when we rotate the grid first by a small angle, shown as

[Figure]

Notice that the angle at the origin formed by the red parallelogram on the right has increased.
- Let's next rotate the grid a bit further.

[Figure]
It appears that the grid on the right is now almost orthogonal.
- In fact, by rotating the grid in the domain by an angle of roughly $58.28^\circ$, both grids are now orthogonal.
[Figure]

How to calculate this angle of roughly $58.28^\circ$?

Solution:

Based on the discussion in (6.2), the columns of $V$ are the eigenvectors of $M^TM$. For the shear matrix of this example (taking $M = \begin{pmatrix}1 & 1\\ 0 & 1\end{pmatrix}$, which is consistent with the golden-ratio singular values mentioned in the next subsection), we have
$$M^TM = \begin{pmatrix}1 & 1\\ 1 & 2\end{pmatrix}.$$
We can get the eigenvalues $\lambda_{1,2} = \frac{3\pm\sqrt{5}}{2}$, and the corresponding eigenvectors are along
$$\vec{v}_1 \propto \begin{pmatrix}1\\ \varphi\end{pmatrix}, \qquad \vec{v}_2 \propto \begin{pmatrix}-\varphi\\ 1\end{pmatrix}, \qquad \varphi = \frac{1+\sqrt{5}}{2},$$
so the directions of $\vec{v}_1$ and $\vec{v}_2$ make angles of roughly $58.28^\circ$ and $148.28^\circ$ with the positive $x$-axis (you can run the Matlab function `eig` to get the same result). (A NumPy version of this computation is sketched below.)
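A NumPy version of the angle computation (an addition; `numpy.linalg.eigh` plays the role of the Matlab `eig` call, and the shear matrix is the one assumed above):

```python
import numpy as np

M = np.array([[1., 1.],
              [0., 1.]])                     # shear matrix assumed above

lam, V = np.linalg.eigh(M.T @ M)             # eigen-decomposition of M^T M (symmetric)
lam, V = lam[::-1], V[:, ::-1]               # descending order: lambda_1 > lambda_2

angles = np.degrees(np.arctan2(V[1, :], V[0, :])) % 180.0
print(np.sqrt(lam))        # singular values: ~1.618 (golden ratio) and ~0.618
print(angles)              # ~58.28 and ~148.28 degrees
```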

6.3 The singular value decomposition

This is the geometric essence of the singular value decomposition for $2\times 2$ matrices:

for any $2\times 2$ matrix, we may find an orthogonal grid that is transformed into another orthogonal grid. We will express this fact using vectors:
- with an appropriate choice of orthogonal unit vectors $\vec{v}_1$ and $\vec{v}_2$, the vectors $M\vec{v}_1$ and $M\vec{v}_2$ are orthogonal.

[Figure]

We will use $\vec{u}_1$ and $\vec{u}_2$ to denote unit vectors in the direction of $M\vec{v}_1$ and $M\vec{v}_2$. The lengths of $M\vec{v}_1$ and $M\vec{v}_2$ – denoted by $\sigma_1$ and $\sigma_2$ – describe the amount that the grid is stretched in those particular directions. These numbers are called the singular values of $M$. (In this case, the singular values are the golden ratio and its reciprocal, but that is not so important here.)

[Figure]

We therefore have
$$M\vec{v}_1 = \sigma_1\vec{u}_1,$$
$$M\vec{v}_2 = \sigma_2\vec{u}_2.$$

We may now give a simple description of how the matrix $M$ treats a general vector $\vec{x}$. Since the vectors $\vec{v}_1$ and $\vec{v}_2$ are orthogonal unit vectors, we have
$$\vec{x} = (\vec{v}_1\cdot\vec{x})\,\vec{v}_1 + (\vec{v}_2\cdot\vec{x})\,\vec{v}_2.$$

This means that
$$M\vec{x} = (\vec{v}_1\cdot\vec{x})\,M\vec{v}_1 + (\vec{v}_2\cdot\vec{x})\,M\vec{v}_2 = (\vec{v}_1\cdot\vec{x})\,\sigma_1\vec{u}_1 + (\vec{v}_2\cdot\vec{x})\,\sigma_2\vec{u}_2.$$

Remember that the inner (dot) product may be computed using the vector transpose,
$$\vec{v}\cdot\vec{x} = \vec{v}^T\vec{x},$$
which leads to
$$M\vec{x} = \sigma_1\vec{u}_1\,(\vec{v}_1^T\vec{x}) + \sigma_2\vec{u}_2\,(\vec{v}_2^T\vec{x}) = \left(\sigma_1\vec{u}_1\vec{v}_1^T + \sigma_2\vec{u}_2\vec{v}_2^T\right)\vec{x}.$$

This is usually expressed by writing
$$M = U\Sigma V^T,$$
where $U$ is a matrix whose columns are the vectors $\vec{u}_1$ and $\vec{u}_2$, $\Sigma$ is a diagonal matrix whose entries are $\sigma_1$ and $\sigma_2$, and $V$ is a matrix whose columns are $\vec{v}_1$ and $\vec{v}_2$.

This shows how to decompose the matrix $M$ into the product of three matrices:
- $V$ describes an orthonormal basis in the domain, and
- $U$ describes an orthonormal basis in the co-domain, and
- $\Sigma$ describes how much the vectors in $V$ are stretched to give the vectors in $U$.

6.4 How do we find the singular value decomposition?

The power of the singular value decomposition lies in the fact that we may find it for any matrix. How do we do it? Let's look at our earlier example and add the unit circle in the domain. Its image will be an ellipse whose major and minor axes define the orthogonal grid in the co-domain.

[Figure]

Notice that the major and minor axes are defined by $M\vec{v}_1$ and $M\vec{v}_2$. These vectors therefore are the longest and shortest vectors among all the images of vectors on the unit circle.

[Figure]

In other words, the function $|M\vec{x}|$ on the unit circle has a maximum at $\vec{v}_1$ and a minimum at $\vec{v}_2$. This reduces the problem to a rather standard calculus problem in which we wish to optimize a function over the unit circle. It turns out that the critical points of this function occur at the eigenvectors of the matrix $M^TM$. Since this matrix is symmetric (it is obvious that $(M^TM)^T = M^TM$), eigenvectors corresponding to different eigenvalues will be orthogonal. This gives the family of vectors $\vec{v}_i$.

The singular values are then given by $\sigma_i = |M\vec{v}_i|$, and the vectors $\vec{u}_i$ are obtained as unit vectors in the direction of $M\vec{v}_i$.

But why are the vectors $\vec{u}_i$ orthogonal? To explain this, we will assume that $\sigma_i$ and $\sigma_j$ are distinct singular values. We have
$$M\vec{v}_i = \sigma_i\vec{u}_i, \qquad M\vec{v}_j = \sigma_j\vec{u}_j.$$
Let's begin by looking at the expression $M\vec{v}_i\cdot M\vec{v}_j$ and assume, for convenience, that the singular values are non-zero.
- On one hand, this expression is zero, because the vectors $\vec{v}_i$ and $\vec{v}_j$, being eigenvectors of the symmetric matrix $M^TM$, are orthogonal to one another:
$$M\vec{v}_i\cdot M\vec{v}_j = \vec{v}_i^T M^TM\,\vec{v}_j = \lambda_j\,\vec{v}_i\cdot\vec{v}_j = 0.$$
- On the other hand, we have
$$M\vec{v}_i\cdot M\vec{v}_j = \sigma_i\vec{u}_i\cdot\sigma_j\vec{u}_j = \sigma_i\sigma_j\,\vec{u}_i\cdot\vec{u}_j.$$
Since this expression is zero and the singular values are non-zero, $\vec{u}_i\cdot\vec{u}_j = 0$. Therefore, $\vec{u}_i$ and $\vec{u}_j$ are orthogonal, so we have found an orthogonal set of vectors $\vec{v}_i$ that is transformed into another orthogonal set $\vec{u}_i$. The singular values describe the amount of stretching in the different directions. (The construction is sketched in code below.)
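The construction just described can be written out directly in NumPy. This is a sketch for a small matrix with distinct, non-zero singular values (as assumed in the argument above), not a robust general-purpose routine:

```python
import numpy as np

def svd_via_gram(M):
    """Build an SVD M = U diag(s) V^T from the eigenvectors of M^T M."""
    lam, V = np.linalg.eigh(M.T @ M)     # v_i: orthonormal eigenvectors of M^T M
    lam, V = lam[::-1], V[:, ::-1]       # sort eigenvalues (and vectors) in descending order
    s = np.sqrt(np.clip(lam, 0.0, None)) # sigma_i = |M v_i| = sqrt(lambda_i)
    U = (M @ V) / s                      # u_i: unit vectors in the direction of M v_i
    return U, s, V

M = np.array([[1., 1.],
              [0., 1.]])                 # the shear example assumed earlier
U, s, V = svd_via_gram(M)

print(np.allclose(U @ np.diag(s) @ V.T, M))   # True
print(np.allclose(U.T @ U, np.eye(2)))        # True: the u_i are orthonormal
```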

In practice, this is not the procedure used to find the singular value decomposition of a matrix since it is not particularly efficient or well-behaved numerically.

6.5 Another example

Let's now look at a singular (rank-one) matrix $M$.

As before, the eigenvalues and eigenvectors of $M^TM$ give the directions of $\vec{v}_1$ and $\vec{v}_2$ (you can run the Matlab function `eig` to get the result).

The geometric effect of this matrix is the following:

[Figure]

In this case, the second singular value is zero, so that we may write:
$$M = \sigma_1\vec{u}_1\vec{v}_1^T.$$

In other words, if some of the singular values are zero, the corresponding terms do not appear in the decomposition for $M$. In this way, we see that the rank of $M$, which is the dimension of the image of the linear transformation, is equal to the number of non-zero singular values.

6.6 SVD Application 1 – Data compression

Singular value decompositions can be used to represent data efficiently. Suppose, for instance, that we wish to transmit the following image, which consists of an array of black-or-white pixels.

[Figure]

Since there are only three types of columns in this image, as shown below, it should be possible to represent the data in a more compact form.

[Figure]

We will represent the image as an $m\times n$ matrix $A$ in which each entry is either a 0, representing a black pixel, or 1, representing white. As such, there are $mn$ entries in the matrix. If we perform a singular value decomposition on $A$, we find there are only three non-zero singular values, $\sigma_1 \ge \sigma_2 \ge \sigma_3 > 0$.
Therefore, the matrix may be represented as
$$A = \sigma_1\vec{u}_1\vec{v}_1^T + \sigma_2\vec{u}_2\vec{v}_2^T + \sigma_3\vec{u}_3\vec{v}_3^T.$$

This means that we have three vectors $\vec{u}_i$, each of which has $m$ entries, three vectors $\vec{v}_i$, each of which has $n$ entries, and three singular values $\sigma_i$. This implies that we may represent the matrix using only $3(m + n + 1)$ numbers rather than the $mn$ that appear in the matrix. In this way, the singular value decomposition discovers the redundancy in the matrix and provides a format for eliminating it.

Why are there only three non-zero singular values? Remember that the number of non-zero singular values equals the rank of the matrix. In this case, we see that there are three linearly independent columns in the matrix, which means that $\mathrm{rank}(A) = 3$. (A toy version of this compression is sketched below.)
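A toy version of this compression idea in NumPy (an addition; the 25×15 binary pattern below is made up, with three distinct column types, and is not the article's original figure):

```python
import numpy as np

# A made-up 25x15 black-and-white image built from three distinct column types
col_a = np.ones(25); col_a[2:23] = 0                 # mostly-black column
col_b = np.ones(25)                                  # all-white column
col_c = np.ones(25); col_c[2:5] = 0; col_c[20:23] = 0
A = np.column_stack([col_a] * 3 + [col_b] * 2 + [col_c] * 5 + [col_b] * 2 + [col_a] * 3)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.sum(s > 1e-10))                             # 3 non-zero singular values (rank 3)

# Keep only the rank-3 factors: 3*(25 + 15 + 1) numbers instead of the 25*15 pixel values
A3 = (U[:, :3] * s[:3]) @ Vt[:3, :]
print(np.allclose(A3, A))                            # True: the image is reproduced exactly
```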

6.7 SVD Application 2 – Noise reduction

The previous example showed how we can exploit a situation where many singular values are zero. Generally speaking, the large singular values point to where the interesting information is. For example, imagine we have used a scanner to enter this image into our computer. However, our scanner introduces some imperfections (usually called "noise") in the image.

[Figure]

We may proceed in the same way: represent the data using a matrix and perform a singular value decomposition. We find that the first three singular values are much larger than the rest.

Clearly, the first three singular values are the most important, so we will assume that the others are due to the noise in the image and make the approximation
$$A \approx \sigma_1\vec{u}_1\vec{v}_1^T + \sigma_2\vec{u}_2\vec{v}_2^T + \sigma_3\vec{u}_3\vec{v}_3^T.$$
This leads to the following improved image (a minimal denoising sketch follows after the figure).
This leads to the following improved image.

[Figure]
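A minimal sketch of the denoising step (an addition; the clean image and the noise below are synthetic stand-ins for the scanned figure):

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical clean rank-<=3 "image": 25x15, built from three random binary column types
col_types = rng.integers(0, 2, size=(25, 3)).astype(float)
A_clean = col_types[:, rng.integers(0, 3, size=15)]            # 15 columns drawn from those types

A_noisy = A_clean + 0.1 * rng.standard_normal(A_clean.shape)   # simulated scanner noise

U, s, Vt = np.linalg.svd(A_noisy, full_matrices=False)
A_denoised = (U[:, :3] * s[:3]) @ Vt[:3, :]                    # keep only the three dominant terms

# The rank-3 truncation is much closer to the clean image than the raw noisy scan
print(np.linalg.norm(A_noisy - A_clean, 'fro'))                # error of the noisy scan
print(np.linalg.norm(A_denoised - A_clean, 'fro'))             # error after truncation (smaller)
```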

6.8 SVD Application 3 – Data analysis

Noise also arises anytime we collect data: no matter how good the instruments are, measurements will always have some error in them. If we remember the theme that large singular values point to important features in a matrix, it seems natural to use a singular value decomposition to study data once it is collected. As an example, suppose that we collect some data as shown below:

[Figure]

We may take the data and put it into a matrix, with one row per measured quantity and one column per data point, and perform a singular value decomposition. We find that the first singular value is much larger than the second.

With one singular value so much larger than the other, it may be safe to assume that the small value of $\sigma_2$ is due to noise in the data and that this singular value would ideally be zero. In that case, the matrix would have rank one, meaning that all the data lies on the line defined by $\vec{u}_1$. (A brief numerical sketch follows after the figure below.)

[Figure]
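A toy version of this analysis in NumPy (an addition; the data are synthetic points scattered around a line through the origin, standing in for the collected measurements):

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical 2 x N data matrix: points near a line through the origin, plus noise
direction = np.array([2.0, 1.0]) / np.sqrt(5.0)            # true underlying direction
t = rng.uniform(-3.0, 3.0, size=40)
X = np.outer(direction, t) + 0.05 * rng.standard_normal((2, 40))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
print(s)            # sigma_1 >> sigma_2: the data are essentially one-dimensional
print(U[:, 0])      # u_1: the direction of the best-fit line through the origin (up to sign)
```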

This brief example points to the beginnings of a field known as principal component analysis (PCA), a set of techniques that uses singular values to detect dependencies and redundancies in data.

In a similar way, singular value decompositions can be used to detect groupings in data, which explains why singular value decompositions are being used in attempts to improve Netflix’s movie recommendation system. Ratings of movies you have watched allow a program to sort you into a group of others whose ratings are similar to yours. Recommendations may be made by choosing movies that others in your group have rated highly.

7. Summary

The logic relationship of those concepts is shown in the following figure:

[Figure]

8. Reference

[1]: Kirk Baker, Singular Value Decomposition Tutorial,
https://www.ling.ohio-state.edu/~kbaker/pubs/Singular_Value_Decomposition_Tutorial.pdf;
[2]: Singular Value Decomposition (SVD) tutorial,
http://web.mit.edu/be.400/www/SVD/Singular_Value_Decomposition.htm;
[3]: We Recommend a Singular Value Decomposition,
http://www.ams.org/samplings/feature-column/fcarc-svd;
[4]: Computation of the Singular Value Decomposition,
http://www.cs.utexas.edu/users/inderjit/public_papers/HLA_SVD.pdf;
[5]: CMU, SVD Tutorial,
https://www.cs.cmu.edu/~venkatg/teaching/CStheory-infoage/book-chapter-4.pdf.
[6]: Wiki: Singular value decomposition,
https://en.wikipedia.org/wiki/Singular_value_decomposition.
[7]: Chapter 6 Eigenvalues and Eigenvectors,
http://math.mit.edu/~gs/linearalgebra/ila0601.pdf.
[8]: Expressing a matrix as an expansion of its eigenvalues,
http://math.stackexchange.com/questions/331826/expressing-a-matrix-as-an-expansion-of-its-eigenvalues.
[9]: Wiki: Eigenvalues and eigenvectors,
https://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors.
