SVD and LSI Tutorial 3: Computing the Full SVD of a Matrix


(1) SVD and LSI Tutorial 1: Understanding SVD and LSI

(2) SVD and LSI Tutorial 2: Computing Singular Values

(3) SVD and LSI Tutorial 3: Computing the Full SVD of a Matrix

(4) SVD and LSI Tutorial 4: LSI Calculations

(5) SVD and LSI Tutorial 5: LSI Keyword Research and Co-Occurrence Theory

Dr. E. Garcia

Mi Islita.com

Email | Last Update: 01/07/07

Revisiting Singular Values

In Part 2 of this tutorial you learned that SVD decomposes a regular matrix A into three matrices:

Equation 1: A = USV^T

S was computed by the following procedure:

  1. A^T and A^TA were computed.
  2. The eigenvalues of A^TA were determined and sorted in descending order, in the absolute sense. The nonnegative square roots of these are the singular values of A.
  3. S was constructed by placing the singular values in descending order along its diagonal.

You learned that the Rank of a Matrix is the number of nonzero singular values.
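For readers who want to follow along numerically, here is a minimal sketch of this procedure. The example matrix from Part 2 is not reproduced on this page, so the code uses a hypothetical 2 x 2 stand-in whose A^TA happens to have the same eigenvalues (40 and 10); the steps are the same for any matrix:

```python
import numpy as np

# Hypothetical stand-in for the 2 x 2 matrix used in Part 2; its A^T A has
# eigenvalues 40 and 10, matching the worked example.
A = np.array([[6.0, 2.0],
              [-1.0, 3.0]])

# Step 1: compute A^T and A^T A.
AtA = A.T @ A

# Step 2: eigenvalues of A^T A, sorted in descending order (absolute sense).
eigenvalues = np.sort(np.abs(np.linalg.eigvals(AtA)))[::-1]

# The nonnegative square roots are the singular values of A.
singular_values = np.sqrt(eigenvalues)

# Step 3: place the singular values along the diagonal of S.
S = np.diag(singular_values)

print(singular_values)            # approximately [6.32, 3.16]
print(np.linalg.matrix_rank(A))   # rank = number of nonzero singular values (2)
```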

You also learned that since S is a diagonal matrix, its nondiagonal elements are equal to zero. This can be verified by computing S from U^TAV. However, one would need to know U first, which we have not defined yet. Either way, the alternate expression for S is obtained by postmultiplying Equation 1 by V and then premultiplying by U^T:

Equation 2: AV = USV^TV = US

Equation 3: U^TAV = S

Here we are using the fact that U and V are orthogonal matrices. As discussed in Matrix Tutorial 2: Basic Matrix Operations, if a matrix M is orthogonal then

Equation 4: MM^T = M^TM = I

where I is the identity matrix. But we also know that MM^-1 = I. Consequently, M^T = M^-1.
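As a quick sanity check of Equation 4 (not part of the original derivation), a rotation matrix is orthogonal by construction:

```python
import numpy as np

# A rotation matrix is orthogonal by construction; any angle works.
theta = 0.3
M = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(M @ M.T, np.eye(2)))     # True: M M^T = I
print(np.allclose(M.T, np.linalg.inv(M)))  # True: M^T = M^-1
```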

Computing "right" eigenvectors, V, and VT

In the example given in Part 2 you learned that the eigenvalues of AA^T and A^TA are identical, since both satisfy the same characteristic equation:

Figure 1. Characteristic equation and eigenvalues for AA^T and A^TA.


Let's use these eigenvalues to compute the eigenvectors of A^TA. This is done by solving

Equation 5: (A^TA - c_iI)X_i = 0

As mentioned in Matrix Tutorial 3: Eigenvalues and Eigenvectors, for large matrices one would need to resort to the Power Method or other methods to do this. Fortunately, in this case we are dealing with a small matrix, so simple algebra is all we need.

We first compute an eigenvector for each eigenvalue, c_1 = 40 and c_2 = 10. Once computed, we convert the eigenvectors to unit vectors by normalizing their lengths. Figure 2 illustrates these steps.

Figure 2. Right eigenvectors of A^TA.


We would have arrived at identical results if during normalization we had assumed an arbitrary coordinate value for either x_1 or x_2. We now construct V by placing these unit eigenvectors along its columns, and then compute V^T.

Figure 3. V and its transpose V^T.
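A minimal sketch of these steps, again using the hypothetical stand-in matrix (note that np.linalg.eigh returns unit-length eigenvectors whose signs may differ from those in the figures; both sign choices are valid):

```python
import numpy as np

# Same hypothetical stand-in matrix as before.
A = np.array([[6.0, 2.0],
              [-1.0, 3.0]])
AtA = A.T @ A

# Eigen-decomposition of the symmetric matrix A^T A.
eigenvalues, eigenvectors = np.linalg.eigh(AtA)

# Sort eigenvalues (and their eigenvectors) in descending order.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
V = eigenvectors[:, order]

# Normalize each column to unit length (eigh already returns unit vectors;
# the step is shown only to mirror the text).
V = V / np.linalg.norm(V, axis=0)

Vt = V.T
print(V)
print(Vt)
```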


Hey! That wasn't that hard.

Note that we constructed V by preserving the order in which the singular values were placed along the diagonal of S. That is, we placed the eigenvector of the largest eigenvalue in the first column and that of the second eigenvalue in the second column. These end up paired with the singular values placed along the diagonal of S. Preserving the order in which singular values, eigenvalues and eigenvectors are placed in their corresponding matrices is very important. Otherwise we end up with the wrong SVD.

Let's now compute the "left" eigenvectors and U.

Computing "left" eigenvectors and U

To compute U we could reuse the eigenvalues and compute, in exactly the same manner, the eigenvectors of AA^T. Once these are computed, we place them along the columns of U. However, with large matrices this is time consuming; one would again need to resort to the Power Method or other suitable methods to compute the eigenvectors.

In practice, it is common to use the following shortcut. Postmultiply Equation 2 by S^-1 to obtain

Equation 6: AVS^-1 = USS^-1

Equation 7: U = AVS^-1

and then compute U. Since A and V are already known, we just need to invert S. Because S is a diagonal matrix, its inverse is obtained by simply taking the reciprocal of each diagonal element:

Figure 4. The inverted singular value matrix, S^-1.


Since s_1 = 40^(1/2) = 6.3245 and s_2 = 10^(1/2) = 3.1622 (expressed to four decimal places), then

Figure 5. "Left" eigenvectors and U.


That was quite a mechanical task. Huh?
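In code, the shortcut of Equation 7 amounts to the following sketch, again with the hypothetical stand-in matrix; column signs may differ from the hand calculation in Figure 5:

```python
import numpy as np

# Stand-in matrix, with V and S computed as in the previous sketch.
A = np.array([[6.0, 2.0],
              [-1.0, 3.0]])
eigenvalues, eigenvectors = np.linalg.eigh(A.T @ A)
order = np.argsort(eigenvalues)[::-1]
V = eigenvectors[:, order]
s = np.sqrt(eigenvalues[order])          # singular values, descending

# Invert the diagonal S by taking reciprocals of its diagonal elements.
S_inv = np.diag(1.0 / s)

# Equation 7: U = A V S^-1
U = A @ V @ S_inv
print(U)
print(np.allclose(U.T @ U, np.eye(2)))   # columns of U are orthonormal
```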

This shortcut is very popular since it simplifies the calculations. Unfortunately, its widespread use has resulted in many overlooking important information contained in the AA^T matrix. In recent years, LSI researchers have found that high-order term-term co-occurrence patterns contained in this matrix might be important. At least two studies (1, 2), one a 2005 thesis, indicate that such high-order term-term co-occurrence might be at the heart of LSI.

These studies are:

  1. Understanding LSI via the Truncated Term-term Matrix
  2. A Framework for Understanding Latent Semantic Indexing (LSI) Performance

In the first issue of our IR Watch - The Newsletter (which is free), our subscribers learned about this thesis and other equally interesting LSI resources.

The orthogonal nature of the V and U matrices is evident from inspecting their eigenvectors. This can be demonstrated by computing dot products between column vectors: all such dot products are equal to zero. A visual inspection is also possible in this case. In Figure 6 we have plotted the eigenvectors. Observe that they are all orthogonal and end up to the right and left of each other; hence the reference to these as "right" and "left" eigenvectors.

Figure 6. "Right" and "Left" eigenvectors.


Computing the Full SVD

So, we finally know U, S, V and V^T. To complete the proof, we reconstruct A by computing its full SVD.

Figure 7. Computing the full SVD.


So as we can see, SVD is a straightforward matrix decomposition and reconstruction technique.
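For readers following along in code, the same reconstruction can be cross-checked with a library SVD routine. This is only a sketch with the hypothetical stand-in matrix used earlier; column signs may differ from the hand calculation, but the product U S V^T is the same:

```python
import numpy as np

A = np.array([[6.0, 2.0],
              [-1.0, 3.0]])

U, s, Vt = np.linalg.svd(A)     # full SVD in one call
A_reconstructed = U @ np.diag(s) @ Vt

print(np.allclose(A, A_reconstructed))   # True: A = U S V^T
```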

The Reduced SVD

Obtaining an approximation of the original matrix is quite easy. This is done by truncating the three matrices obtained from the full SVD. Essentially, we keep the first k columns of U, the first k rows of V^T, and the first k rows and columns of S; that is, the first k singular values. This removes noisy dimensions and exposes the effect of the k largest singular values on the original data. This effect is hidden (masked, latent) in the full SVD.

The reduction process is illustrated in Figure 8 and is often referred to as "computing the reduced SVD", dimensionality reduction or the Rank k Approximation.

Figure 8. The reduced SVD or Rank k Approximation.


The shaded areas in Figure 8 indicate the parts of the matrices that are retained. The approximated matrix A_k is the Rank k Approximation of the original matrix and is defined as

Equation 8: A_k = U_kS_kV_k^T

So, once these matrices are truncated, we simply compute their product to get A_k.
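A minimal sketch of Equation 8, assuming a small hypothetical matrix purely for illustration:

```python
import numpy as np

def rank_k_approximation(A, k):
    """Equation 8: A_k = U_k S_k V_k^T, keeping only the k largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Hypothetical 3 x 3 matrix, for illustration only.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0]])

A2 = rank_k_approximation(A, k=2)
print(A2)
print(np.linalg.matrix_rank(A2))   # 2
```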

Quite easy. Huh?

Summary

So far we have learned that the full SVD of a matrix A can be computed by the following procedure:

  1. Compute its transpose A^T and the product A^TA.
  2. Determine the eigenvalues of A^TA and sort these in descending order, in the absolute sense. Take the square roots of these to obtain the singular values of A.
  3. Construct the diagonal matrix S by placing the singular values in descending order along its diagonal. Compute its inverse, S^-1.
  4. Use the ordered eigenvalues from step 2 to compute the eigenvectors of A^TA. Place these eigenvectors along the columns of V and compute its transpose, V^T.
  5. Compute U as U = AVS^-1. To complete the proof, compute the full SVD using A = USV^T.
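Collecting the five steps into a single sketch (assuming, as in the worked example, a square matrix whose singular values are all nonzero; the stand-in matrix is hypothetical):

```python
import numpy as np

def svd_by_hand(A):
    """Steps 1-5 above; assumes a square A whose singular values are all nonzero."""
    AtA = A.T @ A                                # step 1
    eigenvalues, eigenvectors = np.linalg.eigh(AtA)
    order = np.argsort(eigenvalues)[::-1]        # step 2: descending order
    s = np.sqrt(eigenvalues[order])              # singular values
    S = np.diag(s)                               # step 3
    V = eigenvectors[:, order]                   # step 4
    U = A @ V @ np.diag(1.0 / s)                 # step 5: U = A V S^-1
    return U, S, V

A = np.array([[6.0, 2.0],
              [-1.0, 3.0]])
U, S, V = svd_by_hand(A)
print(np.allclose(A, U @ S @ V.T))   # True: A = U S V^T
```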

These steps are also summarized in our Singular Value Decomposition (SVD) - A Fast Track Tutorial.

Before concluding, let me mention this: in this tutorial you have learned how SVD is applied to a matrix where m = n. This is just one possible scenario. In general, if

  1. m = n and all singular values are greater than zero, the pseudoinverse of A coincides with its inverse, A^-1 = VS^-1U^T (see the sketch below this list).
  2. m < n, S is m x n with its last n - m columns all zero. The SVD then gives the solution with minimum norm.
  3. m > n, S is n x n, there are more equations than unknowns, and the SVD gives the least-squares solution.

Movellan discusses these cases in great detail.
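For case 1, a minimal sketch (again with the hypothetical stand-in matrix) showing that VS^-1U^T reproduces both the inverse and the pseudoinverse:

```python
import numpy as np

A = np.array([[6.0, 2.0],
              [-1.0, 3.0]])

U, s, Vt = np.linalg.svd(A)
A_inv = Vt.T @ np.diag(1.0 / s) @ U.T            # V S^-1 U^T

print(np.allclose(A_inv, np.linalg.inv(A)))      # True when m = n and all s > 0
print(np.allclose(A_inv, np.linalg.pinv(A)))     # also matches the pseudoinverse
```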

So, how is SVD used in Latent Semantic Indexing (LSI)?

In LSI, the intent is not to reconstruct A. The goal is to find the best rank k approximation of A that improves retrieval. The selection of k, and thus the number of singular values in S to use, is still an open area of research. During her tenure at Bellcore (now Telcordia), Microsoft's Susan Dumais mentioned in the 1995 presentation Transcription of the Application that her research group experimented with k values largely "by seat of the pants".

Early studies with the MED database, using a few hundred documents and dozens of queries, indicate that performance versus k is not entirely proportional, but tends to describe an inverted U-shaped curve peaking around k = 100. These results might change under other experimental conditions. At the time of writing, optimum k values are still determined via trial-and-error experimentation.

Now that we have the basic calculations out of the way, let's move forward and learn how LSI scores documents and queries. It is time to demystify these calculations. Wait for Part 4 and see.

This is getting exciting.

Tutorial Review

For the matrix

[Rank 2 example matrix]

  1. Compute the eigenvalues of A^TA.
  2. Prove that this is a matrix of Rank 2.
  3. Compute its full SVD.
  4. Compute its Rank 2 Approximation.
References
  1. Understanding LSI via the Truncated Term-term Matrix, Thesis, Regis Newo, Germany (2005).
  2. A Framework for Understanding Latent Semantic Indexing (LSI) Performance, April Kontostathis and William Pottenger (Lehigh University).
  3. Transcription of the Application, Susan Dumais, Bellcore (1995).

