[Coursera Machine Learning] K-means Clustering and Principal Component Analysis (Week 8 Programming Assignment)


1.1.1 Finding closest centroids

Your task is to complete the code in findClosestCentroids.m. This function takes the data matrix X and the locations of all centroids inside centroids, and should output a one-dimensional array idx that holds the index (a value in {1, …, K}, where K is the total number of centroids) of the centroid closest to every training example.

% Loop over every training example in X
for i = 1:size(X, 1)
    min_dist = Inf;
    % Loop over all K centroids and keep the closest one
    for j = 1:K
        dist = norm(X(i,:) - centroids(j,:))^2;
        if dist < min_dist
            min_dist = dist;
            idx(i) = j;
        end
    end
end
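As an aside, the same assignment step can be written without the inner loop. The following is only a sketch, not what the assignment requires; bsxfun is used so it also runs on MATLAB/Octave versions without implicit broadcasting.

% Vectorized sketch: squared distance from every example to every centroid,
% using ||x - c||^2 = ||x||^2 + ||c||^2 - 2*x'*c, then take the row-wise minimum.
sq_dist = bsxfun(@plus, sum(X.^2, 2), sum(centroids.^2, 2)') - 2 * X * centroids';
[~, idx] = min(sq_dist, [], 2);   % idx(i) = index of the closest centroid to X(i,:)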

1.1.2 Computing centroid means

You should now complete the code in computeCentroids.m. You can implement this function using a loop over the centroids. You can also use a loop over the examples, but if you can use a vectorized implementation that avoids such a loop, your code may run faster; a vectorized sketch is shown after the loop-based version below.

% centroids is assumed to be initialized to zeros(K, n) by the starter code
for i = 1:K
    count = 0;
    % Sum all examples assigned to centroid i
    for j = 1:m
        if idx(j) == i
            count = count + 1;
            centroids(i,:) = centroids(i,:) + X(j,:);
        end
    end
    % Divide by the number of assigned examples to get the cluster mean
    centroids(i,:) = centroids(i,:) / count;
end
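For comparison, a vectorized version that drops the loop over examples might look like the following. This is just a sketch, not the official solution; bsxfun keeps it compatible with versions that lack implicit broadcasting.

% Vectorized sketch: build an m-by-K indicator matrix of assignments,
% then compute each centroid as the mean of its assigned examples.
sel = bsxfun(@eq, idx(:), 1:K);                        % sel(j,i) = 1 if example j belongs to cluster i
centroids = bsxfun(@rdivide, sel' * X, sum(sel, 1)');  % (K x n) per-cluster sums divided by per-cluster counts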

1.3 Random initialization

The initial assignments of centroids for the example dataset in ex7.m were designed so that you will see the same figure as in Figure 1. In practice, a good strategy for initializing the centroids is to select random examples from the training set.
In this part of the exercise, you should complete the function kMeansInitCentroids.m with the following code:

% Initialize the centroids to be random examples

% Randomly reorder the indices of examples
randidx = randperm(size(X, 1));
% Take the first K examples as centroids
centroids = X(randidx(1:K), :);
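In ex7.m this initialization is then handed to the provided runkMeans function, roughly as follows. The specific values (K = 3, max_iters = 10) are those I recall the script using; treat this as a usage sketch rather than a copy of ex7.m.

% Usage sketch: random initialization followed by K-means on the example dataset
K = 3;
max_iters = 10;
initial_centroids = kMeansInitCentroids(X, K);
[centroids, idx] = runkMeans(X, initial_centroids, max_iters, true);  % true = plot progress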

2.2 Implementing PCA

Before using PCA, it is important to first normalize the data by subtracting the mean value of each feature from the dataset, and scaling each dimension so that they are in the same range. In the provided script ex7_pca.m, this normalization has been performed for you using the featureNormalize function.
After normalizing the data, you can run PCA to compute the principal components. Your task is to complete the code in pca.m to compute the principal components of the dataset. First, you should compute the covariance matrix of the data, which is given by:

Σ = (1/m) XᵀX

where X is the data matrix with examples in rows, and m is the number of examples. Note that Σ is an n×n matrix, not the summation operator.

% ====================== YOUR CODE HERE ======================
% Instructions: You should first compute the covariance matrix. Then, you
%               should use the "svd" function to compute the eigenvectors
%               and eigenvalues of the covariance matrix.
%
% Note: When computing the covariance matrix, remember to divide by m (the
%       number of examples).
%
Sigma = (X' * X) / m;
[U, S, V] = svd(Sigma);
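For context, ex7_pca.m drives these functions roughly as below; the output signatures shown are those of the provided starter files, as far as I can tell.

% Usage sketch: normalize first, then run PCA on the normalized data
[X_norm, mu, sigma] = featureNormalize(X);   % zero mean, unit standard deviation per feature
[U, S] = pca(X_norm);                        % U: principal components, S: diagonal matrix of singular values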

2.3.1 Projecting the data onto the principal components

You should now complete the code in projectData.m. Specifically, you are given a dataset X, the principal components U, and the desired number of dimensions to reduce to, K. You should project each example in X onto the top K components in U. Note that the top K components in U are given by the first K columns of U, that is, U_reduce = U(:, 1:K).

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the projection of the data using only the top K
%               eigenvectors in U (first K columns).
%               For the i-th example X(i,:), the projection on to the k-th
%               eigenvector is given as follows:
%                    x = X(i, :)';
%                    projection_k = x' * U(:, k);
%
U_reduced = U(:, 1:K);
Z = X * U_reduced;

2.3.2 Reconstructing an approximation of the data

After projecting the data onto the lower dimensional space, you can approximately recover the data by projecting them back onto the original high dimensional space. Your task is to complete recoverData.m to project each example in Z back onto the original space and return the recovered approximation in X_rec.

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the approximation of the data by projecting back
%               onto the original space using the top K eigenvectors in U.
%
%               For the i-th example Z(i,:), the (approximate)
%               recovered data for dimension j is given as follows:
%                    v = Z(i, :)';
%                    recovered_j = v' * U(j, 1:K)';
%
%               Notice that U(j, 1:K) is a row vector.
%
U_reduced = U(:, 1:K);
X_rec = Z * U_reduced';
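Putting the two functions together, a quick round-trip check on the normalized data might look like this. K = 1 is the value used for the 2-D example in ex7_pca.m; the error variable is just illustrative, not part of the assignment.

% Round-trip sketch: project onto the top K components, then reconstruct
K = 1;
Z = projectData(X_norm, U, K);
X_rec = recoverData(Z, U, K);
recon_error = mean(sum((X_norm - X_rec).^2, 2));   % mean squared reconstruction error per example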