cs231n assignment1--svm

Source: Internet · Editor: 程序博客网 · Time: 2024/05/01 22:23

The hard part of this assignment is deriving the gradient of the SVM loss. I had to go through a lot of material before I could work it out.

The first issue is matrix differentiation, for which I relied on the following blog post:
http://www.cnblogs.com/huashiyiqike/p/3568922.html

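For this assignment we only need the following formula (written in numerator layout; in denominator layout the right-hand side is $A$ instead):

$$\frac{d\,(x^T A)}{dx} = A^T$$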

The loss function for a single example is:

$$L_i = \sum_{j \neq y_i} \max\left(0,\ w_j^T x_i - w_{y_i}^T x_i + \Delta\right)$$
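As a rough illustration of the loss formula, here is a minimal sketch that evaluates $L_i$ for one example. The function name `svm_loss_single` and the toy values of `W` and `x` are made up for this example and are not part of the assignment code.

```python
import numpy as np

def svm_loss_single(W, x, y, delta=1.0):
    """Multiclass SVM loss L_i for one example (hypothetical helper).

    W: (D, C) weights, x: (D,) input, y: integer label of x.
    """
    scores = x.dot(W)                                   # w_j^T x_i for every class j
    margins = np.maximum(0.0, scores - scores[y] + delta)
    margins[y] = 0.0                                    # the sum skips j == y_i
    return margins.sum()

# Toy values chosen only for illustration.
W = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])
x = np.array([1.0, 1.0])
loss = svm_loss_single(W, x, y=0)
print(loss)  # scores are [1, 3, 3], so the two wrong classes each contribute 3
```

Setting `margins[y] = 0.0` is one way to drop the `j == y_i` term, which would otherwise always add `delta` to the sum.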

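dW can then be expressed as follows. Every class $j \neq y_i$ whose margin is positive contributes one term to the gradient of $w_j$ and one term to the gradient of $w_{y_i}$:

$$
\nabla_W L_i = \frac{\partial L_i}{\partial W}: \quad
\begin{cases}
\dfrac{\partial L_i}{\partial w_j} = x_i^T, & j \neq y_i,\ \ w_j^T x_i - w_{y_i}^T x_i + \Delta > 0 \\[8pt]
\dfrac{\partial L_i}{\partial w_{y_i}} = -x_i^T, & \text{once for each } j \neq y_i \text{ with } w_j^T x_i - w_{y_i}^T x_i + \Delta > 0
\end{cases}
$$

These per-term contributions are written as row vectors $x_i^T$; stored as columns of dW they are simply $x_i$.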

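For $j \neq y_i$ we get:

$$\nabla_{w_j} L_i = \mathbb{1}\left(w_j^T x_i - w_{y_i}^T x_i + \Delta > 0\right) x_i$$

For $j = y_i$ we get:

$$\nabla_{w_{y_i}} L_i = -\left(\sum_{j \neq y_i} \mathbb{1}\left(w_j^T x_i - w_{y_i}^T x_i + \Delta > 0\right)\right) x_i$$

Here $\mathbb{1}(\cdot)$ is the indicator function: $\mathbb{1}(\text{true}) = 1$ and $\mathbb{1}(\text{false}) = 0$.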

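The $j = y_i$ case does not need a separate pass. Whenever $w_j^T x_i - w_{y_i}^T x_i + \Delta > 0$ for some $j \neq y_i$, column $j$ of dW gains $x_i$ and, at the same moment, column $y_i$ loses $x_i$, so both updates can be made together with the $j \neq y_i$ part.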
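One way to gain confidence in the two gradient formulas is to compare them with a centered finite difference. The sketch below does this for a single example; the helper names (`svm_loss_single`, `svm_grad_single`, `numerical_grad`) and the random test data are assumptions made for illustration, not code from the assignment.

```python
import numpy as np

def svm_loss_single(W, x, y, delta=1.0):
    """L_i for one example; W is (D, C), x is (D,), y is the label."""
    scores = x.dot(W)
    margins = np.maximum(0.0, scores - scores[y] + delta)
    margins[y] = 0.0
    return margins.sum()

def svm_grad_single(W, x, y, delta=1.0):
    """Analytic gradient of L_i with respect to W, following the formulas above."""
    scores = x.dot(W)
    indicator = (scores - scores[y] + delta > 0).astype(float)
    indicator[y] = 0.0                  # only classes j != y_i are counted
    coeff = indicator.copy()
    coeff[y] = -indicator.sum()         # column y_i gets minus the number of violations
    return np.outer(x, coeff)           # column j equals coeff[j] * x_i

def numerical_grad(f, W, h=1e-5):
    """Centered finite-difference gradient of the scalar function f at W."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + h
        f_plus = f(W)
        W[idx] = old - h
        f_minus = f(W)
        W[idx] = old                    # restore the original entry
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
x = rng.standard_normal(4)
y = 1

analytic = svm_grad_single(W, x, y)
numeric = numerical_grad(lambda M: svm_loss_single(M, x, y), W)
max_err = np.abs(analytic - numeric).max()
print(max_err)
```

Because the coefficients across classes sum to zero, every row of the analytic gradient also sums to zero, which is a cheap extra sanity check.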

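Computing dW with for loops is straightforward, but we also need a vectorized version to speed things up. I have forgotten most of my linear algebra, so it was hard to see how to do this. In the end I worked through a few simple examples by hand before finishing the assignment.
The examples were scribbled on scratch paper, and they are what gave me the idea.
I started with this: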

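$$
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
\cdot
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 \\ 4 & 0 & 0 \\ 7 & 0 & 0 \end{bmatrix}
$$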

The first column of the left matrix was added into the result. Trying again:

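$$
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
\cdot
\begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
=
\begin{bmatrix} 3 & 0 & 0 \\ 9 & 0 & 0 \\ 15 & 0 & 0 \end{bmatrix}
$$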

This time the first and second columns were both added into the first column of the result.
The experiment can be reproduced in code:

    import numpy as np

    a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    b = np.array([[1, 0, 0], [1, 0, 0], [0, 0, 0]])
    print(np.dot(a, b))

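After a few more tries the rule becomes clear: a 1 at row i, column j of b adds column i of a to column j of the result. In other words, the row index in b selects which column of a is used, and the column index in b decides where it lands. (This is my own informal way of putting it; restate it however suits you, and if your linear algebra is solid you can skip this part.)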
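The same rule is what makes the vectorized gradient work: if the columns of the left matrix are the training examples (that is, the left matrix is `X.T`), then an N x C coefficient matrix on the right decides how much of each example is added to each column of dW. The sketch below checks this against an explicit double loop. The random `X` and `mask` are illustrative assumptions, not data from the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, C = 5, 4, 3
X = rng.standard_normal((N, D))                        # one example per row
mask = rng.integers(0, 2, size=(N, C)).astype(float)   # arbitrary 0/1 coefficients

# Loop version: mask[i, j] says how much of example i goes into column j.
dW_loop = np.zeros((D, C))
for i in range(N):
    for j in range(C):
        dW_loop[:, j] += mask[i, j] * X[i]

# Matrix version: the columns of X.T are the examples, so one product does it all.
dW_dot = X.T.dot(mask)

same = np.allclose(dW_loop, dW_dot)
print(same)
```

In the SVM gradient the coefficients are the margin indicators, with the entry at the correct class replaced by minus the row sum.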

Here is my implementation of linear_svm.py:

    import numpy as np
    from random import shuffle


    def svm_loss_naive(W, X, y, reg):
        """
        Structured SVM loss function, naive implementation (with loops).

        Inputs have dimension D, there are C classes, and we operate on minibatches
        of N examples.

        Inputs:
        - W: A numpy array of shape (D, C) containing weights.
        - X: A numpy array of shape (N, D) containing a minibatch of data.
        - y: A numpy array of shape (N,) containing training labels; y[i] = c means
          that X[i] has label c, where 0 <= c < C.
        - reg: (float) regularization strength

        Returns a tuple of:
        - loss as single float
        - gradient with respect to weights W; an array of same shape as W
        """
        dW = np.zeros(W.shape)  # initialize the gradient as zero

        # compute the loss and the gradient
        num_classes = W.shape[1]  # C
        num_train = X.shape[0]    # N
        loss = 0.0
        for i in range(num_train):
            scores = X[i].dot(W)
            correct_class_score = scores[y[i]]
            for j in range(num_classes):
                if j == y[i]:
                    continue
                margin = scores[j] - correct_class_score + 1  # note delta = 1
                if margin > 0:
                    loss += margin
                    # each violated margin moves x_i out of the correct-class
                    # column and into column j
                    dW[:, y[i]] -= X[i, :]
                    dW[:, j] += X[i, :]

        # Right now the loss is a sum over all training examples, but we want it
        # to be an average instead so we divide by num_train.
        loss /= num_train
        dW /= num_train

        # Add regularization to the loss.
        loss += 0.5 * reg * np.sum(W * W)
        dW += reg * W

        #############################################################################
        # TODO:                                                                     #
        # Compute the gradient of the loss function and store it dW.                #
        # Rather that first computing the loss and then computing the derivative,   #
        # it may be simpler to compute the derivative at the same time that the     #
        # loss is being computed. As a result you may need to modify some of the    #
        # code above to compute the gradient.                                       #
        #############################################################################

        return loss, dW


    def svm_loss_vectorized(W, X, y, reg):
        """
        Structured SVM loss function, vectorized implementation.

        Inputs and outputs are the same as svm_loss_naive.
        """
        loss = 0.0
        num_train = X.shape[0]
        dW = np.zeros(W.shape)  # initialize the gradient as zero

        scores = np.dot(X, W)                                   # (N, C)
        correct_class_scores = scores[np.arange(num_train), y]  # (N,)
        correct_class_scores = np.reshape(correct_class_scores, (num_train, -1))
        margin = scores - correct_class_scores + 1.0
        margin[np.arange(num_train), y] = 0.0  # skip the j == y_i terms
        margin[margin <= 0] = 0.0              # max(0, .)
        loss += np.sum(margin) / num_train
        loss += 0.5 * reg * np.sum(W * W)

        #############################################################################
        # TODO:                                                                     #
        # Implement a vectorized version of the structured SVM loss, storing the    #
        # result in loss.                                                           #
        #############################################################################
        pass
        #############################################################################
        #                             END OF YOUR CODE                              #
        #############################################################################

        # Turn the margins into gradient coefficients: 1 for every violated
        # margin, and minus the number of violations at the correct class.
        margin[margin > 0] = 1.0
        row_sum = np.sum(margin, axis=1)
        margin[np.arange(num_train), y] = -row_sum
        dW = 1.0 / num_train * np.dot(X.T, margin) + reg * W

        #############################################################################
        # TODO:                                                                     #
        # Implement a vectorized version of the gradient for the structured SVM     #
        # loss, storing the result in dW.                                           #
        #                                                                           #
        # Hint: Instead of computing the gradient from scratch, it may be easier    #
        # to reuse some of the intermediate values that you used to compute the     #
        # loss.                                                                     #
        #############################################################################
        pass
        #############################################################################
        #                             END OF YOUR CODE                              #
        #############################################################################

        return loss, dW
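To see what the intermediate arrays in `svm_loss_vectorized` look like, it can help to trace the same steps on a tiny score matrix. The sketch below starts from hand-picked scores (an assumption made for illustration, so that no `X` or `W` is needed) and follows the margin and coefficient computations with delta = 1.

```python
import numpy as np

# Hand-picked scores for N = 2 examples and C = 3 classes (illustrative only).
scores = np.array([[3.0, 1.0, 2.5],
                   [0.0, 2.0, 1.5]])
y = np.array([0, 1])
N = scores.shape[0]

# Score of the correct class for each row, as a column so that it broadcasts.
correct = scores[np.arange(N), y].reshape(N, 1)

# Hinge margins with delta = 1; the correct class is excluded from the sum.
margin = np.maximum(0.0, scores - correct + 1.0)
margin[np.arange(N), y] = 0.0
loss = margin.sum() / N

# Gradient coefficients: 1 for each violated margin, and minus the number
# of violations in the column of the correct class.
mask = (margin > 0).astype(float)
mask[np.arange(N), y] = -mask.sum(axis=1)

print(margin)
print(loss)
print(mask)
```

Multiplying `X.T` by `mask` and dividing by `N` would then give the data part of dW, with regularization still to be added.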

Notebook code for the assignment:

Multiclass Support Vector Machine exercise

    # Run some setup code for this notebook.
    import random
    import numpy as np
    from cs231n.data_utils import load_CIFAR10
    import matplotlib.pyplot as plt

    # This is a bit of magic to make matplotlib figures appear inline in the
    # notebook rather than in a new window.
    %matplotlib inline
    plt.rcParams['figure.figsize'] = (10.0, 8.0)  # set default size of plots
    plt.rcParams['image.interpolation'] = 'nearest'
    plt.rcParams['image.cmap'] = 'gray'

    # Some more magic so that the notebook will reload external python modules;
    # see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
    %load_ext autoreload
    %autoreload 2

    # Load the raw CIFAR-10 data.
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

    # As a sanity check, we print out the size of the training and test data.
    print('Training data shape: ', X_train.shape)
    print('Training labels shape: ', y_train.shape)
    print('Test data shape: ', X_test.shape)
    print('Test labels shape: ', y_test.shape)

    # Visualize some examples from the dataset.
    # We show a few examples of training images from each class.
    classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
    num_classes = len(classes)
    samples_per_class = 7
    for y, cls in enumerate(classes):
        idxs = np.flatnonzero(y_train == y)
        idxs = np.random.choice(idxs, samples_per_class, replace=False)
        for i, idx in enumerate(idxs):
            plt_idx = i * num_classes + y + 1
            plt.subplot(samples_per_class, num_classes, plt_idx)
            plt.imshow(X_train[idx].astype('uint8'))
            plt.axis('off')
            if i == 0:
                plt.title(cls)
    plt.show()

    # Split the data into train, val, and test sets. In addition we will
    # create a small development set as a subset of the training data;
    # we can use this for development so our code runs faster.
    num_training = 49000
    num_validation = 1000
    num_test = 1000
    num_dev = 500

    # Our validation set will be num_validation points from the original
    # training set.
    mask = list(range(num_training, num_training + num_validation))
    X_val = X_train[mask]
    y_val = y_train[mask]

    # Our training set will be the first num_train points from the original
    # training set.
    mask = list(range(num_training))
    X_train = X_train[mask]
    y_train = y_train[mask]

    # We will also make a development set, which is a small subset of
    # the training set.
    mask = np.random.choice(num_training, num_dev, replace=False)
    X_dev = X_train[mask]
    y_dev = y_train[mask]

    # We use the first num_test points of the original test set as our
    # test set.
    mask = list(range(num_test))
    X_test = X_test[mask]
    y_test = y_test[mask]

    print('Train data shape: ', X_train.shape)
    print('Train labels shape: ', y_train.shape)
    print('Validation data shape: ', X_val.shape)
    print('Validation labels shape: ', y_val.shape)
    print('Test data shape: ', X_test.shape)
    print('Test labels shape: ', y_test.shape)

    # Preprocessing: reshape the image data into rows
    X_train = np.reshape(X_train, (X_train.shape[0], -1))
    X_val = np.reshape(X_val, (X_val.shape[0], -1))
    X_test = np.reshape(X_test, (X_test.shape[0], -1))
    X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))

    # As a sanity check, print out the shapes of the data
    print('Training data shape: ', X_train.shape)
    print('Validation data shape: ', X_val.shape)
    print('Test data shape: ', X_test.shape)
    print('dev data shape: ', X_dev.shape)

    # Zero-mean centering
    # Preprocessing: subtract the mean image
    # first: compute the image mean based on the training data
    mean_image = np.mean(X_train, axis=0)
    print(mean_image[:10])  # print a few of the elements
    plt.figure(figsize=(4, 4))
    plt.imshow(mean_image.reshape((32, 32, 3)).astype('uint8'))  # visualize the mean image
    plt.show()

    # second: subtract the mean image from train and test data
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image
    X_dev -= mean_image

    # third: append the bias dimension of ones (i.e. bias trick) so that our SVM
    # only has to worry about optimizing a single weight matrix W.
    X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
    X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
    X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
    X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])

    print(X_train.shape, X_val.shape, X_test.shape, X_dev.shape)

    # Evaluate the naive implementation of the loss we provided for you:
    from cs231n.classifiers.linear_svm import svm_loss_naive
    import time

    # generate a random SVM weight matrix of small numbers
    W = np.random.randn(3073, 10) * 0.0001

    loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.00001)
    print('loss: %f' % (loss, ))

    # Once you've implemented the gradient, recompute it with the code below
    # and gradient check it with the function we provided for you

    # Compute the loss and its gradient at W.
    loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.0)

    # Numerically compute the gradient along several randomly chosen dimensions, and
    # compare them with your analytically computed gradient. The numbers should match
    # almost exactly along all dimensions.
    from cs231n.gradient_check import grad_check_sparse
    f = lambda w: svm_loss_naive(w, X_dev, y_dev, 0.0)[0]
    grad_numerical = grad_check_sparse(f, W, grad)

    # do the gradient check once again with regularization turned on
    # you didn't forget the regularization gradient did you?
    loss, grad = svm_loss_naive(W, X_dev, y_dev, 1e2)
    f = lambda w: svm_loss_naive(w, X_dev, y_dev, 1e2)[0]
    grad_numerical = grad_check_sparse(f, W, grad)

    # Next implement the function svm_loss_vectorized; for now only compute the loss;
    # we will implement the gradient in a moment.
    tic = time.time()
    loss_naive, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.00001)
    toc = time.time()
    print('Naive loss: %e computed in %fs' % (loss_naive, toc - tic))

    from cs231n.classifiers.linear_svm import svm_loss_vectorized
    tic = time.time()
    loss_vectorized, _ = svm_loss_vectorized(W, X_dev, y_dev, 0.00001)
    toc = time.time()
    print('Vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic))

    # The losses should match but your vectorized implementation should be much faster.
    print('difference: %f' % (loss_naive - loss_vectorized))

    # Complete the implementation of svm_loss_vectorized, and compute the gradient
    # of the loss function in a vectorized way.

    # The naive implementation and the vectorized implementation should match, but
    # the vectorized version should still be much faster.
    tic = time.time()
    _, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.00001)
    toc = time.time()
    print('Naive loss and gradient: computed in %fs' % (toc - tic))

    tic = time.time()
    _, grad_vectorized = svm_loss_vectorized(W, X_dev, y_dev, 0.00001)
    toc = time.time()
    print('Vectorized loss and gradient: computed in %fs' % (toc - tic))

    # The loss is a single number, so it is easy to compare the values computed
    # by the two implementations. The gradient on the other hand is a matrix, so
    # we use the Frobenius norm to compare them.
    difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
    print('difference: %f' % difference)

Stochastic Gradient Descent

We now have vectorized and efficient expressions for the loss, the gradient and our gradient matches the numerical gradient. We are therefore ready to do SGD to minimize the loss.

    # In the file linear_classifier.py, implement SGD in the function
    # LinearClassifier.train() and then run it with the code below.
    from cs231n.classifiers import LinearSVM
    svm = LinearSVM()
    tic = time.time()
    loss_hist = svm.train(X_train, y_train, learning_rate=1e-7, reg=5e4,
                          num_iters=1500, verbose=True)
    toc = time.time()
    print('That took %fs' % (toc - tic))

    # A useful debugging strategy is to plot the loss as a function of
    # iteration number:
    plt.plot(loss_hist)
    plt.xlabel('Iteration number')
    plt.ylabel('Loss value')
    plt.show()

    # Write the LinearSVM.predict function and evaluate the performance on both the
    # training and validation set
    y_train_pred = svm.predict(X_train)
    print('training accuracy: %f' % (np.mean(y_train == y_train_pred), ))
    y_val_pred = svm.predict(X_val)
    print('validation accuracy: %f' % (np.mean(y_val == y_val_pred), ))

    # Use the validation set to tune hyperparameters (regularization strength and
    # learning rate). You should experiment with different ranges for the learning
    # rates and regularization strengths; if you are careful you should be able to
    # get a classification accuracy of about 0.4 on the validation set.
    #learning_rates = [1e-7, 5e-5]
    #regularization_strengths = [5e4, 1e5]
    learning_rates = [1.4e-7, 1.5e-7, 1.6e-7]
    regularization_strengths = [3e4, 3.1e4, 3.2e4, 3.3e4, 3.4e4]

    # results is dictionary mapping tuples of the form
    # (learning_rate, regularization_strength) to tuples of the form
    # (training_accuracy, validation_accuracy). The accuracy is simply the fraction
    # of data points that are correctly classified.
    results = {}
    best_val = -1    # The highest validation accuracy that we have seen so far.
    best_svm = None  # The LinearSVM object that achieved the highest validation rate.

    from cs231n.classifiers import LinearSVM
    for learning_rate in learning_rates:
        for reg in regularization_strengths:
            svm = LinearSVM()
            loss_hist = svm.train(X_train, y_train, learning_rate, reg,
                                  num_iters=1500, verbose=True)
            y_train_pred = svm.predict(X_train)
            training_accuracy = np.mean(y_train == y_train_pred)
            y_val_pred = svm.predict(X_val)
            validation_accuracy = np.mean(y_val == y_val_pred)
            results[(learning_rate, reg)] = (training_accuracy, validation_accuracy)
            if validation_accuracy > best_val:
                best_val = validation_accuracy
                best_svm = svm

    ################################################################################
    # TODO:                                                                        #
    # Write code that chooses the best hyperparameters by tuning on the validation #
    # set. For each combination of hyperparameters, train a linear SVM on the      #
    # training set, compute its accuracy on the training and validation sets, and  #
    # store these numbers in the results dictionary. In addition, store the best   #
    # validation accuracy in best_val and the LinearSVM object that achieves this  #
    # accuracy in best_svm.                                                        #
    #                                                                              #
    # Hint: You should use a small value for num_iters as you develop your         #
    # validation code so that the SVMs don't take much time to train; once you are #
    # confident that your validation code works, you should rerun the validation   #
    # code with a larger value for num_iters.                                      #
    ################################################################################
    pass
    ################################################################################
    #                              END OF YOUR CODE                                #
    ################################################################################

    # Print out results.
    for lr, reg in sorted(results):
        train_accuracy, val_accuracy = results[(lr, reg)]
        print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
                    lr, reg, train_accuracy, val_accuracy))

    print('best validation accuracy achieved during cross-validation: %f' % best_val)

    # Visualize the cross-validation results
    import math
    x_scatter = [math.log10(x[0]) for x in results]
    y_scatter = [math.log10(x[1]) for x in results]

    # plot training accuracy
    marker_size = 100
    colors = [results[x][0] for x in results]
    plt.subplot(2, 1, 1)
    plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
    plt.colorbar()
    plt.xlabel('log learning rate')
    plt.ylabel('log regularization strength')
    plt.title('CIFAR-10 training accuracy')

    # plot validation accuracy
    colors = [results[x][1] for x in results]  # default size of markers is 20
    plt.subplot(2, 1, 2)
    plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
    plt.colorbar()
    plt.xlabel('log learning rate')
    plt.ylabel('log regularization strength')
    plt.title('CIFAR-10 validation accuracy')
    plt.show()

    # Evaluate the best svm on test set
    y_test_pred = best_svm.predict(X_test)
    test_accuracy = np.mean(y_test == y_test_pred)
    print('linear SVM on raw pixels final test set accuracy: %f' % test_accuracy)

    # Visualize the learned weights for each class.
    # Depending on your choice of learning rate and regularization strength, these may
    # or may not be nice to look at.
    w = best_svm.W[:-1, :]  # strip out the bias
    w = w.reshape(32, 32, 3, 10)
    w_min, w_max = np.min(w), np.max(w)
    classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
    for i in range(10):
        plt.subplot(2, 5, i + 1)

        # Rescale the weights to be between 0 and 255
        wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
        plt.imshow(wimg.astype('uint8'))
        plt.axis('off')
        plt.title(classes[i])
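The notebook calls `LinearSVM.train` and `LinearSVM.predict`, but linear_classifier.py is not shown in this post. The sketch below is only a guess at what a minimal SGD loop and prediction step could look like, reusing the vectorized loss from above. The function names `train_linear_svm` and `predict`, the hyperparameter defaults, and the toy three-blob dataset are all assumptions made for illustration; this is not the cs231n implementation.

```python
import numpy as np

def svm_loss_vectorized(W, X, y, reg):
    """Vectorized multiclass SVM loss and gradient (delta = 1)."""
    N = X.shape[0]
    scores = X.dot(W)
    correct = scores[np.arange(N), y].reshape(N, 1)
    margin = np.maximum(0.0, scores - correct + 1.0)
    margin[np.arange(N), y] = 0.0
    loss = margin.sum() / N + 0.5 * reg * np.sum(W * W)
    mask = (margin > 0).astype(float)
    mask[np.arange(N), y] = -mask.sum(axis=1)
    dW = X.T.dot(mask) / N + reg * W
    return loss, dW

def train_linear_svm(X, y, num_classes, learning_rate=1e-2, reg=1e-3,
                     num_iters=300, batch_size=50, seed=0):
    """Hypothetical minibatch SGD loop; returns the weights and loss history."""
    rng = np.random.default_rng(seed)
    W = 0.001 * rng.standard_normal((X.shape[1], num_classes))
    loss_hist = []
    for _ in range(num_iters):
        # Sample a minibatch (with replacement, for simplicity).
        idx = rng.choice(X.shape[0], batch_size, replace=True)
        loss, dW = svm_loss_vectorized(W, X[idx], y[idx], reg)
        loss_hist.append(loss)
        W -= learning_rate * dW          # plain gradient step
    return W, loss_hist

def predict(W, X):
    """Predicted label = class with the highest score."""
    return np.argmax(X.dot(W), axis=1)

# Toy data: three well-separated 2-D blobs, plus a column of ones (bias trick).
rng = np.random.default_rng(1)
centers = np.array([[0.0, 4.0], [-4.0, -4.0], [4.0, -4.0]])
y_toy = np.repeat(np.arange(3), 100)
X_toy = centers[y_toy] + 0.5 * rng.standard_normal((300, 2))
X_toy = np.hstack([X_toy, np.ones((300, 1))])

W, loss_hist = train_linear_svm(X_toy, y_toy, num_classes=3)
accuracy = np.mean(predict(W, X_toy) == y_toy)
print(loss_hist[0], loss_hist[-1], accuracy)
```

On CIFAR-10 the learning rate and regularization strength would need to be on the scale used in the notebook (around 1e-7 and a few 1e4); the defaults here only suit the toy data.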

1 0