CS231n Assignment 1 - SVM


    • Notes & Prerequisites
    • svm.ipynb
    • /cs231n/classifiers/linear_svm.py
    • /cs231n/classifiers/linear_classifier.py

Notes & Prerequisites

This assignment involves quite a few files. First there is svm.ipynb in the root directory, then /cs231n/classifiers/linear_svm.py and /cs231n/classifiers/linear_classifier.py; all of these require your own changes. The assignment also uses /cs231n/gradient_check.py for gradient checking, but everything in that file has already been written by the course staff and needs no changes; if you are interested, you can study its code on your own.
Loss Function
1. SVM

L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + \Delta)

L_i = \sum_{j \neq y_i} \max(0, w_j^T x_i - w_{y_i}^T x_i + \Delta)

\Delta is usually set to 1.

dW_{y_i} = -x_i \Big( \sum_{j \neq y_i} \mathbb{1}(x_i W_j - x_i W_{y_i} + 1 > 0) \Big) + 2\lambda W_{y_i}

dW_j = x_i \, \mathbb{1}(x_i W_j - x_i W_{y_i} + 1 > 0) + 2\lambda W_j, \quad (j \neq y_i)

where \mathbb{1}(\cdot) is the indicator function: \mathbb{1}(\text{true}) = 1 and \mathbb{1}(\text{false}) = 0.
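To make the hinge loss concrete, here is a small worked example of my own (it is not part of the assignment; I use \Delta = 10 only so the margins are easy to see, while the assignment uses \Delta = 1). Suppose an image receives scores s = (13, -7, 11) and its correct class is the first one:

L_i = \max(0, -7 - 13 + 10) + \max(0, 11 - 13 + 10) = 0 + 8 = 8

Only the class with score 11 violates the desired margin, so only that term contributes to the loss, and for this example only that class's column of W and the correct class's column receive nonzero gradient contributions.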
2. Softmax

L_i = -\log\left( \frac{e^{f_{y_i}}}{\sum_j e^{f_j}} \right)

For numerical stability during the computation, the scores are usually shifted:

\frac{e^{f_{y_i}}}{\sum_j e^{f_j}} = \frac{C\, e^{f_{y_i}}}{C \sum_j e^{f_j}} = \frac{e^{f_{y_i} + \log C}}{\sum_j e^{f_j + \log C}}, \qquad \log C = -\max_j f_j

dW_j = \frac{x\, p_j}{n} + \lambda W_j, \quad p_j = \frac{e^{f_j}}{\sum_k e^{f_k}}, \quad (j \neq y_i)

dW_{y_i} = -\frac{x\,(1 - p_{y_i})}{n} + \lambda W_{y_i}, \quad p_{y_i} = \frac{e^{f_{y_i}}}{\sum_k e^{f_k}}
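The shift by \log C = -\max_j f_j matters because the unshifted exponentials can overflow. A minimal sketch of the idea (my own illustration, not part of the assignment files; softmax_loss_single is a hypothetical helper name):

import numpy as np

def softmax_loss_single(f, y_i):
    """Numerically stable softmax loss -log(e^{f_yi} / sum_j e^{f_j}) for one example."""
    f = f - np.max(f)                       # shift by log C = -max_j f_j; probabilities unchanged
    p = np.exp(f) / np.sum(np.exp(f))
    return -np.log(p[y_i])

scores = np.array([123.0, 456.0, 789.0])    # a naive np.exp() of these would overflow
print(softmax_loss_single(scores, 2))       # ~0.0, since the correct class has the largest score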

Regularization

L = \frac{1}{N} \sum_i L_i + \lambda R(W), \qquad R(W) = \sum_k \sum_l W_{k,l}^2

Score Function

f: \mathbb{R}^D \to \mathbb{R}^K; \quad \text{here } f(x_i, W, b) = W x_i + b
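The notebook below folds the bias b into W using the bias trick (appending a constant 1 feature to every x_i), which is why the data ends up with 3073 columns and the SVM optimizes a single (3073, 10) matrix. A tiny sketch of the equivalence, with made-up toy sizes:

import numpy as np

D, K, N = 4, 3, 2                      # toy sizes: feature dim, number of classes, examples
W = np.random.randn(D, K) * 0.01
b = np.random.randn(K) * 0.01
X = np.random.randn(N, D)

scores = X.dot(W) + b                  # f(x_i, W, b) = W x_i + b, computed row-wise

# bias trick: append a constant 1 to each x_i and stack b as an extra row of W
X_ext = np.hstack([X, np.ones((N, 1))])
W_ext = np.vstack([W, b])
assert np.allclose(scores, X_ext.dot(W_ext))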

The following posts were referenced:
https://www.zhihu.com/people/wu-sang-91/posts
http://www.cnblogs.com/daihengchen/p/5754383.html
Thanks to their authors for sharing the code.
All of the code for the gradient computations runs correctly, but I still do not understand that part completely thoroughly.
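For reference, the provided gradient checker is essentially a centered finite difference evaluated at a handful of randomly chosen entries of W. A simplified sketch of that idea (my own, not the actual cs231n/gradient_check.py code; numeric_grad_at is a hypothetical name):

import numpy as np

def numeric_grad_at(f, W, i, j, h=1e-5):
    """Centered-difference estimate of dL/dW[i, j]; grad_check_sparse does
    essentially this at a few randomly chosen entries and reports relative error."""
    old = W[i, j]
    W[i, j] = old + h
    fxph = f(W)
    W[i, j] = old - h
    fxmh = f(W)
    W[i, j] = old                      # restore the original entry
    return (fxph - fxmh) / (2 * h)

# usage sketch (assumes svm_loss_naive, W, grad, X_dev, y_dev from the notebook are in scope):
# f = lambda w: svm_loss_naive(w, X_dev, y_dev, 0.0)[0]
# num = numeric_grad_at(f, W, 0, 0)
# rel_err = abs(num - grad[0, 0]) / max(abs(num), abs(grad[0, 0]))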

svm.ipynb

# Multiclass Support Vector Machine exercise

*Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the [assignments page](http://vision.stanford.edu/teaching/cs231n/assignments.html) on the course website.*

In this exercise you will:

- implement a fully-vectorized **loss function** for the SVM
- implement the fully-vectorized expression for its **analytic gradient**
- **check your implementation** using numerical gradient
- use a validation set to **tune the learning rate and regularization** strength
- **optimize** the loss function with **SGD**
- **visualize** the final learned weights

In [1]:

# Run some setup code for this notebook.
import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt

from __future__ import print_function

# This is a bit of magic to make matplotlib figures appear inline in the
# notebook rather than in a new window.
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

CIFAR-10 Data Loading and Preprocessing

In [2]:

# Load the raw CIFAR-10 data.
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

# As a sanity check, we print out the size of the training and test data.
print('Training data shape: ', X_train.shape)
print('Training labels shape: ', y_train.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)

Training data shape:  (50000, 32, 32, 3)
Training labels shape:  (50000,)
Test data shape:  (10000, 32, 32, 3)
Test labels shape:  (10000,)

In [3]:

# Visualize some examples from the dataset.
# We show a few examples of training images from each class.
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
samples_per_class = 7
for y, cls in enumerate(classes):
    idxs = np.flatnonzero(y_train == y)
    idxs = np.random.choice(idxs, samples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + y + 1
        plt.subplot(samples_per_class, num_classes, plt_idx)
        plt.imshow(X_train[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls)
plt.show()

In [4]:

# Split the data into train, val, and test sets. In addition we will
# create a small development set as a subset of the training data;
# we can use this for development so our code runs faster.
num_training = 49000
num_validation = 1000
num_test = 1000
num_dev = 500

# Our validation set will be num_validation points from the original
# training set.
mask = range(num_training, num_training + num_validation)
X_val = X_train[mask]
y_val = y_train[mask]

# Our training set will be the first num_train points from the original
# training set.
mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]

# We will also make a development set, which is a small subset of
# the training set.
mask = np.random.choice(num_training, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]

# We use the first num_test points of the original test set as our
# test set.
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]

print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)

Train data shape:  (49000, 32, 32, 3)
Train labels shape:  (49000,)
Validation data shape:  (1000, 32, 32, 3)
Validation labels shape:  (1000,)
Test data shape:  (1000, 32, 32, 3)
Test labels shape:  (1000,)

In [5]:

# Preprocessing: reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))

# As a sanity check, print out the shapes of the data
print('Training data shape: ', X_train.shape)
print('Validation data shape: ', X_val.shape)
print('Test data shape: ', X_test.shape)
print('dev data shape: ', X_dev.shape)

Training data shape:  (49000, 3072)
Validation data shape:  (1000, 3072)
Test data shape:  (1000, 3072)
dev data shape:  (500, 3072)

In [6]:

# Preprocessing: subtract the mean image
# first: compute the image mean based on the training data
mean_image = np.mean(X_train, axis=0)
print(mean_image.shape)
print(mean_image[:10]) # print a few of the elements
plt.figure(figsize=(4,4))
plt.imshow(mean_image.reshape((32,32,3)).astype('uint8')) # visualize the mean image
plt.show()

(3072,)
[ 130.64189796  135.98173469  132.47391837  130.05569388  135.34804082
  131.75402041  130.96055102  136.14328571  132.47636735  131.48467347]

In [7]:

# second: subtract the mean image from train and test data
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image
X_dev -= mean_image

In [8]:

# third: append the bias dimension of ones (i.e. bias trick) so that our SVM
# only has to worry about optimizing a single weight matrix W.
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])

print(X_train.shape, X_val.shape, X_test.shape, X_dev.shape)

(49000, 3073) (1000, 3073) (1000, 3073) (500, 3073)

In [10]:

print(y_train[:10])

[6 9 9 4 1 1 2 7 8 3]

SVM Classifier

Your code for this section will all be written inside cs231n/classifiers/linear_svm.py.

As you can see, we have prefilled the function compute_loss_naive which uses for loops to evaluate the multiclass SVM loss function.

In [12]:

# Evaluate the naive implementation of the loss we provided for you:
from cs231n.classifiers.linear_svm import svm_loss_naive
import time

# generate a random SVM weight matrix of small numbers
W = np.random.randn(3073, 10) * 0.0001

loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.000005)
print('loss: %f' % (loss, ))

loss: 9.281010

The grad returned from the function above is right now all zero. Derive and implement the gradient for the SVM cost function and implement it inline inside the function svm_loss_naive. You will find it helpful to interleave your new code inside the existing function.

To check that you have implemented the gradient correctly, you can numerically estimate the gradient of the loss function and compare the numeric estimate to the gradient that you computed. We have provided code that does this for you:

In [13]:

# Once you've implemented the gradient, recompute it with the code below
# and gradient check it with the function we provided for you

# Compute the loss and its gradient at W.
loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.0)

# Numerically compute the gradient along several randomly chosen dimensions, and
# compare them with your analytically computed gradient. The numbers should match
# almost exactly along all dimensions.
from cs231n.gradient_check import grad_check_sparse
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad)

# do the gradient check once again with regularization turned on
# you didn't forget the regularization gradient did you?
loss, grad = svm_loss_naive(W, X_dev, y_dev, 5e1)
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 5e1)[0]
grad_numerical = grad_check_sparse(f, W, grad)

numerical: -3.077570 analytic: -3.077570, relative error: 5.855178e-11
numerical: 19.424861 analytic: 19.424861, relative error: 1.185627e-11
numerical: 21.409919 analytic: 21.409919, relative error: 1.075293e-11
numerical: 0.525651 analytic: 0.525651, relative error: 7.267888e-10
numerical: 7.224015 analytic: 7.224015, relative error: 7.660247e-11
numerical: 26.974799 analytic: 26.974799, relative error: 3.183110e-12
numerical: 0.522989 analytic: 0.522989, relative error: 3.118520e-10
numerical: 13.834658 analytic: 13.834658, relative error: 8.990437e-12
numerical: -30.948592 analytic: -30.948592, relative error: 8.753160e-12
numerical: -10.865639 analytic: -10.865639, relative error: 1.331004e-12
numerical: -52.351602 analytic: -52.351902, relative error: 2.858895e-06
numerical: -11.092916 analytic: -11.092945, relative error: 1.302661e-06
numerical: -1.250897 analytic: -1.256461, relative error: 2.219086e-03
numerical: 23.045080 analytic: 23.042385, relative error: 5.846371e-05
numerical: 12.563225 analytic: 12.570771, relative error: 3.002338e-04
numerical: 14.221142 analytic: 14.235024, relative error: 4.878103e-04
numerical: -13.668007 analytic: -13.660236, relative error: 2.843660e-04
numerical: -0.722945 analytic: -0.717629, relative error: 3.690365e-03
numerical: 10.409573 analytic: 10.408858, relative error: 3.437838e-05
numerical: 23.297342 analytic: 23.307056, relative error: 2.084309e-04

### Inline Question 1:
It is possible that once in a while a dimension in the gradcheck will not match exactly. What could such a discrepancy be caused by? Is it a reason for concern? What is a simple example in one dimension where a gradient check could fail? *Hint: the SVM loss function is not strictly speaking differentiable*

**Your Answer:** *Some dimensions may fail to match because the gradient does not exist at certain points: the hinge max(0, ·) has a kink wherever a margin is exactly zero, so the analytic subgradient and the numerical estimate can disagree there. It is not a cause for concern. A one-dimensional example is checking f(x) = max(0, x) at a point very close to x = 0.*

In [15]:

# Next implement the function svm_loss_vectorized; for now only compute the loss;
# we will implement the gradient in a moment.
tic = time.time()
loss_naive, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Naive loss: %e computed in %fs' % (loss_naive, toc - tic))

from cs231n.classifiers.linear_svm import svm_loss_vectorized
tic = time.time()
loss_vectorized, _ = svm_loss_vectorized(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic))

# The losses should match but your vectorized implementation should be much faster.
print('difference: %f' % (loss_naive - loss_vectorized))

Naive loss: 9.281010e+00 computed in 0.210024s
Vectorized loss: 9.281010e+00 computed in 0.002797s
difference: 0.000000

In [16]:

# Complete the implementation of svm_loss_vectorized, and compute the gradient
# of the loss function in a vectorized way.

# The naive implementation and the vectorized implementation should match, but
# the vectorized version should still be much faster.
tic = time.time()
_, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Naive loss and gradient: computed in %fs' % (toc - tic))

tic = time.time()
_, grad_vectorized = svm_loss_vectorized(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Vectorized loss and gradient: computed in %fs' % (toc - tic))

# The loss is a single number, so it is easy to compare the values computed
# by the two implementations. The gradient on the other hand is a matrix, so
# we use the Frobenius norm to compare them.
difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print('difference: %f' % difference)

Naive loss and gradient: computed in 0.156195s
Vectorized loss and gradient: computed in 0.002693s
difference: 0.000000

Stochastic Gradient Descent

We now have vectorized and efficient expressions for the loss and the gradient, and our gradient matches the numerical gradient. We are therefore ready to do SGD to minimize the loss.

In [17]:

# In the file linear_classifier.py, implement SGD in the function
# LinearClassifier.train() and then run it with the code below.
from cs231n.classifiers import LinearSVM
svm = LinearSVM()
tic = time.time()
loss_hist = svm.train(X_train, y_train, learning_rate=1e-7, reg=2.5e4,
                      num_iters=1500, verbose=True)
toc = time.time()
print('That took %fs' % (toc - tic))

iteration 0 / 1500: loss 409.405338
iteration 100 / 1500: loss 239.975529
iteration 200 / 1500: loss 146.770208
iteration 300 / 1500: loss 90.697310
iteration 400 / 1500: loss 56.000237
iteration 500 / 1500: loss 35.658527
iteration 600 / 1500: loss 24.095215
iteration 700 / 1500: loss 15.804802
iteration 800 / 1500: loss 12.232596
iteration 900 / 1500: loss 9.217965
iteration 1000 / 1500: loss 7.326305
iteration 1100 / 1500: loss 5.783639
iteration 1200 / 1500: loss 5.738319
iteration 1300 / 1500: loss 5.721574
iteration 1400 / 1500: loss 5.173164
That took 3.366138s

In [18]:

# A useful debugging strategy is to plot the loss as a function of
# iteration number:
plt.plot(loss_hist)
plt.xlabel('Iteration number')
plt.ylabel('Loss value')
plt.show()

In [19]:

# Write the LinearSVM.predict function and evaluate the performance on both the
# training and validation set
y_train_pred = svm.predict(X_train)
print('training accuracy: %f' % (np.mean(y_train == y_train_pred), ))
y_val_pred = svm.predict(X_val)
print('validation accuracy: %f' % (np.mean(y_val == y_val_pred), ))

training accuracy: 0.380837
validation accuracy: 0.375000

In [24]:

# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of about 0.4 on the validation set.
learning_rates = [1e-7]
#regularization_strengths = [2.5e4, 5e4]
# 40 values from 1e4 to 4.9e4 in steps of 0.1e4
regularization_strengths = [(j + 0.1 * i) * 1e4 for j in range(1, 5) for i in range(0, 10)]

# results is dictionary mapping tuples of the form
# (learning_rate, regularization_strength) to tuples of the form
# (training_accuracy, validation_accuracy). The accuracy is simply the fraction
# of data points that are correctly classified.
results = {}
best_val = -1   # The highest validation accuracy that we have seen so far.
best_svm = None # The LinearSVM object that achieved the highest validation rate.

################################################################################
# TODO:                                                                        #
# Write code that chooses the best hyperparameters by tuning on the validation #
# set. For each combination of hyperparameters, train a linear SVM on the      #
# training set, compute its accuracy on the training and validation sets, and  #
# store these numbers in the results dictionary. In addition, store the best   #
# validation accuracy in best_val and the LinearSVM object that achieves this  #
# accuracy in best_svm.                                                        #
#                                                                              #
# Hint: You should use a small value for num_iters as you develop your         #
# validation code so that the SVMs don't take much time to train; once you are #
# confident that your validation code works, you should rerun the validation   #
# code with a larger value for num_iters.                                      #
################################################################################
for reg in regularization_strengths:
    for lr in learning_rates:
        svm = LinearSVM()
        loss_hist = svm.train(X_train, y_train, lr, reg, num_iters=1500)
        y_train_pred = svm.predict(X_train)
        train_accuracy = np.mean(y_train == y_train_pred)
        y_val_pred = svm.predict(X_val)
        val_accuracy = np.mean(y_val == y_val_pred)
        if val_accuracy > best_val:
            best_val = val_accuracy
            best_svm = svm
        results[(lr, reg)] = train_accuracy, val_accuracy
################################################################################
#                              END OF YOUR CODE                                #
################################################################################

# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy))
print('best validation accuracy achieved during cross-validation: %f' % best_val)

lr 1.000000e-07 reg 1.000000e+04 train accuracy: 0.370959 val accuracy: 0.372000
lr 1.000000e-07 reg 1.100000e+04 train accuracy: 0.379327 val accuracy: 0.379000
lr 1.000000e-07 reg 1.200000e+04 train accuracy: 0.381551 val accuracy: 0.392000
lr 1.000000e-07 reg 1.300000e+04 train accuracy: 0.380000 val accuracy: 0.382000
lr 1.000000e-07 reg 1.400000e+04 train accuracy: 0.382714 val accuracy: 0.385000
lr 1.000000e-07 reg 1.500000e+04 train accuracy: 0.382694 val accuracy: 0.381000
lr 1.000000e-07 reg 1.600000e+04 train accuracy: 0.380020 val accuracy: 0.399000
lr 1.000000e-07 reg 1.700000e+04 train accuracy: 0.379694 val accuracy: 0.388000
lr 1.000000e-07 reg 1.800000e+04 train accuracy: 0.381898 val accuracy: 0.382000
lr 1.000000e-07 reg 1.900000e+04 train accuracy: 0.380163 val accuracy: 0.391000
lr 1.000000e-07 reg 2.000000e+04 train accuracy: 0.385245 val accuracy: 0.398000
lr 1.000000e-07 reg 2.100000e+04 train accuracy: 0.378796 val accuracy: 0.379000
lr 1.000000e-07 reg 2.200000e+04 train accuracy: 0.385102 val accuracy: 0.386000
lr 1.000000e-07 reg 2.300000e+04 train accuracy: 0.379306 val accuracy: 0.377000
lr 1.000000e-07 reg 2.400000e+04 train accuracy: 0.383184 val accuracy: 0.399000
lr 1.000000e-07 reg 2.500000e+04 train accuracy: 0.381408 val accuracy: 0.385000
lr 1.000000e-07 reg 2.600000e+04 train accuracy: 0.376980 val accuracy: 0.382000
lr 1.000000e-07 reg 2.700000e+04 train accuracy: 0.373612 val accuracy: 0.398000
lr 1.000000e-07 reg 2.800000e+04 train accuracy: 0.377796 val accuracy: 0.380000
lr 1.000000e-07 reg 2.900000e+04 train accuracy: 0.379367 val accuracy: 0.392000
lr 1.000000e-07 reg 3.000000e+04 train accuracy: 0.376490 val accuracy: 0.392000
lr 1.000000e-07 reg 3.100000e+04 train accuracy: 0.380776 val accuracy: 0.397000
lr 1.000000e-07 reg 3.200000e+04 train accuracy: 0.381122 val accuracy: 0.389000
lr 1.000000e-07 reg 3.300000e+04 train accuracy: 0.374306 val accuracy: 0.377000
lr 1.000000e-07 reg 3.400000e+04 train accuracy: 0.378714 val accuracy: 0.383000
lr 1.000000e-07 reg 3.500000e+04 train accuracy: 0.375755 val accuracy: 0.380000
lr 1.000000e-07 reg 3.600000e+04 train accuracy: 0.375980 val accuracy: 0.386000
lr 1.000000e-07 reg 3.700000e+04 train accuracy: 0.371898 val accuracy: 0.388000
lr 1.000000e-07 reg 3.800000e+04 train accuracy: 0.376551 val accuracy: 0.372000
lr 1.000000e-07 reg 3.900000e+04 train accuracy: 0.374061 val accuracy: 0.373000
lr 1.000000e-07 reg 4.000000e+04 train accuracy: 0.372633 val accuracy: 0.381000
lr 1.000000e-07 reg 4.100000e+04 train accuracy: 0.373551 val accuracy: 0.378000
lr 1.000000e-07 reg 4.200000e+04 train accuracy: 0.376102 val accuracy: 0.366000
lr 1.000000e-07 reg 4.300000e+04 train accuracy: 0.369592 val accuracy: 0.375000
lr 1.000000e-07 reg 4.400000e+04 train accuracy: 0.371265 val accuracy: 0.364000
lr 1.000000e-07 reg 4.500000e+04 train accuracy: 0.372510 val accuracy: 0.383000
lr 1.000000e-07 reg 4.600000e+04 train accuracy: 0.371020 val accuracy: 0.372000
lr 1.000000e-07 reg 4.700000e+04 train accuracy: 0.361388 val accuracy: 0.383000
lr 1.000000e-07 reg 4.800000e+04 train accuracy: 0.368306 val accuracy: 0.395000
lr 1.000000e-07 reg 4.900000e+04 train accuracy: 0.372959 val accuracy: 0.381000
best validation accuracy achieved during cross-validation: 0.399000

In [25]:

# Visualize the cross-validation results
import math
x_scatter = [math.log10(x[0]) for x in results]
y_scatter = [math.log10(x[1]) for x in results]

# plot training accuracy
marker_size = 100
colors = [results[x][0] for x in results]
plt.subplot(2, 1, 1)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 training accuracy')

# plot validation accuracy
colors = [results[x][1] for x in results] # default size of markers is 20
plt.subplot(2, 1, 2)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 validation accuracy')
plt.show()

In [26]:

# Evaluate the best svm on test set
y_test_pred = best_svm.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print('linear SVM on raw pixels final test set accuracy: %f' % test_accuracy)

linear SVM on raw pixels final test set accuracy: 0.376000

In [27]:

# Visualize the learned weights for each class.
# Depending on your choice of learning rate and regularization strength, these may
# or may not be nice to look at.
w = best_svm.W[:-1,:] # strip out the bias
w = w.reshape(32, 32, 3, 10)
w_min, w_max = np.min(w), np.max(w)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for i in range(10):
    plt.subplot(2, 5, i + 1)

    # Rescale the weights to be between 0 and 255
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])

Inline question 2:
Describe what your visualized SVM weights look like, and offer a brief explanation for why they look the way that they do.

Your answer: fill this in

/cs231n/classifiers/linear_svm.py

import numpy as np
from random import shuffle
from past.builtins import xrange


def svm_loss_naive(W, X, y, reg):
  """
  Structured SVM loss function, naive implementation (with loops).

  Inputs have dimension D, there are C classes, and we operate on minibatches
  of N examples.

  Inputs:
  - W: A numpy array of shape (D, C) containing weights.
  - X: A numpy array of shape (N, D) containing a minibatch of data.
  - y: A numpy array of shape (N,) containing training labels; y[i] = c means
    that X[i] has label c, where 0 <= c < C.
  - reg: (float) regularization strength

  Returns a tuple of:
  - loss as single float
  - gradient with respect to weights W; an array of same shape as W
  """
  dW = np.zeros(W.shape) # initialize the gradient as zero

  # compute the loss and the gradient
  num_classes = W.shape[1]
  num_train = X.shape[0]
  loss = 0.0
  for i in xrange(num_train):
    scores = X[i].dot(W)
    correct_class_score = scores[y[i]]
    for j in xrange(num_classes):
      if j == y[i]:
        continue
      margin = scores[j] - correct_class_score + 1 # note delta = 1
      if margin > 0:
        loss += margin
        # every active margin pushes the correct class's column down by x_i
        # and class j's column up by x_i
        dW[:, y[i]] += -X[i, :]
        dW[:, j] += X[i, :]

  # Right now the loss is a sum over all training examples, but we want it
  # to be an average instead so we divide by num_train.
  loss /= num_train
  dW /= num_train

  # Add regularization to the loss. The 0.5 factor makes the gradient of the
  # regularization term exactly reg * W, matching the vectorized version below.
  loss += 0.5 * reg * np.sum(W * W)
  dW += reg * W

  #############################################################################
  # TODO:                                                                     #
  # Compute the gradient of the loss function and store it dW.                #
  # Rather that first computing the loss and then computing the derivative,   #
  # it may be simpler to compute the derivative at the same time that the     #
  # loss is being computed. As a result you may need to modify some of the    #
  # code above to compute the gradient.                                       #
  #############################################################################

  return loss, dW


def svm_loss_vectorized(W, X, y, reg):
  """
  Structured SVM loss function, vectorized implementation.

  Inputs and outputs are the same as svm_loss_naive.
  """
  loss = 0.0
  dW = np.zeros(W.shape) # initialize the gradient as zero

  #############################################################################
  # TODO:                                                                     #
  # Implement a vectorized version of the structured SVM loss, storing the    #
  # result in loss.                                                           #
  #############################################################################
  num_classes = W.shape[1]
  num_train = X.shape[0]
  scores = X.dot(W)
  correct_class_scores = scores[range(num_train), list(y)].reshape(-1, 1)
  margins = np.maximum(0, scores - np.tile(correct_class_scores, (1, num_classes)) + 1)
  margins[range(num_train), list(y)] = 0
  loss = np.sum(margins)
  loss /= num_train
  # add the regularization term
  loss += 0.5 * reg * np.sum(W * W)
  #############################################################################
  #                             END OF YOUR CODE                              #
  #############################################################################

  #############################################################################
  # TODO:                                                                     #
  # Implement a vectorized version of the gradient for the structured SVM     #
  # loss, storing the result in dW.                                           #
  #                                                                           #
  # Hint: Instead of computing the gradient from scratch, it may be easier    #
  # to reuse some of the intermediate values that you used to compute the     #
  # loss.                                                                     #
  #############################################################################
  # Compute the gradient: turn margins into a coefficient matrix where every
  # active margin contributes +1 to its class column, and the correct class
  # column receives minus the number of active margins in that row.
  margins[margins > 0] = 1.0
  row_sum = np.sum(margins, axis=1)
  margins[np.arange(num_train), y] = -row_sum
  dW += np.dot(X.T, margins) / num_train + reg * W
  #############################################################################
  #                             END OF YOUR CODE                              #
  #############################################################################

  return loss, dW
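A quick way to sanity-check the two implementations against each other outside the notebook, using random data instead of CIFAR-10 (a sketch of my own, assuming the file above is importable from the assignment root as in the notebook):

import numpy as np
from cs231n.classifiers.linear_svm import svm_loss_naive, svm_loss_vectorized

np.random.seed(0)
W = np.random.randn(3073, 10) * 0.0001
X = np.random.randn(64, 3073)          # 64 fake "images", already bias-extended
y = np.random.randint(10, size=64)

loss_n, grad_n = svm_loss_naive(W, X, y, 5e1)
loss_v, grad_v = svm_loss_vectorized(W, X, y, 5e1)
print('loss diff:', abs(loss_n - loss_v))                    # should be ~0
print('grad diff:', np.linalg.norm(grad_n - grad_v, 'fro'))  # should be ~0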

/cs231n/classifiers/linear_classifier.py

from __future__ import print_function

import numpy as np
from cs231n.classifiers.linear_svm import *
from cs231n.classifiers.softmax import *
from past.builtins import xrange


class LinearClassifier(object):

  def __init__(self):
    self.W = None

  def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100,
            batch_size=200, verbose=False):
    """
    Train this linear classifier using stochastic gradient descent.

    Inputs:
    - X: A numpy array of shape (N, D) containing training data; there are N
      training samples each of dimension D.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c
      means that X[i] has label 0 <= c < C for C classes.
    - learning_rate: (float) learning rate for optimization.
    - reg: (float) regularization strength.
    - num_iters: (integer) number of steps to take when optimizing
    - batch_size: (integer) number of training examples to use at each step.
    - verbose: (boolean) If true, print progress during optimization.

    Outputs:
    A list containing the value of the loss function at each training iteration.
    """
    num_train, dim = X.shape
    num_classes = np.max(y) + 1 # assume y takes values 0...K-1 where K is number of classes
    if self.W is None:
      # lazily initialize W
      self.W = 0.001 * np.random.randn(dim, num_classes)

    # Run stochastic gradient descent to optimize W
    loss_history = []
    for it in xrange(num_iters):
      X_batch = None
      y_batch = None

      #########################################################################
      # TODO:                                                                 #
      # Sample batch_size elements from the training data and their           #
      # corresponding labels to use in this round of gradient descent.        #
      # Store the data in X_batch and their corresponding labels in           #
      # y_batch; after sampling X_batch should have shape (batch_size, dim)   #
      # and y_batch should have shape (batch_size,)                           #
      #                                                                       #
      # Hint: Use np.random.choice to generate indices. Sampling with         #
      # replacement is faster than sampling without replacement.              #
      #########################################################################
      idx = np.random.choice(num_train, batch_size, replace=True)
      X_batch = X[idx]
      y_batch = y[idx]
      #########################################################################
      #                       END OF YOUR CODE                                #
      #########################################################################

      # evaluate loss and gradient
      loss, grad = self.loss(X_batch, y_batch, reg)
      loss_history.append(loss)

      # perform parameter update
      #########################################################################
      # TODO:                                                                 #
      # Update the weights using the gradient and the learning rate.          #
      #########################################################################
      self.W = self.W - learning_rate * grad  # vanilla SGD step
      #########################################################################
      #                       END OF YOUR CODE                                #
      #########################################################################

      if verbose and it % 100 == 0:
        print('iteration %d / %d: loss %f' % (it, num_iters, loss))

    return loss_history

  def predict(self, X):
    """
    Use the trained weights of this linear classifier to predict labels for
    data points.

    Inputs:
    - X: A numpy array of shape (N, D) containing training data; there are N
      training samples each of dimension D.

    Returns:
    - y_pred: Predicted labels for the data in X. y_pred is a 1-dimensional
      array of length N, and each element is an integer giving the predicted
      class.
    """
    y_pred = np.zeros(X.shape[0])
    ###########################################################################
    # TODO:                                                                   #
    # Implement this method. Store the predicted labels in y_pred.            #
    ###########################################################################
    # the predicted class is the one with the highest score
    scores = X.dot(self.W)
    y_pred = np.argmax(scores, axis=1)
    ###########################################################################
    #                           END OF YOUR CODE                              #
    ###########################################################################
    return y_pred

  def loss(self, X_batch, y_batch, reg):
    """
    Compute the loss function and its derivative.
    Subclasses will override this.

    Inputs:
    - X_batch: A numpy array of shape (N, D) containing a minibatch of N
      data points; each point has dimension D.
    - y_batch: A numpy array of shape (N,) containing labels for the minibatch.
    - reg: (float) regularization strength.

    Returns: A tuple containing:
    - loss as a single float
    - gradient with respect to self.W; an array of the same shape as W
    """
    pass


class LinearSVM(LinearClassifier):
  """ A subclass that uses the Multiclass SVM loss function """

  def loss(self, X_batch, y_batch, reg):
    return svm_loss_vectorized(self.W, X_batch, y_batch, reg)


class Softmax(LinearClassifier):
  """ A subclass that uses the Softmax + Cross-entropy loss function """

  def loss(self, X_batch, y_batch, reg):
    return softmax_loss_vectorized(self.W, X_batch, y_batch, reg)
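A minimal standalone usage sketch for the classifier above, on a tiny synthetic problem rather than CIFAR-10 (my own example; the hyperparameters are arbitrary and chosen only so it runs quickly):

import numpy as np
from cs231n.classifiers import LinearSVM

# tiny synthetic problem: 500 points, 32 features plus a bias column, 5 classes
np.random.seed(0)
X = np.hstack([np.random.randn(500, 32), np.ones((500, 1))])
y = np.random.randint(5, size=500)

svm = LinearSVM()
loss_hist = svm.train(X, y, learning_rate=1e-3, reg=1e-2,
                      num_iters=200, batch_size=64, verbose=True)
print('final loss:', loss_hist[-1])
print('train accuracy:', np.mean(svm.predict(X) == y))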