[CS231N课程笔记] Lecture 2. Image Classification

Course notes: http://cs231n.github.io/classification/

This lecture covers the following topics: the basic concepts of image classification, the Nearest Neighbor classifier, and validation sets / cross-validation.

Image Classification

The basic concepts of image classification should already be familiar to anyone who has taken an image processing course.

Here I want to highlight the main challenges that image classification faces:
1. Viewpoint variation. A single instance of an object can be oriented in many ways with respect to the camera.
2. Scale variation. Visual classes often exhibit variation in their size (size in the real world, not only in terms of their extent in the image).
3. Deformation. Many objects of interest are not rigid bodies and can be deformed in extreme ways.
4. Occlusion. The objects of interest can be occluded. Sometimes only a small portion of an object (as little as a few pixels) could be visible.
5. Illumination conditions. The effects of illumination are drastic on the pixel level.
6. Background clutter. The objects of interest may blend into their environment, making them hard to identify.
7. Intra-class variation. The classes of interest can often be relatively broad, such as chair. There are many different types of these objects, each with their own appearance.

Nearest Neighbor Classifier

Below is a simple Nearest Neighbor classifier (L1 distance) evaluated on CIFAR-10 (Python 3; the CIFAR-10 python batches were pickled under Python 2, hence the encoding argument to pickle.load):

import os
import pickle

import numpy as np


class NearestNeighbor(object):
    def __init__(self):
        pass

    def train(self, X, y):
        # Nearest Neighbor "training" simply memorizes all the data.
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        num_test = X.shape[0]
        Ypred = np.zeros(num_test, dtype=self.ytr.dtype)
        for i in range(num_test):
            # L1 distance from the i-th test image to every training image
            # (cast to int so the uint8 subtraction does not wrap around).
            distances = np.sum(np.abs(self.Xtr.astype(int) - X[i, :].astype(int)), axis=1)
            min_index = np.argmin(distances)  # index of the closest training image
            Ypred[i] = self.ytr[min_index]    # predict the label of that image
        return Ypred


def load_CIFAR10(file_path):
    # Load all CIFAR-10 python batches found under file_path.
    train_data, train_label = [], []
    test_data, test_label = None, None
    for fname in os.listdir(file_path):
        full_path = os.path.join(file_path, fname)
        if fname.startswith('data_batch'):
            with open(full_path, 'rb') as fo:
                train_dict = pickle.load(fo, encoding='latin1')
            train_data.extend(train_dict['data'])
            train_label.extend(train_dict['labels'])
        elif fname.startswith('test_batch'):
            with open(full_path, 'rb') as fo:
                test_dict = pickle.load(fo, encoding='latin1')
            test_data = test_dict['data']
            test_label = test_dict['labels']
    return (np.array(train_data), np.array(train_label),
            np.array(test_data), np.array(test_label))


if __name__ == "__main__":
    Xtr, Ytr, Xte, Yte = load_CIFAR10('/home/jeremy/Data/cifar-10-batches-py/')
    # Flatten each 32x32x3 image into a single row of 3072 pixel values.
    Xtr_rows = Xtr.reshape(Xtr.shape[0], 32 * 32 * 3)
    Xte_rows = Xte.reshape(Xte.shape[0], 32 * 32 * 3)
    nn = NearestNeighbor()
    nn.train(Xtr_rows, Ytr)
    Yte_predict = nn.predict(Xte_rows)
    print("accuracy: %f" % np.mean(Yte_predict == Yte))

L1 vs. L2.
It is interesting to consider differences between the two metrics. In particular, the L2 distance is much more unforgiving than the L1 distance when it comes to differences between two vectors. That is, the L2 distance prefers many medium disagreements to one big one. L1 and L2 distances (or equivalently the L1/L2 norms of the differences between a pair of images) are the most commonly used special cases of a p-norm.
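
To make this concrete, here is a small numpy example (the vectors are toy values chosen purely for illustration):

import numpy as np

# Two toy "images" flattened to 1-D vectors. a differs from b by one big
# disagreement, and from c by many small ones, with the same L1 total.
a = np.array([0., 0., 0., 0.])
b = np.array([4., 0., 0., 0.])   # one large difference
c = np.array([1., 1., 1., 1.])   # many small differences

def l1(x, y):
    return np.sum(np.abs(x - y))

def l2(x, y):
    return np.sqrt(np.sum((x - y) ** 2))

print(l1(a, b), l1(a, c))   # 4.0 4.0 -- L1 treats the two cases identically
print(l2(a, b), l2(a, c))   # 4.0 2.0 -- L2 penalizes the single big difference more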

Validation sets for Hyperparameter tuning

In the algorithm above, we take the label of the single nearest neighbor as the prediction. There is a natural improvement: consider the labels of the k most similar training examples and let them vote for the final answer; this is the k-Nearest Neighbor classifier. But it raises a question: how should we choose the value of k? (A sketch of the voting step follows below.)
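
A minimal sketch of the voting step (predict_knn and its parameter k are illustrative additions, not part of the course code; it reuses the NearestNeighbor attributes and numpy import from the listing above):

def predict_knn(self, X, k=5):
    # k-Nearest Neighbor prediction with majority voting.
    num_test = X.shape[0]
    Ypred = np.zeros(num_test, dtype=self.ytr.dtype)
    for i in range(num_test):
        distances = np.sum(np.abs(self.Xtr.astype(int) - X[i, :].astype(int)), axis=1)
        nearest = np.argsort(distances)[:k]      # indices of the k closest training images
        votes = self.ytr[nearest]                # their labels
        Ypred[i] = np.bincount(votes).argmax()   # majority vote among the k labels
    return Ypred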

In fact, k is not the only thing that needs to be set: which distance metric to use (L1 or L2) is also a choice we have to make.

We collectively refer to these settings as hyperparameters.

These choices are called hyperparameters and they come up very often in the design of many Machine Learning algorithms that learn from data. It’s often not obvious what values/settings one should choose.

You might be tempted to suggest that we should try out many different values and see what works best. That is a fine idea and that’s indeed what we will do, but this must be done very carefully. In particular, we cannot use the test set for the purpose of tweaking hyperparameters. Whenever you’re designing Machine Learning algorithms, you should think of the test set as a very precious resource that should ideally never be touched until one time at the very end.

If you only use the test set once, at the very end, it remains a good proxy for measuring the generalization of your classifier (we will see much more discussion surrounding generalization later in the class).

Evaluate on the test set only a single time, at the very end.

Although we cannot touch the test set, we can use a substitute, a fake test set: the validation set. The validation set is carved out of a small portion of the original training set:

Original Train set = Train set + Validation set
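
Following the numbers in the course notes, we can hold out 1,000 of the 50,000 CIFAR-10 training images as a validation set and search over k. A sketch, assuming the arrays from the main listing and the hypothetical predict_knn above:

# Split: the first 1000 training examples become the validation set.
Xval_rows = Xtr_rows[:1000]
Yval = Ytr[:1000]
Xtr_rows_sub = Xtr_rows[1000:]   # remaining 49,000 images for training
Ytr_sub = Ytr[1000:]

# Try several values of k and keep whatever works best on the validation set.
validation_accuracies = []
for k in [1, 3, 5, 10, 20, 50, 100]:
    nn = NearestNeighbor()
    nn.train(Xtr_rows_sub, Ytr_sub)
    Yval_predict = predict_knn(nn, Xval_rows, k=k)
    acc = np.mean(Yval_predict == Yval)
    print('k = %d, validation accuracy: %f' % (k, acc))
    validation_accuracies.append((k, acc))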

In practice. People prefer to avoid cross-validation in favor of having a single validation split, since cross-validation can be computationally expensive. The splits people tend to use are between 50% and 90% of the training data for training and the rest for validation. However, this depends on multiple factors: for example, if the number of hyperparameters is large you may prefer to use bigger validation splits. If the number of examples in the validation set is small (perhaps only a few hundred or so), it is safer to use cross-validation. Typical numbers of folds you see in practice are 3-fold, 5-fold or 10-fold cross-validation.
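
For completeness, a minimal 5-fold cross-validation loop for choosing k might look like the sketch below (again an illustrative example reusing NearestNeighbor and the hypothetical predict_knn, not code from the course):

num_folds = 5
X_folds = np.array_split(Xtr_rows, num_folds)
Y_folds = np.array_split(Ytr, num_folds)

for k in [1, 3, 5, 10]:
    fold_accuracies = []
    for fold in range(num_folds):
        # Hold out one fold for validation, train on the rest.
        X_val, Y_val = X_folds[fold], Y_folds[fold]
        X_train = np.concatenate(X_folds[:fold] + X_folds[fold + 1:])
        Y_train = np.concatenate(Y_folds[:fold] + Y_folds[fold + 1:])
        nn = NearestNeighbor()
        nn.train(X_train, Y_train)
        fold_accuracies.append(np.mean(predict_knn(nn, X_val, k=k) == Y_val))
    # Score each k by its mean accuracy across the folds.
    print('k = %d, mean cross-validation accuracy: %f' % (k, np.mean(fold_accuracies)))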

Summary

In summary:

  • We introduced the problem of Image Classification, in which we are given a set of images that are all labeled with a single category. We are then asked to predict these categories for a novel set of test images and measure the accuracy of the predictions.
  • We introduced a simple classifier called the Nearest Neighbor classifier. We saw that there are multiple hyper-parameters (such as value of k, or the type of distance used to compare examples) that are associated with this classifier and that there was no obvious way of choosing them.
  • We saw that the correct way to set these hyperparameters is to split your training data into two: a training set and a fake test set, which we call the validation set. We try different hyperparameter values and keep the values that lead to the best performance on the validation set.
  • If the lack of training data is a concern, we discussed a procedure called cross-validation, which can help reduce noise in estimating which hyperparameters work best.
    Once the best hyperparameters are found, we fix them and perform a single evaluation on the actual test set.
  • We saw that Nearest Neighbor can get us about 40% accuracy on CIFAR-10. It is simple to implement but requires us to store the entire training set and it is expensive to evaluate on a test image.
  • Finally, we saw that the use of L1 or L2 distances on raw pixel values is not adequate since the distances correlate more strongly with backgrounds and color distributions of images than with their semantic content.