KNN Image Classification: A Python Implementation
Source: Internet  Editor: 程序博客网  Time: 2024/05/17 12:23
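The idea behind KNN is fairly simple. It consists of a training phase and a classification phase. Training only requires storing the training set. At classification time, each test image is compared against every image in the training set; the k training images with the smallest difference are selected, and the most common label among them becomes the prediction (with k = 1, this is simply the label of the single closest image).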
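To make the comparison step concrete, here is a minimal sketch of the k = 1 case on made-up data. The arrays `train_X`, `train_y`, and `test_image` are hypothetical toy values, not CIFAR-10 data, and the "difference" is taken to be the L1 distance (sum of absolute pixel differences), which is also what the implementation later in this post uses.

```python
import numpy as np

# Hypothetical toy data: four "images", each flattened to 3 pixel values
train_X = np.array([[0, 0, 0],
                    [10, 10, 10],
                    [200, 200, 200],
                    [255, 250, 245]], dtype=float)
train_y = np.array([0, 0, 1, 1])  # made-up labels: 0 = dark, 1 = bright

test_image = np.array([190, 210, 205], dtype=float)

# L1 distance from the test image to every stored training image
distances = np.sum(np.abs(train_X - test_image), axis=1)

# The nearest neighbor is the training image with the smallest distance
nearest = np.argmin(distances)
predicted = train_y[nearest]

print(distances)  # [605. 575.  25. 145.]
print(predicted)  # 1
```

The third training image is closest (distance 25), so the test image receives its label.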
The dataset is CIFAR-10; download link: http://www.cs.toronto.edu/~kriz/cifar.html
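Lessons learned: when plenty of data is available, split the training data into two parts. A small part serves as a validation set (a stand-in test set) and the rest remains the training set (typically 70%-90% of the data stays in training; the exact share depends on how many hyperparameters need tuning, and the more hyperparameters there are, the larger the validation share should be). The validation set is used to tune hyperparameters. Why not tune directly on the test set? Because the hyperparameters would then be fitted to the test set itself, so the measured test accuracy becomes over-optimistic and no longer reflects how well the model generalizes. When data is scarce, tune with cross-validation instead. Cross-validation is computationally expensive, but if you can afford the cost it is the better choice. In K-fold cross-validation, a larger K generally gives a more reliable estimate, at a correspondingly higher cost.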
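The two strategies above can be sketched as follows. The helper names `holdout_split` and `kfold_indices` are hypothetical, introduced only for this illustration, and the 20% validation share is an arbitrary example value rather than a recommendation.

```python
import numpy as np

def holdout_split(X, y, val_fraction=0.2, seed=0):
    """Shuffle once, then split into (X_train, y_train, X_val, y_val)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    num_val = int(len(X) * val_fraction)
    val_idx, train_idx = order[:num_val], order[num_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

def kfold_indices(num_examples, num_folds):
    """Yield (train_idx, val_idx); each fold is the validation set exactly once."""
    folds = np.array_split(np.arange(num_examples), num_folds)
    for i in range(num_folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, folds[i]

# Tiny made-up dataset: 10 examples with 2 features each
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

X_tr, y_tr, X_val, y_val = holdout_split(X, y, val_fraction=0.2)
print(X_tr.shape, X_val.shape)  # (8, 2) (2, 2)

for train_idx, val_idx in kfold_indices(10, 5):
    print('validate on', val_idx, 'train on', train_idx)
```

With a hold-out split the model is evaluated once per hyperparameter setting; with K folds it is evaluated K times, which is where the extra cost comes from.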
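Drawbacks of kNN:
1. The classifier must store the entire training set so that it can be compared against future test data.
2. Every test example has to be compared against all training examples, so prediction is expensive.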
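Because images are usually very high-dimensional, computing distances is costly, so KNN is generally not used for image classification. As an exercise for beginners getting familiar with kNN, though, implementing it is still worthwhile.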
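A rough back-of-the-envelope estimate shows how large these costs get for the full CIFAR-10 dataset. The sketch below assumes the sizes used later in this post (five training batches of 10000 images, one test batch of 10000 images, 32 x 32 x 3 values per image) and 8 bytes per value, since the loader converts pixels to float64. It counts only per-dimension differences and ignores sorting and other overhead.

```python
# Sizes taken from the loading code in this post
num_train = 5 * 10000          # five training batches
num_test = 10000               # one test batch
dims = 32 * 32 * 3             # values per flattened image
bytes_per_value = 8            # astype("float") yields float64

# Drawback 1: the whole training set must stay in memory
storage_bytes = num_train * dims * bytes_per_value

# Drawback 2: every test image is compared with every training image
diffs_per_test_image = num_train * dims
total_diffs = diffs_per_test_image * num_test

print('dimensions per image:', dims)                        # 3072
print('training set size: %.2f GiB' % (storage_bytes / 2**30))
print('differences per test image:', diffs_per_test_image)  # 153600000
print('differences for the full test set:', total_diffs)    # 1536000000000
```

This is why the experiments below work on a subset of 5000 training and 500 test images.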
The implementation is as follows:
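import numpy as np

class kNearestNeighbor:
    def __init__(self):
        pass

    def train(self, X, y):
        # "Training" only memorizes the data: X is N x D, y holds N labels
        self.Xtr = X
        self.ytr = y

    def predict(self, X, k=1):
        num_test = X.shape[0]
        Ypred = np.zeros(num_test, dtype=self.ytr.dtype)
        for i in range(num_test):
            # L1 distance from test row i to every training row
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            # Labels of the k nearest rows. Index self.ytr (the labels given
            # to train), not the global y_train, or labels can be misaligned.
            closest_y = self.ytr[np.argsort(distances)[:k]]
            # Majority vote among the k labels
            u, indices = np.unique(closest_y, return_inverse=True)
            Ypred[i] = u[np.argmax(np.bincount(indices))]
        return Ypred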
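The last two lines of the prediction loop are the least obvious part of the classifier, so here is a small standalone sketch of how `np.unique` and `np.bincount` combine into a majority vote. The label values in `closest_y` are made up for illustration.

```python
import numpy as np

# Hypothetical labels of the k = 7 nearest neighbors
closest_y = np.array([3, 5, 3, 8, 5, 3, 9])

# u holds the distinct labels in sorted order; indices maps each entry of
# closest_y to its position in u
u, indices = np.unique(closest_y, return_inverse=True)
print(u)        # [3 5 8 9]
print(indices)  # [0 1 0 2 1 0 3]

# bincount counts how often each position occurs, i.e. votes per label
counts = np.bincount(indices)
print(counts)   # [3 2 1 1]

# The label with the most votes wins
winner = u[np.argmax(counts)]
print(winner)   # 3
```

Note that `np.argmax` returns the first maximum, so in a tie the smallest label value wins. This is an arbitrary tie-breaking rule, not a property of kNN itself.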
load_CIFAR_batch() and load_CIFAR10() load the CIFAR-10 dataset:
import pickle

def load_CIFAR_batch(filename):
    """ load single batch of cifar """
    with open(filename, 'rb') as f:
        datadict = pickle.load(f, encoding='latin1')
        X = datadict['data']
        Y = datadict['labels']
        # Each row holds 3072 values, channel by channel; convert to N x 32 x 32 x 3
        X = X.reshape(10000, 3, 32, 32).transpose(0, 2, 3, 1).astype("float")
        Y = np.array(Y)
        return X, Y
import os

def load_CIFAR10(ROOT):
    """ load all of cifar """
    xs = []
    ys = []
    for b in range(1, 6):
        f = os.path.join(ROOT, 'data_batch_%d' % (b))
        X, Y = load_CIFAR_batch(f)
        xs.append(X)
        ys.append(Y)
    Xtr = np.concatenate(xs)  # join the five batches into one array
    Ytr = np.concatenate(ys)
    del X, Y
    Xte, Yte = load_CIFAR_batch(os.path.join(ROOT, 'test_batch'))
    return Xtr, Ytr, Xte, Yte
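The reshape-then-transpose step in the batch loader is easier to follow on a tiny array. The sketch below uses a fake 2 x 2 image instead of a real 32 x 32 one and assumes the same storage order as a CIFAR-10 row: all red values first, then all green, then all blue. The pixel values are made up.

```python
import numpy as np

# One fake 2 x 2 "image" stored channel by channel in a single row
row = np.array([[1, 2, 3, 4,            # red plane
                 10, 20, 30, 40,        # green plane
                 100, 200, 300, 400]])  # blue plane

# reshape gives (N, channel, height, width); transpose moves the channel
# axis to the end, giving (N, height, width, channel)
images = row.reshape(1, 3, 2, 2).transpose(0, 2, 3, 1)
print(images.shape)     # (1, 2, 2, 3)
print(images[0, 0, 0])  # [  1  10 100]  top-left pixel as (R, G, B)
print(images[0, 1, 1])  # [  4  40 400]  bottom-right pixel

# Flattening again gives one row per image, now with the three channel
# values of each pixel next to each other
flat = images.reshape(images.shape[0], -1)
print(flat[0])
```

For the L1 distance used in this post the order of values within a row does not change the result, as long as training and test images are flattened the same way.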
Xtr, Ytr, Xte, Yte = load_CIFAR10('cifar10')
# Flatten each 32 x 32 x 3 image into a row of 3072 values
Xtr_rows = Xtr.reshape(Xtr.shape[0], 32 * 32 * 3)
Xte_rows = Xte.reshape(Xte.shape[0], 32 * 32 * 3)

# The dataset is fairly large and runs slowly on a PC,
# so use only 5000 training images and 500 test images
num_training = 5000
num_test = 500
x_train = Xtr_rows[:num_training, :]
y_train = Ytr[:num_training]
x_test = Xte_rows[:num_test, :]
y_test = Yte[:num_test]

knn = kNearestNeighbor()
knn.train(x_train, y_train)
y_predict = knn.predict(x_test, k=7)
acc = np.mean(y_predict == y_test)
print('accuracy : %f' % (acc))
accuracy : 0.302000
# Which value of k gives the best result? Cross-validation can answer that;
# 5-fold cross-validation is used here
num_folds = 5
k_choices = [1, 3, 5, 8, 10, 12, 15, 20, 50, 100]

x_train_folds = np.array_split(x_train, num_folds)
y_train_folds = np.array_split(y_train, num_folds)

k_to_accuracies = {}
for k_val in k_choices:
    print('k = ' + str(k_val))
    k_to_accuracies[k_val] = []
    for i in range(num_folds):
        # Fold i is held out for validation; the other folds form the training set
        x_train_cycle = np.concatenate([f for j, f in enumerate(x_train_folds) if j != i])
        y_train_cycle = np.concatenate([f for j, f in enumerate(y_train_folds) if j != i])
        x_val_cycle = x_train_folds[i]
        y_val_cycle = y_train_folds[i]
        knn = kNearestNeighbor()
        knn.train(x_train_cycle, y_train_cycle)
        y_val_pred = knn.predict(x_val_cycle, k_val)
        num_correct = np.sum(y_val_cycle == y_val_pred)
        k_to_accuracies[k_val].append(float(num_correct) / float(len(y_val_cycle)))
k = 1
k = 3
k = 5
k = 8
k = 10
k = 12
k = 15
k = 20
k = 50
k = 100
for k in sorted(k_to_accuracies):
    for accuracy in k_to_accuracies[k]:
        print('k = %d, accuracy = %f' % (int(k), accuracy))
k = 1, accuracy = 0.098000
k = 1, accuracy = 0.148000
k = 1, accuracy = 0.205000
k = 1, accuracy = 0.233000
k = 1, accuracy = 0.308000
k = 3, accuracy = 0.089000
k = 3, accuracy = 0.142000
k = 3, accuracy = 0.215000
k = 3, accuracy = 0.251000
k = 3, accuracy = 0.296000
k = 5, accuracy = 0.096000
k = 5, accuracy = 0.176000
k = 5, accuracy = 0.240000
k = 5, accuracy = 0.284000
k = 5, accuracy = 0.309000
k = 8, accuracy = 0.100000
k = 8, accuracy = 0.175000
k = 8, accuracy = 0.263000
k = 8, accuracy = 0.289000
k = 8, accuracy = 0.310000
k = 10, accuracy = 0.099000
k = 10, accuracy = 0.174000
k = 10, accuracy = 0.264000
k = 10, accuracy = 0.318000
k = 10, accuracy = 0.313000
k = 12, accuracy = 0.100000
k = 12, accuracy = 0.192000
k = 12, accuracy = 0.261000
k = 12, accuracy = 0.316000
k = 12, accuracy = 0.318000
k = 15, accuracy = 0.087000
k = 15, accuracy = 0.197000
k = 15, accuracy = 0.255000
k = 15, accuracy = 0.322000
k = 15, accuracy = 0.321000
k = 20, accuracy = 0.089000
k = 20, accuracy = 0.225000
k = 20, accuracy = 0.270000
k = 20, accuracy = 0.319000
k = 20, accuracy = 0.306000
k = 50, accuracy = 0.079000
k = 50, accuracy = 0.248000
k = 50, accuracy = 0.278000
k = 50, accuracy = 0.287000
k = 50, accuracy = 0.293000
k = 100, accuracy = 0.075000
k = 100, accuracy = 0.246000
k = 100, accuracy = 0.275000
k = 100, accuracy = 0.284000
k = 100, accuracy = 0.277000

Note: for every k, accuracy climbs from about 0.1 in the first fold (chance level for 10 classes) to about 0.3 in the last. This pattern is consistent with predict() reading labels from the global y_train instead of the labels passed to train(): the neighbor indices then fully line up with the labels only in the last fold, where the held-out fold sits at the end of the data. Treat only the last fold of each k as a reliable measurement here; when predict() uses self.ytr, all five folds should be comparable.
Visualizing the cross-validation results:
import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = (10.0, 8.0)
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# One scatter point per fold for each k
for k in k_choices:
    accuracies = k_to_accuracies[k]
    plt.scatter([k] * len(accuracies), accuracies)

# Mean accuracy per k, with the standard deviation as error bars
accuracies_mean = np.array([np.mean(v) for k, v in sorted(k_to_accuracies.items())])
accuracies_std = np.array([np.std(v) for k, v in sorted(k_to_accuracies.items())])
plt.errorbar(k_choices, accuracies_mean, yerr=accuracies_std)
plt.title('Cross-validation on k')
plt.xlabel('k')
plt.ylabel('Cross-validation accuracy')
plt.show()
[Figure: cross-validation accuracy versus k]
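Once the per-fold accuracies are collected, the usual next step is to pick the k with the highest mean accuracy. The following is a minimal sketch of that selection. The dictionary below contains hypothetical numbers chosen for illustration, NOT the results of the run in this post.

```python
import numpy as np

# Hypothetical per-fold accuracies, not the results reported above
k_to_accuracies = {
    1: [0.26, 0.27, 0.25],
    5: [0.28, 0.29, 0.27],
    20: [0.27, 0.28, 0.26],
}

# Average the folds for each k, then take the k with the best mean
mean_acc = {k: float(np.mean(v)) for k, v in k_to_accuracies.items()}
best_k = max(mean_acc, key=mean_acc.get)

for k in sorted(mean_acc):
    print('k = %d, mean accuracy = %f' % (k, mean_acc[k]))
print('best k:', best_k)  # best k: 5
```

After choosing k this way, the classifier would be trained once more on the whole training set and evaluated a single time on the test set, so the test accuracy is not influenced by the tuning.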