CS231n — Assignment 1: Softmax
Implementing the Softmax Classifier
1. Computing the loss function and the gradient
import numpy as np
from random import shuffle


def softmax_loss_naive(W, X, y, reg):  # version with explicit loops
    """
    Softmax loss function, naive implementation (with loops)

    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples.

    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)
    dW_each = np.zeros_like(W)
    #############################################################################
    # TODO: Compute the softmax loss and its gradient using explicit loops.     #
    # Store the loss in loss and the gradient in dW. If you are not careful     #
    # here, it is easy to run into numeric instability. Don't forget the        #
    # regularization!                                                           #
    #############################################################################
    num_train = X.shape[0]
    num_class = W.shape[1]
    f = np.dot(X, W)  # (N, C) class scores
    # Subtract each row's maximum before exponentiating; this prevents
    # numeric overflow without changing the softmax probabilities.
    f_max = np.reshape(np.max(f, axis=1), (num_train, 1))
    f -= f_max
    # Careful: divide each row by that sample's own sum, not the global sum.
    p = np.exp(f) / np.sum(np.exp(f), axis=1, keepdims=True)  # N by C
    # Cross-entropy loss:
    # L = -(1/N) * sum_i sum_k 1(k = y_i) * log(exp(f_k) / sum_j exp(f_j)) + reg * R(W)
    y_true = np.zeros_like(p)
    y_true[np.arange(num_train), y] = 1.0  # one-hot encoding of the labels
    for i in range(num_train):
        for j in range(num_class):
            loss += -(y_true[i, j] * np.log(p[i, j]))
            # Gradient: dL/dW_k = -(1/N) * sum_i x_i^T * (1(k = y_i) - p_{i,k}) + reg * W_k,
            # where p_{i,k} = exp(f_{i,k}) / sum_j exp(f_{i,j})
            dW_each[:, j] = -(y_true[i, j] - p[i, j]) * X[i, :]
        dW += dW_each
    loss /= num_train
    loss += 0.5 * reg * np.sum(W * W)  # add the regularization term
    dW /= num_train
    dW += reg * W
    #############################################################################
    #                          END OF YOUR CODE                                 #
    #############################################################################

    return loss, dW


def softmax_loss_vectorized(W, X, y, reg):  # vectorized version
    """
    Softmax loss function, vectorized version.

    Inputs and outputs are the same as softmax_loss_naive.
    """
    # Initialize the loss and gradient to zero.
    loss = 0.0
    dW = np.zeros_like(W)  # D by C
    #############################################################################
    # TODO: Compute the softmax loss and its gradient using no explicit loops.  #
    # Store the loss in loss and the gradient in dW. If you are not careful     #
    # here, it is easy to run into numeric instability. Don't forget the        #
    # regularization!                                                           #
    #############################################################################
    num_train = X.shape[0]
    num_class = W.shape[1]
    f = np.dot(X, W)  # (N, C) class scores
    # Subtract each row's maximum before exponentiating, for numeric stability.
    f_max = np.reshape(np.max(f, axis=1), (num_train, 1))
    f -= f_max
    # Again, divide each row by that sample's own sum.
    p = np.exp(f) / np.sum(np.exp(f), axis=1, keepdims=True)  # N by C
    y_true = np.zeros_like(p)
    y_true[np.arange(num_train), y] = 1.0  # one-hot encoding of the labels
    loss += -np.sum(np.log(p[np.arange(num_train), y])) / num_train + 0.5 * reg * np.sum(W * W)
    dW += -np.dot(X.T, y_true - p) / num_train + reg * W  # vectorized form of the gradient
    #############################################################################
    #                          END OF YOUR CODE                                 #
    #############################################################################

    return loss, dW
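For reference, here is a short derivation of the gradient that both implementations compute (a standard result, restated here in the post's notation rather than quoted from the assignment):

    f_i = x_i W, \qquad p_{i,k} = \frac{e^{f_{i,k}}}{\sum_j e^{f_{i,j}}}, \qquad L_i = -\log p_{i,y_i}

    \frac{\partial L_i}{\partial f_{i,k}} = p_{i,k} - \mathbf{1}(k = y_i)
    \quad\Longrightarrow\quad
    \frac{\partial L_i}{\partial W_{:,k}} = \bigl(p_{i,k} - \mathbf{1}(k = y_i)\bigr)\, x_i^{T}

Averaging over the N examples and adding the gradient of the regularizer 0.5 * reg * sum(W^2) gives dW = -X^T (Y_true - P) / N + reg * W, which is exactly the vectorized line above.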
2. Loading the data (get_CIFAR10_data)
from cs231n.data_utils import load_CIFAR10

cifar10_dir = 'cs231n//datasets'
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
2.1 Subsampling the data
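The code below refers to num_train, num_validation, num_test, and num_dev without defining them; a minimal sketch of plausible definitions, assuming the standard split used in the assignment (these exact numbers are an assumption, not stated in this post):

    num_train = 49000       # assumed: examples kept for training
    num_validation = 1000   # assumed: examples 49000..49999 become the validation set
    num_test = 1000         # assumed: first 1000 test images
    num_dev = 500           # assumed: small development subset for quick experiments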
# Carve the original training data into train / val / dev splits.
mask = range(num_train, num_train + num_validation)
X_val = X_train[mask]
y_val = y_train[mask]
mask = range(num_train)
X_train = X_train[mask]
y_train = y_train[mask]
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]
# Draw num_dev samples from 0..num_train without repetition.
mask = np.random.choice(num_train, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]
2.2 Reshaping the data so that each image becomes a single row
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))
2.3 Normalizing the data: subtracting the mean image
Note: a single mean image is computed from the training set alone, and that same mean image is then subtracted from every data split.
# Normalize the data: subtract the mean image.
mean_image = np.mean(X_train, axis=0)  # column-wise mean, i.e. the mean image over the training set
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image
X_dev -= mean_image
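As a quick visual check, the mean image can be plotted; a small sketch assuming matplotlib is available and the rows are flattened 32x32x3 CIFAR-10 images:

    import matplotlib.pyplot as plt

    # Reshape the 3072-dimensional mean vector back into a 32x32x3 image and display it.
    plt.figure(figsize=(4, 4))
    plt.imshow(mean_image.reshape(32, 32, 3).astype('uint8'))
    plt.axis('off')
    plt.show()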
2.4 Appending the bias dimension
np.hstack(tup)
Stack arrays in sequence horizontally (column wise).
Take a sequence of arrays and stack them horizontally to make a single array
tup : sequence of ndarrays
All arrays must have the same shape along all but the second axis (i.e., every dimension except the second must match).
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])
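After this step each row is a 3073-dimensional vector (3072 pixel values plus a constant 1), so the bias folds into the last row of W. A quick shape check (a sketch, using the reshaped splits above):

    # Every split should now have D = 32*32*3 + 1 = 3073 columns.
    for name, arr in [('train', X_train), ('val', X_val), ('test', X_test), ('dev', X_dev)]:
        assert arr.shape[1] == 3073
        print(name, arr.shape)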
3. Comparing the two versions
3.1 Calling the naive version (with loops)
Assign W small random values:

W = np.random.randn(3073, 10) * 0.0001
Compute the loss (sanity check!):

loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)
Note: check that this loss is close to -log(0.1).
Why do a sanity check: when initializing with small parameters, make sure the resulting loss matches expectations, and it is best to check the data loss on its own first (set the regularization strength to 0). For example, a softmax classifier on CIFAR-10 should have an initial loss of about 2.302, because at initialization each of the 10 classes is expected to get probability 0.1, and the softmax loss is the negative log probability of the correct class: -ln(0.1) = 2.302.
For a Weston-Watkins SVM, assuming all margins are violated, the expected initial loss is 9. If you do not see these values, something is probably wrong with the initialization.
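The expected initial value is easy to verify directly (a one-line sketch):

    import numpy as np

    # With 10 classes and uniform probabilities, the softmax loss is -log(1/10).
    print(-np.log(0.1))  # ~2.302585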
3.2 Gradient check
from cs231n.gradient_check import grad_check_sparse

f = lambda w: softmax_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)
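grad_check_sparse compares the analytic gradient against numeric estimates at a handful of random coordinates. A minimal sketch of the underlying idea using centered differences (an illustration written for this post, not the assignment's actual implementation):

    import numpy as np

    def sparse_numeric_grad_check(f, W, analytic_grad, num_checks=10, h=1e-5):
        """Compare analytic_grad against centered-difference estimates at random entries of W."""
        for _ in range(num_checks):
            ix = tuple(np.random.randint(n) for n in W.shape)  # random coordinate of W
            old_value = W[ix]
            W[ix] = old_value + h
            f_plus = f(W)            # loss with W[ix] nudged up
            W[ix] = old_value - h
            f_minus = f(W)           # loss with W[ix] nudged down
            W[ix] = old_value        # restore the original value
            grad_numeric = (f_plus - f_minus) / (2 * h)
            grad_analytic = analytic_grad[ix]
            rel_error = (abs(grad_numeric - grad_analytic)
                         / (abs(grad_numeric) + abs(grad_analytic) + 1e-12))
            print('numerical: %f analytic: %f, relative error: %e'
                  % (grad_numeric, grad_analytic, rel_error))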
Add the regularization term and run the gradient check again:

loss, grad = softmax_loss_naive(W, X_dev, y_dev, 1e2)
f = lambda w: softmax_loss_naive(w, X_dev, y_dev, 1e2)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)
3.3 Timing the vectorized version against the naive one
import time

tic = time.time()
loss_naive, grad_naive = softmax_loss_naive(W, X_dev, y_dev, 0.00001)
toc = time.time()
print('naive loss: %e computed in %fs' % (loss_naive, toc - tic))

tic = time.time()
loss_vec, grad_vec = softmax_loss_vectorized(W, X_dev, y_dev, 0.00001)
toc = time.time()
print('vectorized loss: %e computed in %fs' % (loss_vec, toc - tic))

# Compare the two versions of the gradient.
grad_difference = np.linalg.norm(grad_naive - grad_vec, ord='fro')
print('Loss difference: %f' % np.abs(loss_naive - loss_vec))
print('Gradient difference: %f' % grad_difference)
4. Tuning hyperparameters with the validation set
Here the softmax classifier is folded into a LinearClassifier class that provides train(), predict(), and loss(); loss() differs depending on which loss function is chosen.
So we define a Softmax class that inherits from LinearClassifier and overrides its loss() method.
Here is LinearClassifier:
class LinearClassifier(object):

    def __init__(self):
        self.W = None

    def train(self, X, y, learning_rate=1e-3, reg=1e-5, num_iters=100,
              batch_size=200, verbose=False):
        """
        Train this linear classifier using stochastic gradient descent.

        Inputs:
        - X: A numpy array of shape (N, D) containing training data; there are N
          training samples each of dimension D.
        - y: A numpy array of shape (N,) containing training labels; y[i] = c
          means that X[i] has label 0 <= c < C for C classes.
        - learning_rate: (float) learning rate for optimization.
        - reg: (float) regularization strength.
        - num_iters: (integer) number of steps to take when optimizing
        - batch_size: (integer) number of training examples to use at each step.
        - verbose: (boolean) If true, print progress during optimization.

        Outputs:
        A list containing the value of the loss function at each training iteration.
        """
        num_train, dim = X.shape
        num_classes = np.max(y) + 1  # assume y takes values 0...K-1 where K is number of classes
        if self.W is None:
            # lazily initialize W
            self.W = 0.001 * np.random.randn(dim, num_classes)

        # Run stochastic gradient descent to optimize W
        loss_history = []
        for it in range(num_iters):
            X_batch = None
            y_batch = None
            #########################################################################
            # TODO:                                                                 #
            # Sample batch_size elements from the training data and their           #
            # corresponding labels to use in this round of gradient descent.        #
            # Store the data in X_batch and their corresponding labels in           #
            # y_batch; after sampling X_batch should have shape (dim, batch_size)   #
            # and y_batch should have shape (batch_size,)                           #
            #                                                                       #
            # Hint: Use np.random.choice to generate indices. Sampling with         #
            # replacement is faster than sampling without replacement.              #
            #########################################################################
            # Subsample: replace=False means the indices are drawn without repeats.
            sample_index = np.random.choice(num_train, batch_size, replace=False)
            X_batch = X[sample_index]
            y_batch = y[sample_index]
            #########################################################################
            #                       END OF YOUR CODE                                #
            #########################################################################

            # evaluate loss and gradient
            loss, grad = self.loss(X_batch, y_batch, reg)
            loss_history.append(loss)

            # perform parameter update
            #########################################################################
            # TODO:                                                                 #
            # Update the weights using the gradient and the learning rate.          #
            #########################################################################
            # Gradient descent step.
            self.W += -learning_rate * grad
            #########################################################################
            #                       END OF YOUR CODE                                #
            #########################################################################

            if verbose and it % 100 == 0:
                print('iteration %d / %d: loss %f' % (it, num_iters, loss))

        return loss_history

    def predict(self, X):
        """
        Use the trained weights of this linear classifier to predict labels for
        data points.

        Inputs:
        - X: N x D array of training data. Each column is a D-dimensional point.

        Returns:
        - y_pred: Predicted labels for the data in X. y_pred is a 1-dimensional
          array of length N, and each element is an integer giving the predicted
          class.
        """
        y_pred = np.zeros(X.shape[1])
        ###########################################################################
        # TODO:                                                                   #
        # Implement this method. Store the predicted labels in y_pred.            #
        ###########################################################################
        y_pred = np.dot(X, self.W)  # N by C class scores
        y_pred = np.argmax(y_pred, axis=1)  # (N,) predicted class indices
        ###########################################################################
        #                           END OF YOUR CODE                              #
        ###########################################################################
        return y_pred

    def loss(self, X_batch, y_batch, reg):
        """
        Compute the loss function and its derivative.
        Subclasses will override this.

        Inputs:
        - X_batch: A numpy array of shape (N, D) containing a minibatch of N
          data points; each point has dimension D.
        - y_batch: A numpy array of shape (N,) containing labels for the minibatch.
        - reg: (float) regularization strength.

        Returns: A tuple containing:
        - loss as a single float
        - gradient with respect to self.W; an array of the same shape as W
        """
        pass
class Softmax(LinearClassifier):
    """ A subclass that uses the Softmax + Cross-entropy loss function """

    def loss(self, X_batch, y_batch, reg):
        return softmax_loss_vectorized(self.W, X_batch, y_batch, reg)
from cs231n.classifiers import Softmax

result = {}
best_val = -1
best_softmax = None
learning_rate = [5e-6, 1e-7, 5e-7]
reg = [1e4, 5e4, 1e8]
#################################################
for each_rate in learning_rate:
    for each_reg in reg:
        softmax = Softmax()
        loss_hist = softmax.train(X_train, y_train, learning_rate=each_rate,
                                  reg=each_reg, num_iters=700, verbose=True)
        y_train_pred = softmax.predict(X_train)
        accuracy_train = np.mean(y_train == y_train_pred)
        y_val_pred = softmax.predict(X_val)
        accuracy_val = np.mean(y_val == y_val_pred)
        result[each_rate, each_reg] = (accuracy_train, accuracy_val)
        if best_val < accuracy_val:
            best_val = accuracy_val
            best_softmax = softmax
####################################################
for lr, reg in sorted(result):
    train_accuracy, val_accuracy = result[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' %
          (lr, reg, train_accuracy, val_accuracy))
print("best validation accuracy achieved during cross-validation: %f" % best_val)
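The assignment's natural last step is to evaluate the best model on the test set; a short sketch using the variable names defined above:

    # Evaluate the best classifier found by the search on the held-out test set.
    y_test_pred = best_softmax.predict(X_test)
    test_accuracy = np.mean(y_test == y_test_pred)
    print('softmax on raw pixels: final test set accuracy %f' % test_accuracy)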