CS231n——Assignment1--2-layer-network
A two-layer fully-connected network implemented using only Python/NumPy.
I think the hardest part of this assignment is the backpropagation code. My understanding of backpropagation used to be fairly superficial; only after working through the derivation by hand did I really understand it. I have written up the backpropagation part separately; for details see
http://blog.csdn.net/margretwg/article/details/64920405 (softmax gradient derivation)
http://blog.csdn.net/margretwg/article/details/66974869 (backpropagation)
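For reference, the key result of that softmax-gradient derivation, which the code below uses when it sets dscores = p and then subtracts 1 at the true class, is the following (for one sample with scores $s$, softmax probabilities $p$, and true label $y$):

$$L_i = -\log\frac{e^{s_y}}{\sum_j e^{s_j}}, \qquad p_k = \frac{e^{s_k}}{\sum_j e^{s_j}}, \qquad \frac{\partial L_i}{\partial s_k} = p_k - \mathbb{1}[k = y].$$

Averaging over the N samples in the minibatch is what the dscores /= N step in the code implements.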
【Two-layer neural network class】
import numpy as np
import matplotlib.pyplot as plt


class TwoLayerNet(object):
    """
    A two-layer fully-connected neural network. The net has an input dimension
    of N, a hidden layer dimension of H, and performs classification over C
    classes. We train the network with a softmax loss function and L2
    regularization on the weight matrices. The network uses a ReLU nonlinearity
    after the first fully connected layer.

    In other words, the network has the following architecture:

        input - fully connected layer - ReLU - fully connected layer - softmax

    The outputs of the second fully-connected layer are the scores for each class.
    """

    def __init__(self, input_size, hidden_size, output_size, std=1e-4):
        """
        Initialize the model. Weights are initialized to small random values and
        biases are initialized to zero. Weights and biases are stored in the
        variable self.params, which is a dictionary with the following keys:

        W1: First layer weights; has shape (D, H)
        b1: First layer biases; has shape (H,)
        W2: Second layer weights; has shape (H, C)
        b2: Second layer biases; has shape (C,)

        Inputs:
        - input_size: The dimension D of the input data.
        - hidden_size: The number of neurons H in the hidden layer.
        - output_size: The number of classes C.
        """
        self.params = {}
        self.params['W1'] = std * np.random.randn(input_size, hidden_size)
        self.params['b1'] = np.zeros(hidden_size)
        self.params['W2'] = std * np.random.randn(hidden_size, output_size)
        self.params['b2'] = np.zeros(output_size)

    def loss(self, X, y=None, reg=0.0):
        """
        Compute the loss and gradients for a two-layer fully connected neural
        network.

        Inputs:
        - X: Input data of shape (N, D). Each X[i] is a training sample.
        - y: Vector of training labels. y[i] is the label for X[i], and each
          y[i] is an integer in the range 0 <= y[i] < C. This parameter is
          optional; if it is not passed then we only return scores, and if it
          is passed then we instead return the loss and gradients.
        - reg: Regularization strength.

        Returns:
        If y is None, return a matrix scores of shape (N, C) where scores[i, c]
        is the score for class c on input X[i].

        If y is not None, instead return a tuple of:
        - loss: Loss (data loss and regularization loss) for this batch of
          training samples.
        - grads: Dictionary mapping parameter names to gradients of those
          parameters with respect to the loss function; has the same keys as
          self.params.
        """
        # Unpack variables from the params dictionary
        W1, b1 = self.params['W1'], self.params['b1']
        W2, b2 = self.params['W2'], self.params['b2']
        N, D = X.shape

        # Forward pass: compute the class scores for the input, shape (N, C)
        fc1 = np.dot(X, W1) + b1        # (N, H)
        fc1_act = np.maximum(0, fc1)    # ReLU
        fc2 = np.dot(fc1_act, W2) + b2  # (N, C)
        scores = fc2

        # If the targets are not given then jump out, we're done
        if y is None:
            return scores

        # Compute the loss: softmax data loss plus L2 regularization on W1 and
        # W2 (the regularization loss is multiplied by 0.5, following the
        # assignment convention).
        # Subtract each row's maximum before exponentiating, for numerical
        # stability.
        f_max = np.reshape(np.max(scores, axis=1), (N, 1))
        scores -= f_max                                                     # (N, C)
        # Note: normalize by each sample's own sum, not the total sum.
        p = np.exp(scores) / np.sum(np.exp(scores), axis=1, keepdims=True)  # (N, C)
        loss = np.sum(-np.log(p[np.arange(N), y])) / N \
               + 0.5 * reg * np.sum(W2 * W2) + 0.5 * reg * np.sum(W1 * W1)

        # Backward pass: compute the gradients of the weights and biases and
        # store them in the grads dictionary (same keys as self.params).
        grads = {}

        # Gradient of the loss with respect to the scores (fc2 = xW + b):
        # p - one_hot(y), averaged over the batch.
        dscores = p
        dscores[np.arange(N), y] -= 1
        dscores /= N                                     # (N, C)

        # Backprop into W2 and b2
        dW2 = np.dot(fc1_act.T, dscores)                 # (H, C)
        db2 = np.sum(dscores, axis=0, keepdims=True)     # (1, C)

        # Backprop into the hidden layer, through the ReLU
        drelu = np.dot(dscores, W2.T)                    # (N, H)
        dfc1 = drelu
        dfc1[fc1 < 0] = 0                                # (N, H)

        # Backprop into W1 and b1
        db1 = np.sum(dfc1, axis=0, keepdims=True)        # (1, H)
        dW1 = np.dot(X.T, dfc1)                          # (D, H)

        # Add the regularization gradient contribution
        dW2 += reg * W2
        dW1 += reg * W1

        grads['W1'] = dW1
        grads['W2'] = dW2
        grads['b1'] = db1
        grads['b2'] = db2

        return loss, grads

    def train(self, X, y, X_val, y_val,
              learning_rate=1e-3, learning_rate_decay=0.95,
              reg=1e-5, mu=0.9, num_epochs=30,
              num_iters=100, batch_size=200, verbose=False):
        """
        Train this neural network using stochastic gradient descent with
        momentum.

        Inputs:
        - X: A numpy array of shape (N, D) giving training data.
        - y: A numpy array of shape (N,) giving training labels; y[i] = c means
          that X[i] has label c, where 0 <= c < C.
        - X_val: A numpy array of shape (N_val, D) giving validation data.
        - y_val: A numpy array of shape (N_val,) giving validation labels.
        - learning_rate: Scalar giving learning rate for optimization.
        - learning_rate_decay: Scalar giving factor used to decay the learning
          rate after each epoch.
        - reg: Scalar giving regularization strength.
        - mu: Scalar giving the momentum coefficient.
        - num_epochs: Number of epochs to train for.
        - num_iters: Kept for API compatibility; this implementation trains for
          num_epochs * iterations_per_epoch steps instead.
        - batch_size: Number of training examples to use per step.
        - verbose: boolean; if true print progress during optimization.
        """
        num_train = X.shape[0]
        iterations_per_epoch = int(max(num_train / batch_size, 1))

        # Use SGD with momentum to optimize the parameters in self.params
        loss_history = []
        train_acc_history = []
        val_acc_history = []

        # Momentum velocities; they must persist across iterations so that the
        # momentum term can accumulate.
        v_W2, v_b2, v_W1, v_b1 = 0.0, 0.0, 0.0, 0.0

        for it in range(1, num_epochs * iterations_per_epoch + 1):
            # Create a random minibatch of training data and labels.
            # Sampling with replacement is faster than sampling without
            # replacement.
            choice = np.random.choice(num_train, batch_size, replace=True)
            X_batch = X[choice]
            y_batch = y[choice]

            # Compute loss and gradients using the current minibatch
            # (grads is a dictionary).
            loss, grads = self.loss(X_batch, y=y_batch, reg=reg)
            loss_history.append(loss)

            # Plain SGD would be:
            #   for each_param in self.params:
            #       self.params[each_param] += -learning_rate * grads[each_param]

            # Update with momentum
            v_W2 = mu * v_W2 - learning_rate * grads['W2']
            self.params['W2'] += v_W2
            v_W1 = mu * v_W1 - learning_rate * grads['W1']
            self.params['W1'] += v_W1
            v_b1 = mu * v_b1 - learning_rate * grads['b1']
            self.params['b1'] += np.squeeze(v_b1)
            v_b2 = mu * v_b2 - learning_rate * grads['b2']
            self.params['b2'] += np.squeeze(v_b2)

            # Every epoch, check train and val accuracy and decay the learning
            # rate.
            if it % iterations_per_epoch == 0:
                epoch = it // iterations_per_epoch
                train_acc = (self.predict(X_batch) == y_batch).mean()
                val_acc = (self.predict(X_val) == y_val).mean()
                train_acc_history.append(train_acc)
                val_acc_history.append(val_acc)
                print('epoch %d/%d :loss %f, train_acc:%f, val_acc:%f'
                      % (epoch, num_epochs, loss, train_acc, val_acc))

                # Decay the learning rate
                learning_rate *= learning_rate_decay

        return {
            'loss_history': loss_history,
            'train_acc_history': train_acc_history,
            'val_acc_history': val_acc_history,
        }

    def predict(self, X):
        """
        Use the trained weights of this two-layer network to predict labels for
        data points. For each data point we predict scores for each of the C
        classes, and assign each data point to the class with the highest score.

        Inputs:
        - X: A numpy array of shape (N, D) giving N D-dimensional data points
          to classify.

        Returns:
        - y_pred: A numpy array of shape (N,) giving predicted labels for each
          of the elements of X. For all i, y_pred[i] = c means that X[i] is
          predicted to have class c, where 0 <= c < C.
        """
        # Calling loss() without labels returns the raw class scores.
        score = self.loss(X)
        y_pred = np.argmax(score, axis=1)
        return y_pred
I. Create a toy network for sanity checks
【Note 1】np.random.seed(0)
This sets the generator to a fixed random state; whenever that state is restored, the same sequence of random numbers is produced.
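A minimal illustration of this behaviour:

import numpy as np

np.random.seed(0)
a = np.random.randn(3)
np.random.seed(0)            # reset to the same state
b = np.random.randn(3)
print(np.array_equal(a, b))  # True: the same state produces the same draws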
# Create a small network for sanity checks.
# Note that we set the random seed for repeatable experiments.
input_size = 4
hidden_size = 10
num_classes = 3
num_inputs = 5

def init_toy_model():
    np.random.seed(0)
    return TwoLayerNet(input_size, hidden_size, num_classes, std=1e-1)

def init_toy_data():
    np.random.seed(1)
    X = 10 * np.random.randn(num_inputs, input_size)  # random 5 x 4 matrix
    y = np.array([0, 1, 2, 2, 1])
    return X, y

net = init_toy_model()
X, y = init_toy_data()
Forward pass
scores = net.loss(X)  # without y, loss() returns the per-class scores for each sample
print('Your scores:')
print(scores)
print('correct scores:')
correct_scores = np.array([
    [-0.81233741, -1.27654624, -0.70335995],
    [-0.17129677, -1.18803311, -0.47310444],
    [-0.51590475, -1.01354314, -0.8504215 ],
    [-0.15419291, -0.48629638, -0.52901952],
    [-0.00618733, -0.12435261, -0.15226949]])
print(correct_scores)
print('Difference between your scores and correct scores:')
print(np.sum(np.abs(scores - correct_scores)))

# Compute the loss
loss, grad = net.loss(X, y, reg=0.1)
correct_loss = 1.30378789133
print('Difference between your loss and correct loss:')
print(np.sum(np.abs(loss - correct_loss)))
Gradient check
from cs231n.gradient_check import eval_numerical_gradient

def rel_error(x, y):
    """Relative error helper, as defined in the assignment notebook."""
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))

for param_name in grad:
    f = lambda W: net.loss(X, y, reg=0.1)[0]
    param_grad_num = eval_numerical_gradient(f, net.params[param_name], verbose=False)
    print('%s max relative error: %e' % (param_name, rel_error(param_grad_num, grad[param_name])))
Training (vectorized parameter updates)
net = init_toy_model()
stats = net.train(X, y, X, y, learning_rate=1e-1, reg=1e-5,
                  num_iters=100, verbose=False)  # returns a dict
print('Final training loss:', stats['loss_history'][-1])

# Plot the loss history
plt.plot(stats['loss_history'])
plt.xlabel('iteration')
plt.ylabel('training loss')
plt.title('Training loss history')
plt.show()
The output of this block is:
Your scores:
[[-0.81233741 -1.27654624 -0.70335995]
 [-0.17129677 -1.18803311 -0.47310444]
 [-0.51590475 -1.01354314 -0.8504215 ]
 [-0.15419291 -0.48629638 -0.52901952]
 [-0.00618733 -0.12435261 -0.15226949]]
correct scores:
[[-0.81233741 -1.27654624 -0.70335995]
 [-0.17129677 -1.18803311 -0.47310444]
 [-0.51590475 -1.01354314 -0.8504215 ]
 [-0.15419291 -0.48629638 -0.52901952]
 [-0.00618733 -0.12435261 -0.15226949]]
Difference between your scores and correct scores:
3.68027204961e-08
Difference between your loss and correct loss:
1.79856129989e-13
b2 max relative error: 3.865091e-11
W2 max relative error: 3.440708e-09
b1 max relative error: 1.555470e-09
W1 max relative error: 3.561318e-09
epoch 1/30 :loss 1.241994, train_acc:0.660000, val_acc:0.600000
epoch 2/30 :loss 0.911191, train_acc:1.000000, val_acc:1.000000
epoch 3/30 :loss 0.727970, train_acc:1.000000, val_acc:1.000000
epoch 4/30 :loss 0.587932, train_acc:1.000000, val_acc:1.000000
epoch 5/30 :loss 0.443400, train_acc:1.000000, val_acc:1.000000
epoch 6/30 :loss 0.321233, train_acc:1.000000, val_acc:1.000000
epoch 7/30 :loss 0.229577, train_acc:1.000000, val_acc:1.000000
epoch 8/30 :loss 0.191958, train_acc:1.000000, val_acc:1.000000
epoch 9/30 :loss 0.142653, train_acc:1.000000, val_acc:1.000000
epoch 10/30 :loss 0.122731, train_acc:1.000000, val_acc:1.000000
epoch 11/30 :loss 0.092965, train_acc:1.000000, val_acc:1.000000
epoch 12/30 :loss 0.077945, train_acc:1.000000, val_acc:1.000000
epoch 13/30 :loss 0.072626, train_acc:1.000000, val_acc:1.000000
epoch 14/30 :loss 0.065361, train_acc:1.000000, val_acc:1.000000
epoch 15/30 :loss 0.054620, train_acc:1.000000, val_acc:1.000000
epoch 16/30 :loss 0.045523, train_acc:1.000000, val_acc:1.000000
epoch 17/30 :loss 0.047018, train_acc:1.000000, val_acc:1.000000
epoch 18/30 :loss 0.042983, train_acc:1.000000, val_acc:1.000000
epoch 19/30 :loss 0.037004, train_acc:1.000000, val_acc:1.000000
epoch 20/30 :loss 0.036127, train_acc:1.000000, val_acc:1.000000
epoch 21/30 :loss 0.036055, train_acc:1.000000, val_acc:1.000000
epoch 22/30 :loss 0.032943, train_acc:1.000000, val_acc:1.000000
epoch 23/30 :loss 0.030061, train_acc:1.000000, val_acc:1.000000
epoch 24/30 :loss 0.031595, train_acc:1.000000, val_acc:1.000000
epoch 25/30 :loss 0.028289, train_acc:1.000000, val_acc:1.000000
epoch 26/30 :loss 0.029215, train_acc:1.000000, val_acc:1.000000
epoch 27/30 :loss 0.024275, train_acc:1.000000, val_acc:1.000000
epoch 28/30 :loss 0.026362, train_acc:1.000000, val_acc:1.000000
epoch 29/30 :loss 0.025849, train_acc:1.000000, val_acc:1.000000
epoch 30/30 :loss 0.024548, train_acc:1.000000, val_acc:1.000000
Final training loss: 0.0245481639812
II. Training on the real dataset
1. Load the data
from cs231n.data_utils import load_CIFAR10

def get_CIFAR10_data(num_training=9000, num_validation=1000, num_test=1000):
    cifar10_dir = 'cs231n//datasets'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

    # Subsample the data
    mask = range(num_training, num_training + num_validation)
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = range(num_training)
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = range(num_test)
    X_test = X_test[mask]
    y_test = y_test[mask]

    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis=0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image

    # Reshape data to rows
    X_train = X_train.reshape(num_training, -1)
    X_val = X_val.reshape(num_validation, -1)
    X_test = X_test.reshape(num_test, -1)

    return X_train, y_train, X_val, y_val, X_test, y_test

# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()
print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
2. Train the network
input_size = 32 * 32 * 3
hidden_size = 50
num_classes = 10  # CIFAR-10 has 10 classes
net = TwoLayerNet(input_size, hidden_size, num_classes)

# Train the model
stats = net.train(X_train, y_train, X_val, y_val,
                  num_iters=1000, batch_size=200,
                  learning_rate=1e-4, learning_rate_decay=0.95,
                  reg=0.5, verbose=True)

# Predict on the validation set
val_acc = (net.predict(X_val) == y_val).mean()
print('Validation accuracy: ', val_acc)
# We get about 0.278, which is not very good, so we plot the loss function and
# the accuracies on the training and validation sets during optimization.
3. Plot the loss function and the accuracy
# Plot the loss function and the train / validation accuracies
plt.subplot(2, 1, 1)
plt.plot(stats['loss_history'])
plt.title('Loss history')
plt.xlabel('Iteration')
plt.ylabel('Loss')

plt.subplot(2, 1, 2)
plt.plot(stats['train_acc_history'], label='train')
plt.plot(stats['val_acc_history'], label='val')
plt.title('Classification accuracy history')
plt.xlabel('Epoch')
plt.ylabel('Classification accuracy')
plt.show()
You can see that the loss decreases quite slowly, and that there is almost no gap between the training and validation accuracies. This suggests the model has low capacity, so we should increase the model's capacity (for example, by enlarging the hidden layer), while still keeping an eye on overfitting; a quick way to test this is sketched below.
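As a quick check of that hypothesis, one can simply enlarge the hidden layer and retrain. The sketch below reuses the class and data defined above; hidden_size=100, num_epochs=10 and learning_rate=1e-3 are purely illustrative values, not tuned settings (tuning follows in the next step).

# Sketch: test the low-capacity hypothesis by enlarging the hidden layer.
# The hyperparameter values here are illustrative, not tuned.
bigger_net = TwoLayerNet(input_size, hidden_size=100, output_size=num_classes)
stats_big = bigger_net.train(X_train, y_train, X_val, y_val,
                             num_epochs=10, batch_size=200,
                             learning_rate=1e-3, learning_rate_decay=0.95,
                             reg=0.5, verbose=True)
print('Validation accuracy (larger hidden layer): ',
      (bigger_net.predict(X_val) == y_val).mean())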
4. Tune the hyperparameters
best_net = None  # store the best model into this

# Tune hyperparameters using the validation set and store the best trained
# model in best_net. Tweaking hyperparameters by hand can be fun, but it is
# useful to write code that sweeps through possible combinations automatically,
# as in the previous exercises.
learning_rate = [3e-4, 1e-2, 3e-1]
hidden_size = [30, 50, 100]
train_epoch = [30, 50]
reg = [0.1, 3, 10]

best_val_acc = -1
best = {}
for each_lr in learning_rate:
    for each_hid in hidden_size:
        for each_epo in train_epoch:
            for each_reg in reg:
                net = TwoLayerNet(input_size, each_hid, num_classes)
                stats = net.train(X_train, y_train, X_val, y_val,
                                  learning_rate=each_lr, reg=each_reg,
                                  num_epochs=each_epo)
                train_acc = stats['train_acc_history'][-1]
                val_acc = stats['val_acc_history'][-1]
                if val_acc > best_val_acc:
                    best_val_acc = val_acc
                    best_net = net
                    best['learning_rate'] = each_lr
                    best['hidden_size'] = each_hid
                    best['epoch'] = each_epo
                    best['reg'] = each_reg

for each_key in best:
    print('best net:\n ' + each_key + ': %e' % best[each_key])
print('best validation accuracy: %f' % best_val_acc)