cs231n assignment (1.4): two_layer_net


two_layer_net

In this exercise we use a two-layer fully-connected neural network for classification.
The layer layout is:
input - fully connected layer - ReLU - fully connected layer - softmax
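
The loss code further down unpacks W1, b1, W2 and b2 from self.params, so the network is wrapped in a TwoLayerNet class as in the assignment scaffold. A minimal initialization sketch, assuming small random weights and zero biases (the std scale here is an assumption, not a prescribed value), could look like:

import numpy as np

class TwoLayerNet(object):
    def __init__(self, input_size, hidden_size, output_size, std=1e-4):
        # Weights: small random values; biases: zeros.
        # The keys match what loss() below reads from self.params.
        self.params = {}
        self.params['W1'] = std * np.random.randn(input_size, hidden_size)
        self.params['b1'] = np.zeros(hidden_size)
        self.params['W2'] = std * np.random.randn(hidden_size, output_size)
        self.params['b2'] = np.zeros(output_size)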

Computing the loss and gradients of the two-layer network

def ReLU(x):
    # ReLU non-linearity.
    return np.maximum(0, x)


    def loss(self, X, y=None, reg=0.0):
        # Unpack variables from the params dictionary
        W1, b1 = self.params['W1'], self.params['b1']
        W2, b2 = self.params['W2'], self.params['b2']
        N, D = X.shape

        # Compute the forward pass
        s1 = X.dot(W1) + b1
        h1 = ReLU(s1)
        scores = h1.dot(W2) + b2

        # If the targets are not given then jump out, we're done
        if y is None:
            return scores

        # Compute the loss
        f_max = np.max(scores, axis=1, keepdims=True)
        f_scores = scores - f_max
        prob = np.exp(f_scores) / np.sum(np.exp(f_scores), axis=1, keepdims=True)
        loss = np.sum(-np.log(prob[np.arange(N), y]))
        loss = loss / N + 0.5 * reg * np.sum(W1 * W1) + 0.5 * reg * np.sum(W2 * W2)

        # Backward pass: compute gradients
        grads = {}
        dscores = prob
        dscores[np.arange(N), y] -= 1
        dscores /= N
        # Second layer
        dh1 = dscores.dot(W2.T)
        dW2 = h1.T.dot(dscores)
        db2 = np.sum(dscores, axis=0)
        # ReLU
        dh1[s1 <= 0] = 0
        ds1 = dh1
        # First layer
        dW1 = np.dot(X.T, ds1)
        db1 = np.sum(ds1, axis=0)
        # Regularization
        dW1 += reg * W1
        dW2 += reg * W2
        # Store
        grads['W1'] = dW1
        grads['W2'] = dW2
        grads['b1'] = db1
        grads['b2'] = db2

        return loss, grads
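
A quick sanity check for the analytic gradients above is to compare them against centered-difference numerical gradients on a tiny random model. The helper function, toy sizes and regularization value below are illustrative assumptions rather than part of the assignment code:

import numpy as np

def numerical_grad(f, x, h=1e-5):
    # Centered-difference numerical gradient of f with respect to x
    # (x is perturbed in place and then restored).
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        old = x[idx]
        x[idx] = old + h
        fxph = f()
        x[idx] = old - h
        fxmh = f()
        x[idx] = old
        grad[idx] = (fxph - fxmh) / (2 * h)
        it.iternext()
    return grad

# Tiny random model and data, purely for checking the gradients.
net = TwoLayerNet(input_size=4, hidden_size=10, output_size=3, std=1e-1)
X = np.random.randn(5, 4)
y = np.random.randint(3, size=5)

loss, grads = net.loss(X, y, reg=0.05)
for name in ['W1', 'b1', 'W2', 'b2']:
    f = lambda: net.loss(X, y, reg=0.05)[0]
    num = numerical_grad(f, net.params[name])
    rel_err = np.max(np.abs(num - grads[name]) /
                     np.maximum(1e-8, np.abs(num) + np.abs(grads[name])))
    print(name, 'max relative error:', rel_err)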

On tuning the hyperparameters:
If the loss decreases roughly linearly, the learning rate may be too low. If there is no gap between training and validation accuracy, the model's capacity is low, so we should increase the model size: a larger model overfits more easily, so the absence of any gap means the network is underfitting rather than overfitting. A sketch of such a sweep is given below.
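
In the assignment these observations are typically acted on with a small sweep over learning rate, hidden size and regularization strength. The sketch below assumes the scaffold's TwoLayerNet.train / predict interface and placeholder data arrays (X_train, y_train, X_val, y_val, input_dim, num_classes); the ranges are illustrative, not tuned values:

# Hedged sketch of a small hyperparameter sweep; net.train / net.predict are
# assumed to follow the assignment's TwoLayerNet interface, and the data
# variables are placeholders.
best_val, best_net = -1.0, None
for hidden_size in [50, 100, 150]:      # larger model -> more capacity
    for lr in [1e-4, 5e-4, 1e-3]:       # too-low lr -> near-linear loss decay
        for reg in [0.1, 0.25, 0.5]:
            net = TwoLayerNet(input_dim, hidden_size, num_classes)
            net.train(X_train, y_train, X_val, y_val,
                      num_iters=1500, batch_size=200,
                      learning_rate=lr, learning_rate_decay=0.95,
                      reg=reg, verbose=False)
            val_acc = (net.predict(X_val) == y_val).mean()
            if val_acc > best_val:
                best_val, best_net = val_acc, net
                print('hidden %d, lr %.0e, reg %.2f -> val acc %.3f'
                      % (hidden_size, lr, reg, val_acc))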
