A Simple Neural Network Implementation with Formula Derivations

Source: Internet | Published by: plc与单片机的3000介绍 | Editor: 程序博客网 | Date: 2024/06/01 09:54

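This article builds a simple three-layer neural network step by step and derives some of the underlying math, to help readers understand neural networks and get started quickly.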

Import the packages.

import matplotlib.pyplot as plt
import numpy as np
import sklearn.datasets
import sklearn.linear_model

Dataset

To keep things simple, we generate a small dataset directly with sklearn's make_moons (make_circles would work as well).

The code is as follows:

np.random.seed(0)
X, y = sklearn.datasets.make_moons(400, noise=0.2)
plt.scatter(X[:, 0], X[:, 1], s=40, c=y, cmap=plt.cm.Spectral)
plt.show()

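As the plot shows, the two classes are not linearly separable. We could fit a nonlinear polynomial model instead, but suppose we have a large number of features, say 100 or more variables, and want to build a nonlinear polynomial model from them. The number of feature combinations becomes enormous: even if we only take pairwise products (x1x2 + x1x3 + x1x4 + ... + x2x3 + x2x4 + ... + x99x100), we end up with close to 5000 combined features. That is too many features for ordinary logistic regression to handle comfortably. This is where a neural network comes in.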
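The "close to 5000" figure can be checked with a few lines of Python. This is only a sanity check of the count quoted above; the variable names are illustrative and not part of the article's code.

```python
from itertools import combinations
from math import comb

n_features = 100  # feature count used in the text's example

# Number of distinct pairwise products x_i * x_j with i < j
n_pairs = comb(n_features, 2)
print(n_pairs)

# Cross-check by enumerating the index pairs explicitly
n_enumerated = sum(1 for _ in combinations(range(n_features), 2))
```

With 100 features this gives 4950 pairwise terms, and the count grows quadratically as features are added.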

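Neural Network

The neural network model

x1, x2, x3 are the input units; we feed the raw data into them. a1, a2, a3 are the intermediate (hidden) units, which process the data and pass it on to the next layer. The last layer is the output layer, which computes hθ(x). We also add a bias unit to each layer.

In this example the input layer has 2 nodes, the hidden layer has 3 nodes, and the output layer has 2 nodes.

To make the model easier to follow, we introduce some notation to describe it:

The model in the figure can then be written as the following expressions:

Each a is determined by all of the x values in the previous layer. If we treat x and a as matrices, then

so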
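The notation and expressions referred to here are not shown in the text. A sketch in one widely used convention is given below. It is an assumption chosen to match the symbols x1..x3, a1..a3, hθ(x) and g() used in this section, not necessarily the exact formulas the author had in mind. Here a_i^{(j)} denotes the activation of unit i in layer j, Θ^{(j)} the weight matrix mapping layer j to layer j+1, and x_0 = a_0^{(2)} = 1 the bias units.

```latex
% Per-unit form (assumed notation)
\begin{aligned}
a_1^{(2)} &= g\big(\Theta_{10}^{(1)} x_0 + \Theta_{11}^{(1)} x_1 + \Theta_{12}^{(1)} x_2 + \Theta_{13}^{(1)} x_3\big) \\
a_2^{(2)} &= g\big(\Theta_{20}^{(1)} x_0 + \Theta_{21}^{(1)} x_1 + \Theta_{22}^{(1)} x_2 + \Theta_{23}^{(1)} x_3\big) \\
a_3^{(2)} &= g\big(\Theta_{30}^{(1)} x_0 + \Theta_{31}^{(1)} x_1 + \Theta_{32}^{(1)} x_2 + \Theta_{33}^{(1)} x_3\big) \\
h_\theta(x) &= g\big(\Theta_{10}^{(2)} a_0^{(2)} + \Theta_{11}^{(2)} a_1^{(2)} + \Theta_{12}^{(2)} a_2^{(2)} + \Theta_{13}^{(2)} a_3^{(2)}\big)
\end{aligned}

% Matrix form: each layer is a matrix product followed by g
\begin{aligned}
z^{(2)} &= \Theta^{(1)} x, & a^{(2)} &= g\big(z^{(2)}\big), \\
z^{(3)} &= \Theta^{(2)} a^{(2)}, & h_\theta(x) &= g\big(z^{(3)}\big)
\end{aligned}
```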

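We also need to choose a g() for the hidden layer, the activation function. It passes each neuron's features on through a nonlinear mapping, and this nonlinearity is the key to how neural networks solve nonlinear problems.

Activation functions include:

tanh      hyperbolic tangent, output range (-1, 1)

sigmoid   S-shaped (logistic) function, output range (0, 1)

ReLU      x*(x>0), i.e. max(0, x)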
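As a minimal sketch, the three activation functions listed above can be written with NumPy as follows. The function names are illustrative; the article's own code only uses np.tanh directly.

```python
import numpy as np

def tanh(x):
    # Hyperbolic tangent, output in (-1, 1)
    return np.tanh(x)

def sigmoid(x):
    # Logistic function, output in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Elementwise max(0, x); equal in value to x * (x > 0)
    return np.maximum(x, 0.0)

x = np.array([-2.0, 0.0, 2.0])
print(tanh(x))
print(sigmoid(x))
print(relu(x))
```

All three are applied elementwise, so they work unchanged on the matrices of pre-activations computed for a whole batch.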

With that, this example can be expressed as:
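Written out to match the variable names used in the article's code (W1, b1, W2, b2, z1, a1, z2), the forward pass would be as follows. The symbol ŷ for the output probabilities is a name assumed here; for this example W1 is 2×3 and W2 is 3×2.

```latex
\begin{aligned}
z_1 &= x W_1 + b_1 \\
a_1 &= \tanh(z_1) \\
z_2 &= a_1 W_2 + b_2 \\
\hat{y} &= a_2 = \mathrm{softmax}(z_2)
\end{aligned}
```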

Because we want the network to output probabilities, the output layer's activation function is softmax, which converts raw scores into probabilities.
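A minimal sketch of softmax over a batch of score rows is shown below. Subtracting the row-wise maximum before exponentiating is an addition made here for numerical stability; it does not change the result, and the article's own code exponentiates the raw scores directly.

```python
import numpy as np

def softmax(z):
    # z has shape (n_examples, n_classes); each row is one example's raw scores.
    # Shifting by the row max avoids overflow in exp and leaves the ratios unchanged.
    shifted = z - np.max(z, axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

scores = np.array([[1.0, 2.0],
                   [0.0, 0.0]])
probs = softmax(scores)
print(probs)
```

Each output row sums to 1, equal scores give equal probabilities, and the larger score always receives the larger probability.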

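Above, in order to compute the cost function, we worked from left to right through the network. This is forward propagation.

We now give the cost function of the neural network: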
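A form consistent with the calculate_loss function defined later in the article is the cross-entropy loss with an L2 penalty on the weights. The symbols are assumptions made here for illustration: N is the number of training examples, C the number of classes, y the one-hot labels, ŷ the softmax outputs, and λ the regularization strength (reg_lambda in the code).

```latex
L(y, \hat{y}) = -\frac{1}{N} \sum_{n=1}^{N} \sum_{i=1}^{C} y_{n,i} \log \hat{y}_{n,i}
              + \frac{\lambda}{2N} \left( \lVert W_1 \rVert_F^2 + \lVert W_2 \rVert_F^2 \right)
```

Because y is one-hot, the inner sum reduces to the log-probability that the network assigns to the correct class of each example.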

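To obtain the partial derivatives of the cost function, we use the backpropagation algorithm, which works from right to left and computes the error one layer at a time.

From this we can derive, for this example: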
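The gradients below are written to agree with the delta3, delta2, dW1, db1, dW2 and db2 variables in the article's training code. They are the gradients of the summed (not averaged) data loss with softmax outputs and tanh hidden units; ∘ denotes the elementwise product, and the bias gradients are summed over the training examples. The symbol names are assumptions chosen to mirror the code.

```latex
\begin{aligned}
\delta_3 &= \hat{y} - y \\
\delta_2 &= \left(1 - \tanh^2 z_1\right) \circ \delta_3 W_2^{T} \\
\frac{\partial L}{\partial W_2} &= a_1^{T} \delta_3 \\
\frac{\partial L}{\partial b_2} &= \delta_3 \\
\frac{\partial L}{\partial W_1} &= x^{T} \delta_2 \\
\frac{\partial L}{\partial b_1} &= \delta_2
\end{aligned}
```

The factor 1 − tanh² z1 is the derivative of tanh, which the code computes as 1 - np.power(a1, 2) since a1 = tanh(z1).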

We define some basic parameters:

num_examples = len(X)  # training set size
nn_input_dim = 2  # input layer dimensionality
nn_output_dim = 2  # output layer dimensionality

# Gradient descent parameters (I picked these by hand)
epsilon = 0.01  # learning rate for gradient descent
reg_lambda = 0.01  # regularization strength

Next, a function that plots the classifier's decision boundary:

def plot_decision_boundary(pred_func):
    # Set min and max values and give it some padding
    x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
    y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
    h = 0.01
    # Generate a grid of points with distance h between them
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    # Predict the class for every point on the grid
    Z = pred_func(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    # Plot the contour and training examples
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Spectral)

Next, define the cost function:

# Helper function to evaluate the total loss on the dataset
def calculate_loss(model):
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    # Forward propagation to calculate our predictions
    z1 = X.dot(W1) + b1
    a1 = np.tanh(z1)
    z2 = a1.dot(W2) + b2
    exp_scores = np.exp(z2)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    # Calculating the loss
    correct_logprobs = -np.log(probs[range(num_examples), y])
    data_loss = np.sum(correct_logprobs)
    # Add regularization term to loss (optional)
    data_loss += reg_lambda / 2 * (np.sum(np.square(W1)) + np.sum(np.square(W2)))
    return 1. / num_examples * data_loss

We also implement a helper function that computes the network's output. It runs the forward propagation defined above and returns the class with the highest probability.

# Helper function to predict an output (0 or 1)
def predict(model, x):
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    # Forward propagation
    z1 = x.dot(W1) + b1
    a1 = np.tanh(z1)
    z2 = a1.dot(W2) + b2
    exp_scores = np.exp(z2)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    return np.argmax(probs, axis=1)

Finally, the function that trains the neural network:

# - nn_hdim: Number of nodes in the hidden layer
# - num_passes: Number of passes through the training data for gradient descent
# - print_loss: If True, print the loss every 1000 iterations
def build_model(nn_hdim, num_passes=20000, print_loss=False):

    # Initialize the parameters to random values. We need to learn these.
    np.random.seed(0)
    W1 = np.random.randn(nn_input_dim, nn_hdim) / np.sqrt(nn_input_dim)
    b1 = np.zeros((1, nn_hdim))
    W2 = np.random.randn(nn_hdim, nn_output_dim) / np.sqrt(nn_hdim)
    b2 = np.zeros((1, nn_output_dim))

    # This is what we return at the end
    model = {}

    # Batch gradient descent: each pass uses the whole training set
    for i in range(0, num_passes):

        # Forward propagation
        z1 = X.dot(W1) + b1
        a1 = np.tanh(z1)
        z2 = a1.dot(W2) + b2
        exp_scores = np.exp(z2)
        probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

        # Backpropagation
        delta3 = probs
        delta3[range(num_examples), y] -= 1
        dW2 = (a1.T).dot(delta3)
        db2 = np.sum(delta3, axis=0, keepdims=True)
        delta2 = delta3.dot(W2.T) * (1 - np.power(a1, 2))
        dW1 = np.dot(X.T, delta2)
        db1 = np.sum(delta2, axis=0)

        # Add regularization terms (b1 and b2 don't have regularization terms)
        dW2 += reg_lambda * W2
        dW1 += reg_lambda * W1

        # Gradient descent parameter update
        W1 += -epsilon * dW1
        b1 += -epsilon * db1
        W2 += -epsilon * dW2
        b2 += -epsilon * db2

        # Assign new parameters to the model
        model = {'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}

        # Optionally print the loss.
        # This is expensive because it uses the whole dataset, so we don't want to do it too often.
        if print_loss and i % 1000 == 0:
            print("Loss after iteration %i: %f" % (i, calculate_loss(model)))

    return model

# Build a model with a 3-dimensional hidden layer
model = build_model(3, print_loss=True)

# Plot the decision boundary
plot_decision_boundary(lambda x: predict(model, x))
plt.title("Decision Boundary for hidden layer size 3")
plt.show()

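The result looks like this:

Choosing the number of hidden-layer nodes

The number of hidden-layer nodes determines how good the model is. The more nodes there are, the more complex the function the network can fit, and too many nodes can lead to overfitting. There are several rules of thumb for choosing the number of nodes; which one works best is something to find out by trying a few.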
