Stanford Machine Learning, Week 4 (Neural Networks and Their Applications)

Source: Internet | Editor: 程序博客网 | Date: 2024/05/16 04:38

1. Why Introduce Neural Networks

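In one sentence: when the number of features $n$ is large, say $n = 100$, the quadratic terms alone ($x_1^2, x_1x_2, x_1x_3, \dots, x_1x_{100}$; $x_2^2, x_2x_3, \dots, x_2x_{100}$; …) already number about 5000 ($100 + 99 + \dots + 1 = 5050$). In real problems $n$ is often in the millions or even hundreds of millions, so adding such terms easily leads to overfitting and to a very heavy computational cost. This is what motivates neural networks.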
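To make the count concrete, here is a small sketch that enumerates the distinct degree-2 terms $x_i x_j$ ($i \le j$) for $n$ features and checks the closed form $n(n+1)/2$. The function names are hypothetical helpers for illustration only.

```python
from itertools import combinations_with_replacement


def num_quadratic_terms(n: int) -> int:
    """Count distinct degree-2 monomials x_i * x_j with i <= j by enumeration."""
    return sum(1 for _ in combinations_with_replacement(range(n), 2))


def num_quadratic_terms_closed_form(n: int) -> int:
    """Same count via n + (n - 1) + ... + 1 = n(n + 1) / 2."""
    return n * (n + 1) // 2


print(num_quadratic_terms(100))                 # 5050
print(num_quadratic_terms_closed_form(10**6))   # 500000500000
```

The count grows quadratically in $n$, so with a million raw features the quadratic terms alone reach about $5 \times 10^{11}$; enumeration is only practical for small $n$, which is why the closed form is used for the large case.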

2. Neural Network Model

Let’s examine how we will represent a hypothesis function using neural networks. At a very simple level, neurons are basically computational units that take inputs (dendrites) as electrical inputs (called “spikes”) that are channeled to outputs (axons). In our model, our dendrites are like the input features $x_1, \dots, x_n$, and the output is the result of our hypothesis function. In this model our $x_0$ input node is sometimes called the “bias unit”; it is always equal to 1. In neural networks, we use the same logistic function as in classification, $\frac{1}{1+e^{-\theta^T x}}$, yet we sometimes call it a sigmoid (logistic) activation function. In this situation, our “theta” parameters are sometimes called “weights”.

The figure below shows a model with only one neuron; the yellow circle is the neuron's cell body:
[Figure: a model with a single neuron]
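A single logistic unit like this one can be sketched in a few lines: prepend the bias input $x_0 = 1$, take the weighted sum $\theta^T x$, and apply the sigmoid $g$. This is a minimal illustration assuming three input features; the weights are hypothetical and the function names are not from the course.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic (sigmoid) activation g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(theta: list[float], features: list[float]) -> float:
    """One logistic unit: prepend the bias input x_0 = 1, then return g(theta^T x)."""
    x = [1.0] + features  # x_0 = 1 is the bias unit
    if len(theta) != len(x):
        raise ValueError("theta needs one weight per feature plus one for the bias")
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)


# Hypothetical weights for inputs x_1, x_2, x_3 (illustration only)
theta = [-1.0, 0.5, 0.5, 0.5]
print(neuron(theta, [1.0, 1.0, 0.0]))  # theta^T x = -1 + 0.5 + 0.5 + 0 = 0, so g(0) = 0.5
```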

A real neural network is built by combining many such neurons, as in the figure below:

[Figure: a neural network with an input layer, one hidden layer, and an output layer]

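Here $x_0 = 1$ is the bias unit, and $a_0^{(2)}$ is likewise a bias unit, also equal to 1. We usually do not draw these explicitly; it is enough to know they exist. We call Layer 1 the input layer and Layer 3 the output layer; every layer in between (here only Layer 2) is called a hidden layer. In this example, $a_0^{(2)}, a_1^{(2)}, a_2^{(2)}, a_3^{(2)}$ are called activation units.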

3. Mathematical Definition of a Neural Network


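$\Theta^{(j)}$ is a matrix holding the weights of layer $j$, i.e. the weights that map layer $j$ to layer $j+1$. Every layer has such a matrix: the activations of the next layer, $a^{(j+1)}$, are obtained by taking a linear combination of layer $j$'s activations $a^{(j)}$ (the input layer can also be treated as activations) with the weights $\Theta^{(j)}$, and then applying $g$.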

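For example, a 4×3 weight matrix, where $\Theta^{(i)}_{jk}$ is the weight from unit $k$ of layer $i$ to unit $j$ of layer $i+1$: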

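$$
\Theta^{(i)} = \begin{bmatrix}
\Theta^{(i)}_{10} & \Theta^{(i)}_{11} & \Theta^{(i)}_{12} \\
\Theta^{(i)}_{20} & \Theta^{(i)}_{21} & \Theta^{(i)}_{22} \\
\Theta^{(i)}_{30} & \Theta^{(i)}_{31} & \Theta^{(i)}_{32} \\
\Theta^{(i)}_{40} & \Theta^{(i)}_{41} & \Theta^{(i)}_{42}
\end{bmatrix}
$$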

The value of each “activation” node is obtained as follows:

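$$
\begin{aligned}
a_1^{(2)} &= g\left(\Theta_{10}^{(1)} x_0 + \Theta_{11}^{(1)} x_1 + \Theta_{12}^{(1)} x_2 + \Theta_{13}^{(1)} x_3\right) \\
a_2^{(2)} &= g\left(\Theta_{20}^{(1)} x_0 + \Theta_{21}^{(1)} x_1 + \Theta_{22}^{(1)} x_2 + \Theta_{23}^{(1)} x_3\right) \\
a_3^{(2)} &= g\left(\Theta_{30}^{(1)} x_0 + \Theta_{31}^{(1)} x_1 + \Theta_{32}^{(1)} x_2 + \Theta_{33}^{(1)} x_3\right) \\
h_\Theta(x) = a_1^{(3)} &= g\left(\Theta_{10}^{(2)} a_0^{(2)} + \Theta_{11}^{(2)} a_1^{(2)} + \Theta_{12}^{(2)} a_2^{(2)} + \Theta_{13}^{(2)} a_3^{(2)}\right)
\end{aligned}
$$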

This says that we compute the activation nodes using a 3×4 matrix of parameters, $\Theta^{(1)}$. Each row of the parameters is applied to the inputs to obtain the value of one activation node. The hypothesis output is the logistic function applied to the weighted sum of the activation nodes, where the weights come from yet another parameter matrix, $\Theta^{(2)}$, which holds the weights for the second layer of nodes.
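The equations above amount to forward propagation through a 3-layer network (3 inputs, 3 hidden units, 1 output). The sketch below writes that out in plain Python, assuming $\Theta^{(1)}$ is stored as a list of 3 rows of 4 weights and $\Theta^{(2)}$ as 1 row of 4 weights; the weight values and the helper names `forward_layer` and `h_theta` are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic activation g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))


def forward_layer(theta: list[list[float]], a: list[float]) -> list[float]:
    """Compute a^{(j+1)} = g(Theta^{(j)} [1; a^{(j)}]).

    theta has shape s_{j+1} x (s_j + 1): one row per unit of the next layer.
    """
    a_with_bias = [1.0] + a  # prepend the bias unit a_0^{(j)} = 1
    for row in theta:
        if len(row) != len(a_with_bias):
            raise ValueError("each row of theta needs s_j + 1 weights")
    # Each row of theta produces one activation of the next layer
    return [sigmoid(sum(w * v for w, v in zip(row, a_with_bias))) for row in theta]


def h_theta(theta1: list[list[float]], theta2: list[list[float]], x: list[float]) -> float:
    """Forward propagation: input layer -> hidden layer -> output layer."""
    a2 = forward_layer(theta1, x)   # layer 2: a_1^{(2)}, a_2^{(2)}, a_3^{(2)}
    a3 = forward_layer(theta2, a2)  # layer 3: the single output unit a_1^{(3)}
    return a3[0]


# Hypothetical weights, chosen only for illustration
theta1 = [
    [0.1, 0.2, -0.3, 0.4],
    [-0.5, 0.6, 0.7, -0.8],
    [0.9, -1.0, 1.1, 0.2],
]  # Theta^{(1)}: 3 x 4
theta2 = [[0.3, -0.6, 0.9, 0.5]]  # Theta^{(2)}: 1 x 4
x = [1.0, 2.0, 3.0]  # x_1, x_2, x_3 (x_0 = 1 is added inside forward_layer)

print(h_theta(theta1, theta2, x))
```

Prepending the bias inside `forward_layer` keeps the stored activations free of $a_0$, which mirrors the convention that bias units exist but are not drawn.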

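So we can see that each activation unit takes the previous layer as its input, uses the corresponding row of the previous layer's weight matrix as its parameters, and then maps the result through $g(z)$, which is exactly the form of the logistic regression hypothesis learned earlier. A neural network can therefore be viewed as being built from several logistic regression models (this view is the author's own personal conjecture).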
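The view above can be illustrated by evaluating each hidden unit as an independent logistic regression hypothesis $g(\theta^T x)$, where $\theta$ is one row of $\Theta^{(1)}$. This is a sketch under that interpretation; the weights are hypothetical and picked so that the weighted sums come out to 0, 1 and -1.

```python
import math


def logistic_regression_h(theta: list[float], x: list[float]) -> float:
    """Logistic-regression hypothesis g(theta^T x); x must already include x_0 = 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical Theta^{(1)} (3 x 4): each row is the parameter vector of one model
theta1 = [
    [-6.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, -1.0, 0.0, 0.0],
]
x = [1.0, 1.0, 2.0, 3.0]  # x_0 = 1, then x_1, x_2, x_3

# Layer 2 = three independent logistic-regression models evaluated on the same input
hidden = [logistic_regression_h(row, x) for row in theta1]
print(hidden)  # z values are 0, 1 and -1, so roughly [0.5, 0.731, 0.269]
```

The difference from plain logistic regression is that the output unit does not see $x$ directly: it runs one more logistic regression on `hidden`, i.e. on features the network computes for itself.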

Each layer gets its own weight matrix, $\Theta^{(j)}$. The dimensions of these weight matrices are determined as follows:

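If the network has $s_j$ units in layer $j$ and $s_{j+1}$ units in layer $j+1$, then $\Theta^{(j)}$ has dimension $s_{j+1} \times (s_j + 1)$. The $+1$ accounts for the bias unit: the inputs to $\Theta^{(j)}$ include the bias node of layer $j$, while its outputs do not include the bias node of layer $j+1$.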
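The dimension rule can be checked with a tiny helper that takes the number of units per layer (bias units excluded) and returns the shape of each $\Theta^{(j)}$. The name `weight_shapes` is hypothetical; the two calls reproduce the 3×4 and 1×4 matrices of the 3-layer example and the 4×3 example matrix.

```python
def weight_shapes(layer_sizes: list[int]) -> list[tuple[int, int]]:
    """Shape of Theta^(j) for each pair of consecutive layers: s_{j+1} x (s_j + 1)."""
    return [(s_next, s_j + 1) for s_j, s_next in zip(layer_sizes, layer_sizes[1:])]


print(weight_shapes([3, 3, 1]))  # [(3, 4), (1, 4)]
print(weight_shapes([2, 4]))     # [(4, 3)]
```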