[Coursera][Stanford] Machine Learning Week 5


Dates: August 20–25


This week covers learning in Neural Networks, including the Cost Function, the Backpropagation Algorithm for minimizing J, Forward Propagation, Gradient Checking, and Random Initialization.


Programming Exercise 4:

Neural Networks Learning

1 Neural Networks

1.3 Feedforward and cost function

The TA's hint on the course forum about Forward Propagation (a code sketch follows after the steps):

perform the forward propagation:
a1 equals the X input matrix with a column of 1's added (bias units)
z2 equals the product of a1 and Θ1
a2 is the result of passing z2 through g()
a2 then has a column of 1's added (bias units)
z3 equals the product of a2 and Θ2
a3 is the result of passing z3 through g()
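
Putting the TA's steps into Octave code, a minimal sketch assuming the usual ex4 variables X, Theta1, Theta2 are in scope and m = size(X, 1):

a1 = [ones(m, 1) X];              % 5000 x 401, bias column added
z2 = a1 * Theta1';                % 5000 x 25
a2 = sigmoid(z2);
a2 = [ones(size(a2, 1), 1) a2];   % 5000 x 26, bias column added
z3 = a2 * Theta2';                % 5000 x 10
a3 = sigmoid(z3);                 % this is h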

To compute J(Θ) we first need h. Per the exercise, this is a 3-layer network: Θ1 is 25×401 and a1 is 5000×401, which gives a2 as 5000×26; Θ2 is 10×26, so h comes out as 5000×10.

Question: why does the final h have to be 5000×10? Isn't 10×5000 also fine?... (That gave the wrong answer 304.799133.)

According to the exercise, y is a 5000×1 vector of labels 1–10. In this exercise's neural network, the 10-unit output is not a decimal digit: for example, the label 5 is represented as 0000100000. So y has to be recoded from a 5000×1 vector into a 5000×10 matrix, which is then multiplied element-wise with h (which is also why h and y_matrix must have the same 5000×10 orientation). That is:

Update: Remember to use element-wise multiplication with the log() function.

This gives the correct result. (No regularization is applied here yet.)
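
For reference, this is the unregularized cost from the exercise handout, with K = 10 output units and y_k^(i) the k-th entry of the recoded label vector:

J(\Theta) = \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \Big[ -y_k^{(i)} \log\big((h_\Theta(x^{(i)}))_k\big) - \big(1 - y_k^{(i)}\big) \log\big(1 - (h_\Theta(x^{(i)}))_k\big) \Big]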

% Forward propagation over the whole training set
a = sigmoid([ones(m, 1) X] * Theta1');
h = sigmoid([ones(size(a, 1), 1) a] * Theta2');

% Recode y (5000 x 1, labels 1..10) into a 5000 x 10 indicator matrix
y_matrix = zeros(size(h));
for i = 1:size(y_matrix, 1)
    for j = 1:size(y_matrix, 2)
        if j == y(i)
            y_matrix(i, j) = 1;
        end
    end
end

% Unregularized cost
J = -(1 / m) * sum(sum(y_matrix .* log(h) + (1 - y_matrix) .* log(1 - h)));
% 2nd way to build y_matrix:
% tmp_eye = eye(num_labels);
% y_matrix = tmp_eye(y, :);


1.4 Regularized cost function


Here I made an incredibly silly mistake... I wrote Theta1 twice and simply could not find the bug, which wasted a long time.

J = J + (lambda / (2 * m)) * (sum(sum(Theta1(:,2:end) .^ 2)) + sum(sum(Theta2(:,2:end) .^ 2)));
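
The term added here is the regularization penalty from the handout, with the bias columns Theta1(:,1) and Theta2(:,1) excluded:

\frac{\lambda}{2m} \left[ \sum_{j=1}^{25} \sum_{k=1}^{400} \big(\Theta^{(1)}_{j,k}\big)^2 + \sum_{j=1}^{10} \sum_{k=1}^{25} \big(\Theta^{(2)}_{j,k}\big)^2 \right]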

2 Backpropagation

2.1 Sigmoid gradient

That is, just take the derivative of the sigmoid: g'(z) = g(z)(1 − g(z))

g = sigmoid(z) .* (1 - sigmoid(z));
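A quick sanity check, assuming that line lives inside the exercise's sigmoidGradient function as in the provided template: the gradient peaks at z = 0 with value 0.25.

sigmoidGradient(0)           % 0.5 * (1 - 0.5) = 0.2500
sigmoidGradient([-1 0 1])    % roughly [0.1966 0.2500 0.1966]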
2.3 Backpropagation

Now we work from the output layer back to the hidden layer, calculating how bad the errors are.

Note that this loops m times. Inside the loop, a1 is the column vector X(i,:)' with a bias 1 prepended, yk is a vector, and the accumulated Delta1 is 25×401 while Delta2 is 10×26. Everything inside this for loop is a vector or matrix... the vectors and matrices were driving me crazy.

Delta1 = zeros(hidden_layer_size, input_layer_size + 1);
Delta2 = zeros(num_labels, hidden_layer_size + 1);
for i = 1:m
    % Compute activations
    a1 = X(i, :)';
    a1 = [1; a1];
    a2 = sigmoid(Theta1 * a1);
    a2 = [1; a2];
    a3 = sigmoid(Theta2 * a2);
    % Compute delta (output layer)
    yk = zeros(num_labels, 1);
    yk(y(i)) = 1;
    d3 = a3 - yk;
    % Compute delta (hidden layer)
    d2 = (Theta2' * d3) .* sigmoidGradient([1; Theta1 * a1]);
    % Accumulate the gradient
    d2 = d2(2:end);
    Delta2 = Delta2 + d3 * a2';
    Delta1 = Delta1 + d2 * a1';
end
Theta1_grad = (1 / m) * Delta1;
Theta2_grad = (1 / m) * Delta2;
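
Not required by the exercise, but for comparison, a sketch of the same gradients computed without the per-example loop, assuming the 5000×10 y_matrix from section 1.3 is available; it should give identical Theta1_grad / Theta2_grad:

a1 = [ones(m, 1) X];                                   % 5000 x 401
z2 = a1 * Theta1';                                     % 5000 x 25
a2 = [ones(m, 1) sigmoid(z2)];                         % 5000 x 26
a3 = sigmoid(a2 * Theta2');                            % 5000 x 10
d3 = a3 - y_matrix;                                    % output-layer error, 5000 x 10
d2 = (d3 * Theta2(:, 2:end)) .* sigmoidGradient(z2);   % hidden-layer error, 5000 x 25
Theta1_grad = (1 / m) * (d2' * a1);                    % 25 x 401
Theta2_grad = (1 / m) * (d3' * a2);                    % 10 x 26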


2.5 Regularized Neural Networks

When j = 0 there is no regularization, i.e. the first column of each Theta (the bias weights) is left out.

Theta1_grad = (1 / m) * Delta1 + (lambda / m) * [zeros(size(Theta1, 1), 1) Theta1(:, 2:end)];
Theta2_grad = (1 / m) * Delta2 + (lambda / m) * [zeros(size(Theta2, 1), 1) Theta2(:, 2:end)];
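
Finally, the gradients are unrolled into a single column vector so fmincg and the gradient checking script can consume them (if I remember the provided nnCostFunction template correctly, this line is already there):

% Unroll gradients
grad = [Theta1_grad(:); Theta2_grad(:)];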
