Neural Networks Study Notes (1)

Source: Internet  Editor: 程序博客网  Time: 2024/06/01 09:57

improving the way neural networks learn

the cross-entropy cost function

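1. Compare two learning curves that start from different initial values to test the intuition that the more wrong we are, the faster we learn. With the quadratic cost, the neuron that starts out badly wrong actually learns slowly at first.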

2. Why learning slows down:
The derivative of the sigmoid function shrinks toward 0 as its output saturates toward 1 or 0.
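
To make the saturation concrete, here is a minimal sketch in plain Python. The helper names `sigmoid` and `sigmoid_prime` and the sample values of z are chosen for illustration only.

```python
import math

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """Derivative sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

# The derivative peaks at z = 0 and shrinks as the output saturates toward 1.
for z in (0.0, 2.0, 5.0, 10.0):
    print(f"z={z:5.1f}  a={sigmoid(z):.5f}  sigma'={sigmoid_prime(z):.6f}")
```

The same shrinkage happens for large negative z, where the output saturates toward 0.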


How do we fix the learning slowdown?

One approach is to replace the quadratic cost with the cross-entropy cost function.

$$C = -\frac{1}{n}\sum_x \left[\, y \ln a + (1 - y)\ln(1 - a) \,\right]$$

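1. Why does it make sense as a cost function?
(1) It is non-negative: since 0 < a < 1, every term inside the brackets is negative, and the leading minus sign flips the sign.
(2) When y and a are close, C is close to 0.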
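
Both properties can be checked numerically. The sketch below assumes the standard form C = -(1/n) Σ [y ln a + (1-y) ln(1-a)]; the function name `cross_entropy` and the sample outputs are made up for illustration.

```python
import math

def cross_entropy(ys, activations):
    """Average cross-entropy over paired targets y and outputs a, with 0 < a < 1."""
    total = 0.0
    for y, a in zip(ys, activations):
        total += y * math.log(a) + (1.0 - y) * math.log(1.0 - a)
    return -total / len(ys)

targets = [1.0, 0.0, 1.0]
good = cross_entropy(targets, [0.99, 0.01, 0.98])  # outputs close to the targets
bad = cross_entropy(targets, [0.10, 0.90, 0.20])   # outputs far from the targets
print(good, bad)
```

Both values are positive, and the cost for the outputs close to the targets is much nearer to 0.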

2. How does it fix the learning slowdown?
When the partial derivative is taken, the derivative of the sigmoid function cancels out.

$$\frac{\partial C}{\partial w_j} = \frac{1}{n}\sum_x x_j \left( \sigma(z) - y \right)$$

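When the output is far from the correct result, learning with the cross-entropy cost is faster than with the quadratic cost; that's the point.
When the output neurons are sigmoid neurons, cross-entropy is almost always a better choice than the quadratic cost.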
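
As a rough illustration of the speed difference, the sketch below compares the two weight gradients for a single sigmoid neuron with one input. The setup (x = 1, target y = 0, starting point w = b = 2) and the function name `weight_gradients` are assumptions chosen so that the neuron starts out badly wrong.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def weight_gradients(w, b, x=1.0, y=0.0):
    """Return (quadratic, cross_entropy) gradients w.r.t. w for one sigmoid neuron."""
    z = w * x + b
    a = sigmoid(z)
    quad = (a - y) * a * (1.0 - a) * x  # still contains sigma'(z) = a(1 - a)
    xent = (a - y) * x                  # sigma'(z) has cancelled out
    return quad, xent

quad, xent = weight_gradients(w=2.0, b=2.0)
print(f"quadratic: {quad:.4f}  cross-entropy: {xent:.4f}")
```

With the output saturated near 1, the quadratic gradient is a small fraction of the cross-entropy gradient, so the first update steps are correspondingly smaller.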

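Notes:
1. Inside the ln in C it is a, not y; do not mix them up.
2. In a regression problem the target values are no longer just 0 or 1 and can be anything between 0 and 1. The cross-entropy is still minimized when a = y for every input, and its value there is:
$$C = -\frac{1}{n}\sum_x \left[\, y \ln y + (1 - y)\ln(1 - y) \,\right]$$
This is called the binary entropy (it is just plain information entropy).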
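
A quick numerical check of this claim for a single training example, as a sketch: the target y = 0.3 and the grid of candidate outputs are arbitrary choices, and the function names are hypothetical.

```python
import math

def cross_entropy_single(y, a):
    """Cross-entropy for one example with target y and output a, both in (0, 1)."""
    return -(y * math.log(a) + (1.0 - y) * math.log(1.0 - a))

def binary_entropy(y):
    """Value of the cross-entropy at its minimum, a = y."""
    return cross_entropy_single(y, y)

y = 0.3
candidates = [i / 100 for i in range(1, 100)]  # a = 0.01, 0.02, ..., 0.99
best_a = min(candidates, key=lambda a: cross_entropy_single(y, a))
print(best_a, cross_entropy_single(y, best_a), binary_entropy(y))
```

Unlike the 0/1 case, the minimum value is not 0: the cost bottoms out at the binary entropy of the target.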

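classify MNIST digits
A note on rigor: to really show that cross-entropy is the better cost function, all the other hyper-parameters would have to be tuned to their best values for each cost (a controlled comparison). In practice this is skipped to save time.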

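What is cross-entropy? Where does it come from?
1. It can be derived by working backwards: require that the partial derivatives of the cost no longer contain the derivative of the sigmoid, then integrate.
2. Cross-entropy measures a degree of "surprise"; this comes from information theory.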
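
The backwards derivation in item 1 can be sketched for a single neuron with output $a = \sigma(z)$, assuming we want the bias gradient to be simply $a - y$. The chain rule and $\sigma'(z) = a(1 - a)$ give

$$\frac{\partial C}{\partial b} = \frac{\partial C}{\partial a}\,\sigma'(z) = \frac{\partial C}{\partial a}\, a(1 - a) = a - y
\quad\Longrightarrow\quad
\frac{\partial C}{\partial a} = \frac{a - y}{a(1 - a)}$$

Integrating with respect to $a$ gives

$$C = -\left[\, y \ln a + (1 - y)\ln(1 - a) \,\right] + \text{constant}$$

Differentiating the right-hand side recovers $-\frac{y}{a} + \frac{1 - y}{1 - a} = \frac{a - y}{a(1 - a)}$, which confirms the result. Averaging over all training inputs gives the full cost function.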

softmax

Another way to address the learning slowdown: softmax layers of neurons.

This needs material from the previous section; to be filled in later.

overfitting and regularization

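1. A model with too many parameters may merely reproduce the surface of the data without capturing the underlying pattern.

2. Track accuracy on both the training set and the test set to gauge how well the model is fitting.

3. Stop training once accuracy on the validation set has saturated; this is called early stopping.

4. To put it another way, you can think of the validation data as a type of training data that helps us learn good hyper-parameters. This is the hold-out method.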
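
The early stopping rule in item 3 could be sketched as follows. The notes do not say how "saturated" is decided, so the `patience` parameter (stop after that many epochs in a row without a new best validation accuracy) is an assumption of this sketch, as are the function name and the sample history.

```python
def early_stopping_epoch(val_accuracies, patience=3):
    """Return the index of the epoch at which training would stop.

    Stops once validation accuracy has failed to improve for `patience`
    epochs in a row; returns the last epoch if that never happens.
    """
    best = float("-inf")
    epochs_without_gain = 0
    for epoch, acc in enumerate(val_accuracies):
        if acc > best:
            best = acc
            epochs_without_gain = 0
        else:
            epochs_without_gain += 1
            if epochs_without_gain >= patience:
                return epoch
    return len(val_accuracies) - 1

# Made-up validation accuracies, one per epoch.
history = [0.90, 0.93, 0.95, 0.95, 0.949, 0.95, 0.951]
stop = early_stopping_epoch(history, patience=3)
print(stop)
```

A small patience stops sooner but can quit during a temporary plateau, as this example does just before a slight improvement in the final epoch.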

How do we reduce overfitting?

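1. The old saying: one of the best ways of reducing overfitting is to increase the size of the training data.
2. Reduce the model's complexity.
3. Regularization: add a penalty on large weights to the original cost $C_0$:
$$C = C_0 + \frac{\lambda}{2n}\sum_w w^2$$
When λ is small we prefer to minimize the original cost function, but when λ is large we prefer small weights.
4. Dropout
Important: during training, randomly drop some hidden neurons. This effectively trains many different networks that then vote. It is often used together with regularization.
5. Artificially expanding the training data
Examples: rotate MNIST digits by a small angle to create new samples; add noise in speech recognition.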
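
For the regularization in item 3, the effect on a gradient descent step can be sketched as below, assuming the L2 form C = C0 + (λ/2n) Σ w². The function name, the learning rate `eta`, and the sample numbers are illustrative only.

```python
def l2_regularized_step(weights, grads, eta, lam, n):
    """One gradient step on C = C0 + (lam / (2n)) * sum(w^2).

    `grads` holds dC0/dw for each weight. The penalty adds (lam / n) * w
    to the gradient, which amounts to first shrinking each weight by the
    factor (1 - eta * lam / n).
    """
    decay = 1.0 - eta * lam / n
    return [decay * w - eta * g for w, g in zip(weights, grads)]

weights = [2.0, -3.0]
zero_grads = [0.0, 0.0]  # isolate the effect of the penalty term
shrunk = l2_regularized_step(weights, zero_grads, eta=0.5, lam=5.0, n=10)
print(shrunk)
```

With the unregularized gradient set to zero, every weight is simply scaled toward 0, which is why this penalty is also known as weight decay.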

Reducing overfitting is also an active research problem.
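
The dropout idea from item 4 above, randomly removing hidden neurons during training, can be sketched on a list of hidden activations. This sketch uses the "inverted dropout" scaling, which is a choice made here rather than something these notes specify; `p_drop`, the function name, and the sample activations are likewise assumptions.

```python
import random

def dropout(activations, p_drop=0.5, rng=random):
    """Zero each hidden activation with probability p_drop (training only).

    Survivors are scaled by 1 / (1 - p_drop) so the expected activation
    is unchanged and no rescaling is needed at test time.
    """
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)  # fixed seed so the run is repeatable
hidden = [0.2, 0.7, 0.5, 0.9]
print(dropout(hidden, p_drop=0.5, rng=rng))
```

Each call draws a fresh mask, so every training step effectively uses a different thinned network.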

weight initialization

I did not really follow this part; to be filled in later.

The English was a real slog to read today.

0 0