Machine Learning Notes: Learning Rate


Gradient Descent in Practice II - Learning Rate

Note: [5:20 - the x-axis label in the right graph should be θ rather than No. of iterations]

Debugging gradient descent. Make a plot with the number of iterations on the x-axis, and plot the cost function J(θ) over the number of iterations of gradient descent. If J(θ) ever increases, then you probably need to decrease α.
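A minimal sketch of this debugging plot, using batch gradient descent on linear regression; the synthetic dataset, the α value, and variable names such as `cost_history` are illustrative assumptions, not from the notes:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data: y ≈ 4 + 3x plus noise (assumed example)
rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.uniform(0, 2, 100)]  # intercept column + feature
y = 4 + 3 * X[:, 1] + rng.normal(0, 0.5, 100)

alpha = 0.1           # learning rate (assumed value)
theta = np.zeros(2)
m = len(y)
cost_history = []

for _ in range(200):
    error = X @ theta - y
    cost_history.append((error @ error) / (2 * m))  # J(theta) this iteration
    theta -= alpha * (X.T @ error) / m              # gradient step

# If this curve ever goes up, alpha is probably too large.
plt.plot(cost_history)
plt.xlabel("No. of iterations")
plt.ylabel("J(theta)")
plt.show()
```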

Automatic convergence test. Declare convergence if J(θ) decreases by less than E in one iteration, where E is some small value such as 10⁻³. However, in practice it's difficult to choose this threshold value.
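A sketch of how this automatic convergence test might look in code, again on illustrative synthetic data, using E = 10⁻³ as suggested above:

```python
import numpy as np

# Same assumed synthetic dataset as in the previous sketch
rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.uniform(0, 2, 100)]
y = 4 + 3 * X[:, 1] + rng.normal(0, 0.5, 100)

alpha, E, m = 0.1, 1e-3, len(y)   # E is the convergence threshold
theta = np.zeros(2)
prev_cost = float("inf")

for i in range(10_000):
    error = X @ theta - y
    cost = (error @ error) / (2 * m)              # J(theta)
    if prev_cost - cost < E:                      # J decreased by less than E
        print(f"Converged after {i} iterations, J(theta) = {cost:.4f}")
        break
    prev_cost = cost
    theta -= alpha * (X.T @ error) / m            # gradient step
```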


It has been proven that if the learning rate α is sufficiently small, then J(θ) will decrease on every iteration.


To summarize:

If α is too small: slow convergence.

If α is too large: J(θ) may not decrease on every iteration and thus may not converge.
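To illustrate both failure modes, the following sketch runs gradient descent on the same kind of synthetic data with a too-small, a reasonable, and a too-large α; the specific α values are assumptions chosen for this dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.uniform(0, 2, 100)]
y = 4 + 3 * X[:, 1] + rng.normal(0, 0.5, 100)
m = len(y)

def final_cost(alpha, iters=100):
    """Run gradient descent and return J(theta) after `iters` iterations."""
    theta = np.zeros(2)
    for _ in range(iters):
        error = X @ theta - y
        theta -= alpha * (X.T @ error) / m
    error = X @ theta - y
    return (error @ error) / (2 * m)

# Too small: J barely moves. Reasonable: J is low. Too large: J blows up.
for alpha in (0.001, 0.1, 1.5):
    print(f"alpha = {alpha:<6} J after 100 iterations = {final_cost(alpha):.4g}")
```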