Large Scale Machine Learning



Before learning with a large dataset, plot a learning curve on a much smaller sample and check whether the model appears to have high bias. If it does, adding more data will not help (add features instead); only a high-variance model benefits from more data.
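A minimal sketch of this sanity check, assuming scikit-learn is available; SGDRegressor, the synthetic X and y, and the printout are illustrative choices, not from the course:

```python
# Sanity check before training on the full dataset: learning curves on a
# small sample. SGDRegressor and the synthetic data are illustrative.
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import learning_curve

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                       # small sample of the data
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=1000)

sizes, train_scores, val_scores = learning_curve(
    SGDRegressor(), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5,
    scoring="neg_mean_squared_error")

# High bias: train and validation error converge to the same high value,
# so more data will not help. High variance: a persistent gap between them.
for m, tr, va in zip(sizes, -train_scores.mean(axis=1), -val_scores.mean(axis=1)):
    print(f"m={m:4d}  train={tr:.3f}  val={va:.3f}")
```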


Stochastic Gradient Descent
(taking linear regression as an example)

(Figures: the per-example cost function and the stochastic gradient descent algorithm for linear regression)

(The outer loop over the whole shuffled training set is typically repeated 1 to 10 times.)

As the picture shows, the parameters wander in somewhat random directions and never converge to a single point, but stochastic gradient descent is much faster than batch gradient descent because each update uses only one example.
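A minimal sketch of the algorithm in NumPy, assuming the linear hypothesis h(x) = θᵀx; the function name and default values are illustrative:

```python
# A sketch of stochastic gradient descent for linear regression.
import numpy as np

def sgd_linear_regression(X, y, alpha=0.01, epochs=10):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):                  # the outer loop, repeated 1-10 times
        for i in np.random.permutation(m):   # randomly shuffle the dataset
            err = X[i] @ theta - y[i]        # h(x_i) - y_i for ONE example
            theta -= alpha * err * X[i]      # theta_j := theta_j - alpha*err*x_ij
    return theta
```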


Mini-Batch Gradient Descent

(Figures: the mini-batch gradient descent algorithm)

The mini-batch size b is typically chosen between 2 and 100.

Mini-batch gradient descent is faster than stochastic gradient descent only when the sum over the b examples is vectorized, as in the sketch below.
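A minimal sketch, again assuming NumPy; the vectorized line `X[idx].T @ err` is where the speedup over one-example updates comes from:

```python
# A sketch of mini-batch gradient descent with batch size b.
import numpy as np

def minibatch_gd(X, y, alpha=0.01, b=10, epochs=10):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        order = np.random.permutation(m)
        for start in range(0, m, b):
            idx = order[start:start + b]          # the next b examples
            err = X[idx] @ theta - y[idx]         # b residuals at once
            theta -= (alpha / len(idx)) * (X[idx].T @ err)  # vectorized step
    return theta
```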


Checking for Convergence
(without periodically scanning the entire training set)
(Figure: four example plots of the cost averaged over the last 1000 examples)
In plot 1, with a smaller α, the red line converges to a slightly better solution, but more slowly.
In plots 2 and 3, averaging over more examples makes the red line smoother; this helps especially in plot 3, where the original curve was too noisy to show any trend, though the feedback also arrives more slowly.
In plot 4, the curve is increasing, i.e. the algorithm is diverging, so we should use a smaller α.
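A minimal sketch of how such a plot can be produced, assuming NumPy: the per-example cost is recorded just before each update and averaged over a window of 1000 examples (names and defaults are illustrative):

```python
# A sketch of the convergence check: record cost(theta, (x_i, y_i)) just
# BEFORE each update and average the last 1000 recorded costs.
import numpy as np

def sgd_with_cost_trace(X, y, alpha=0.01, window=1000):
    theta = np.zeros(X.shape[1])
    costs, averaged = [], []
    for i in np.random.permutation(X.shape[0]):
        err = X[i] @ theta - y[i]
        costs.append(0.5 * err ** 2)         # cost before the parameters move
        theta -= alpha * err * X[i]
        if len(costs) % window == 0:
            averaged.append(np.mean(costs[-window:]))   # one plotted point
    return theta, averaged                   # plot `averaged` vs. iteration
```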


Learning Rate

To converge closer to the optimum, one can slowly decrease α over time:

(Figure: a slowly decreasing learning rate, α = const1 / (iterationNumber + const2))

However, choosing these two extra constants well is fiddly work, so this may not be a good choice.
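For illustration, the schedule as code; const1 and const2 are the two hand-tuned constants, and the values here are arbitrary:

```python
# The decaying learning rate as code; these constant values are arbitrary.
const1, const2 = 1.0, 50.0

def alpha(iteration_number):
    return const1 / (iteration_number + const2)
```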


Online Learning
(learning from a continuous stream of data)

(Figure: the online learning loop: repeat forever, get the next (x, y) from a user, update θ using that single example)

Note that the (x, y) pair carries no superscript (i): once an example has been used for one update, it is discarded.
Online learning can also adapt to changing user preferences over time.
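A minimal sketch of such an online learner, assuming logistic regression and a Python iterable standing in for the data stream (names are illustrative):

```python
# A sketch of an online learner (logistic regression): each (x, y) from the
# stream is used for exactly one update and then discarded.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def online_learner(stream, n_features, alpha=0.1):
    theta = np.zeros(n_features)
    for x, y in stream:                   # on a real site this loop never ends
        err = sigmoid(x @ theta) - y      # h(x) - y for this single example
        theta -= alpha * err * x          # update, then the example is dropped
    return theta
```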

(Figure: product search example, predicting the CTR (click-through rate) of each of the 10 results shown)

Each search thus yields 10 (x, y) pairs, i.e. 10 training examples with which to update the parameters, as in the sketch below.
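For illustration, a hypothetical helper that turns one search (10 results shown, some clicked) into those 10 labelled examples; the feature vectors here are stand-ins:

```python
# Hypothetical helper: one search shows 10 results, so it yields 10 (x, y)
# examples, with y = 1 if that result was clicked. Features are stand-ins.
import numpy as np

def examples_from_search(result_features, clicked_indices):
    return [(x, 1 if i in clicked_indices else 0)
            for i, x in enumerate(result_features)]

# e.g. 10 results shown, the user clicked results 0 and 3:
pairs = examples_from_search([np.ones(4)] * 10, {0, 3})
```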


Map Reduce

(Figures: the batch-gradient sum split across 4 machines, each computing a partial sum over its share of the data; a master server combining the partial sums; the same idea applied to the cores of a single machine)
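A minimal sketch of the idea, assuming NumPy and simulating the machines with array chunks; in a real deployment each chunk's partial sum would be computed on a separate machine or core:

```python
# A sketch of map-reduce for the batch gradient: split the sum over 4
# "machines" (array chunks here) and combine the partial sums on a master.
import numpy as np

def mapreduce_gradient(X, y, theta, n_workers=4):
    chunks = zip(np.array_split(X, n_workers), np.array_split(y, n_workers))
    # "map": each worker computes a partial sum of (h(x_i) - y_i) * x_i
    partials = [Xc.T @ (Xc @ theta - yc) for Xc, yc in chunks]
    # "reduce": the master combines them into the full gradient
    return sum(partials) / len(X)

# one batch step: theta -= alpha * mapreduce_gradient(X, y, theta)
```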
