Machine Learning in Gradient Descent
Reposted from: http://horicky.blogspot.com/2012/10/machine-learning-in-gradient-descent.html
Thursday, October 4, 2012
In Machine Learning, gradient descent is a very popular learning mechanism that is based on a greedy, hill-climbing approach.
Gradient Descent
The basic idea of Gradient Descent is to use a feedback loop to adjust the model based on the error it observes between its predicted output and the actual output. The adjustment (notice that there are multiple model parameters, so it should be considered a vector) points in the direction where the error decreases most steeply (hence the term "gradient"). Notice that we intentionally leave the following items vaguely defined so this approach can be applicable in a wide range of machine learning scenarios:
- The Model
- The loss function
- The learning rate
Gradient Descent has several attractive properties:
- Intuitive and easy to understand
- Easy to run in parallel processing architecture
- Easy to run incrementally with additional data
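To make the three vaguely defined items above concrete, here is a minimal sketch that fixes each of them to one illustrative choice (a linear model, a squared-error loss, and a constant learning rate); these choices are assumptions for the example, not part of the post:

```python
# Minimal gradient descent sketch: linear model y = w*x + b,
# mean-squared-error loss, fixed learning rate.
def gradient_descent(xs, ys, learning_rate=0.05, iterations=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Gradient of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step in the direction of steepest descent
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b
```

The feedback loop is the `for` loop: each round observes the error, computes its gradient, and nudges the parameter vector (w, b) downhill.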
Batch vs Online Learning
While some other machine learning models (e.g. decision trees) require a batch of data points before learning can start, Gradient Descent can learn from each data point independently and hence supports both batch learning and online learning easily. The difference lies in how the training data is fed into the model and how the loss function computes its error. In batch learning, all training data is fed to the model, which estimates the output for all data points. The errors are then summed to compute the loss, which is used to update the model. The model in this case is updated only after predicting the whole batch of data points.
In online learning mode (also called stochastic gradient descent), data is fed to the model one point at a time, and the model is adjusted immediately after evaluating the error of that single data point. Notice that the final result of incremental learning can differ from batch learning, but it can be proved that the difference is bounded and inversely proportional to the square root of the number of data points.
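The contrast between the two modes can be sketched as follows (again assuming a simple linear model y = w*x with squared-error loss; the function names are illustrative):

```python
def batch_update(w, data, lr):
    """Batch mode: sum the error gradient over ALL points, then update once."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def online_update(w, data, lr):
    """Online (stochastic) mode: adjust the model after EVERY single point."""
    for x, y in data:
        grad = 2 * (w * x - y) * x   # gradient at this one point only
        w = w - lr * grad
    return w
```

In batch mode the model moves only once per pass over the data; in online mode each point immediately nudges the model, which is why the arrival order of the data matters there.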
The learning rate can be adjusted as well to achieve better stability in convergence. In general, the learning rate is higher initially and decreases over the course of training (in batch learning it decreases between rounds; in online learning it decreases after every data point). This is quite intuitive: you pay less attention to the error as you learn more and more. Because of this, online learning is sensitive to the arrival order of the data.
One way to adjust the learning rate is to divide a constant by the square root of t, where t is the number of data points seen so far:

η = η_initial / √t
By using a different decay factor, we can control how much attention to pay to late-arriving data. In online learning, where data arrives in order of occurrence, we can tune this decay factor to guide how much attention the learning mechanism pays to the latest data. Online learning therefore adapts automatically to changing trends over time.
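The decay schedule above is small enough to state directly in code (η_initial = 0.5 below is just an example value):

```python
import math

def decayed_rate(eta_initial, t):
    """Learning rate after seeing t data points: eta_initial / sqrt(t)."""
    return eta_initial / math.sqrt(t)

# With eta_initial = 0.5 the rate shrinks as data arrives:
# t = 1, 4, 100  ->  0.5, 0.25, 0.05
```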
Most real-world machine learning scenarios rely on the stationarity of the model. After all, learning is about "learning from past experience"; if the environment changes so rapidly that past experience becomes invalid, there is little value in learning. For this reason, most machine learning projects are well served by batch learning (daily or weekly), and the demand for online learning is not very high. A very common batch learning model is described in my previous blog here.
Parallel Learning
Because there is no dependency between data points when computing the gradient, Gradient Descent is very easy to run in a parallel processing environment such as Map/Reduce. Here we illustrate how to parallelize the execution of batch learning.
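One round of this scheme can be sketched as a map step that computes a partial gradient on each shard of the data and a reduce step that sums the partials and takes one global descent step. This is a toy in-process sketch (same linear model as before); a real Hadoop job would distribute the shards across workers:

```python
def map_partial_gradient(w, shard):
    """Map: each worker computes the gradient sum over its own data shard."""
    return sum(2 * (w * x - y) * x for x, y in shard)

def reduce_and_step(w, partials, n_total, lr):
    """Reduce: sum the partial gradients and take one global descent step."""
    grad = sum(partials) / n_total
    return w - lr * grad

def one_round(w, shards, lr):
    """One Map/Reduce round; the driver repeats rounds until w converges."""
    partials = [map_partial_gradient(w, s) for s in shards]
    n_total = sum(len(s) for s in shards)
    return reduce_and_step(w, partials, n_total, lr)
```

Because the partial gradients simply add up, one parallel round produces exactly the same update as a single-machine batch step over all the data.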
Notice that multiple rounds of Map/Reduce are needed until the model converges. On the other hand, online learning is not feasible on Hadoop Map/Reduce, which does not support real-time processing at this moment.
In summary, gradient descent is a very powerful approach to machine learning and works well in a wide spectrum of scenarios.