ML in Practice: Adaline with Stochastic Gradient Descent
Principle
stochastic gradient descent
The biggest drawback of the original Adaline is that it needs the full set of x and y to compute the weights, which is not feasible in real big-data applications: on the web, data grows exponentially and new samples keep arriving. This motivates gradient descent variants that process the data in batches instead of all at once.
Below is the plain (batch) gradient descent step from the previous chapter, which feeds the entire X through the model for every weight update:
for i in range(self.n_iter):
    output = self.net_input(X)                    # net input for the whole training set
    errors = (y - output)                         # error for every sample
    self.w_[1:] += self.eta * X.T.dot(errors)     # one weight update from the full batch
    self.w_[0] += self.eta * errors.sum()
    cost = (errors**2).sum() / 2.0
    self.cost_.append(cost)
Stochastic gradient descent is a special case of this batched gradient descent: it picks samples at random to update the weights. What makes it special is that its batch size is 1, i.e. samples are processed one at a time.
Processing samples one at a time also suits real-time (online) training: once a model has been fitted on the existing data, newly arriving samples can be fed in one by one to keep refining it. (In the code below, the fit method trains on the existing data, while the partial_fit method is called to train on data that arrives afterwards.)
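To make the contrast concrete, here is a minimal sketch of the per-sample update, obtained by restructuring the batch loop above to visit one (xi, target) pair at a time (the full implementation, with shuffling and cost tracking, follows in the Implementation section):

for i in range(self.n_iter):
    for xi, target in zip(X, y):                  # one sample at a time (batch size = 1)
        output = self.net_input(xi)               # net input for a single sample
        error = target - output
        self.w_[1:] += self.eta * xi * error      # weight update from this sample only
        self.w_[0] += self.eta * error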
adaptive learning rate
Stochastic gradient descent is often combined with an adaptive learning rate, for example one that decreases gradually as the number of iterations grows:
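A commonly used schedule looks like the following, where c1 and c2 are constants chosen as hyperparameters (the names here are illustrative, not part of the model above) and t is the number of iterations seen so far:

eta = c1 / (t + c2)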
mini-batch learning
Stochastic gradient descent processes the data one sample at a time, while mini-batch learning is the more general form of batching, for example with a batch size of 50.
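As a minimal sketch, one mini-batch pass could reuse the batch update rule from above on 50 samples at a time (batch_size, X_batch and y_batch are illustrative names, not part of the original model):

batch_size = 50
for start in range(0, X.shape[0], batch_size):
    X_batch = X[start:start + batch_size]              # the next 50 samples
    y_batch = y[start:start + batch_size]
    output = self.net_input(X_batch)
    errors = y_batch - output
    self.w_[1:] += self.eta * X_batch.T.dot(errors)    # one update per mini-batch
    self.w_[0] += self.eta * errors.sum()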
Implementation
Starting from the Adaline training model of the previous post, this version adds:
1. A _shuffle method, to pick samples in random order.
2. A partial_fit method, to train on data that arrives later.
from numpy.random import seed
import numpy as np


class AdalineSGD(object):
    """ADAptive LInear NEuron classifier.

    Parameters
    ------------
    eta : float
        Learning rate (between 0.0 and 1.0)
    n_iter : int
        Passes over the training dataset.
    shuffle : bool (default: True)
        If True, shuffles the training data every epoch to prevent cycles.
    random_state : int (default: None)
        Set random state for shuffling and initializing the weights.

    Attributes
    -----------
    w_ : 1d-array
        Weights after fitting.
    cost_ : list
        Average cost (sum of squared errors / 2) per sample in every epoch.

    """
    def __init__(self, eta=0.01, n_iter=10, shuffle=True, random_state=None):
        self.eta = eta
        self.n_iter = n_iter
        self.w_initialized = False
        self.shuffle = shuffle
        if random_state:
            seed(random_state)

    def fit(self, X, y):
        """Fit training data.

        Parameters
        ----------
        X : {array-like}, shape = [n_samples, n_features]
            Training vectors, where n_samples is the number of samples and
            n_features is the number of features.
        y : array-like, shape = [n_samples]
            Target values.

        Returns
        -------
        self : object

        """
        self._initialize_weights(X.shape[1])
        self.cost_ = []
        for i in range(self.n_iter):
            if self.shuffle:
                X, y = self._shuffle(X, y)
            cost = []
            # online processing: update the weights one sample at a time
            for xi, target in zip(X, y):
                cost.append(self._update_weights(xi, target))
            avg_cost = sum(cost) / len(y)
            self.cost_.append(avg_cost)
        return self

    def partial_fit(self, X, y):
        """Fit training data without reinitializing the weights.

        To update the model in an on-line learning scenario with streaming
        data, simply call partial_fit on individual samples, for instance
        ada.partial_fit(X_std[0, :], y[0]).
        """
        if not self.w_initialized:
            self._initialize_weights(X.shape[1])
        # ravel() flattens a multi-dimensional array into 1-D, row by row.
        if y.ravel().shape[0] > 1:
            for xi, target in zip(X, y):
                self._update_weights(xi, target)
        else:
            self._update_weights(X, y)
        return self

    def _shuffle(self, X, y):
        """Shuffle training data"""
        r = np.random.permutation(len(y))
        return X[r], y[r]

    def _initialize_weights(self, m):
        """Initialize weights to zeros"""
        self.w_ = np.zeros(1 + m)
        self.w_initialized = True

    def _update_weights(self, xi, target):
        """Apply the Adaline learning rule to update the weights"""
        output = self.net_input(xi)
        error = (target - output)
        self.w_[1:] += self.eta * xi.dot(error)
        self.w_[0] += self.eta * error
        cost = 0.5 * error**2
        return cost

    def net_input(self, X):
        """Calculate net input"""
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def activation(self, X):
        """Compute linear activation"""
        return self.net_input(X)

    def predict(self, X):
        """Return class label after unit step"""
        return np.where(self.activation(X) >= 0.0, 1, -1)
Test
>>> ada = AdalineSGD(n_iter=15, eta=0.01, random_state=1)
>>> ada.fit(X_std, y)
>>> plt.plot(range(1, len(ada.cost_) + 1), ada.cost_, marker='o')
>>> plt.xlabel('Epochs')
>>> plt.ylabel('Average Cost')
>>> plt.show()
If new data is added later:
ada.partial_fit(X_std[0, :], y[0])
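The same call also works on a chunk of several new samples at once, thanks to the y.ravel() check inside partial_fit; a sketch, assuming X_new holds a few new standardized rows and y_new their labels (both names are illustrative):

ada.partial_fit(X_new, y_new)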