Online Learning: Stochastic Gradient Methods
Source: Internet  Published: 引用60分钟数据  Editor: 程序博客网  Time: 2024/05/25 12:22
This content is compiled from Coursera. Discussion and reposting are welcome.
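1 The problem in big-data settings
As datasets grow larger, the gradient descent (ascent) algorithm used so far has to scan every data point to compute the gradient for each coefficient update, so a single update becomes very expensive.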
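2 Updating with one data point at a time
Recall the original gradient ascent method: each update sums the gradient contributions of all data points.
Stochastic gradient uses only one data point for each update.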
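Written out for logistic regression, the two updates can be sketched as follows. The notation is an assumption reconstructed from the code in Section 7 (the error term is the indicator minus the predicted probability), not copied from the original post: h_j(x_i) is feature j of data point i, η is the step size, and N is the number of data points.

```latex
% Full gradient ascent: every update sums over all N data points
w_j^{(t+1)} = w_j^{(t)} + \eta \sum_{i=1}^{N} h_j(x_i)\,
    \bigl(\mathbf{1}[y_i = +1] - P(y = +1 \mid x_i, w^{(t)})\bigr)

% Stochastic gradient ascent: every update uses a single data point i
w_j^{(t+1)} = w_j^{(t)} + \eta \, h_j(x_i)\,
    \bigl(\mathbf{1}[y_i = +1] - P(y = +1 \mid x_i, w^{(t)})\bigr)
```

The only difference is the missing sum: one stochastic update costs roughly 1/N of a full gradient update.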
3 Comparing gradient & stochastic gradient
The figure below shows the details:
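4 How does stochastic gradient work?
An analogy is the decomposition of forces in high-school mechanics: at each step we take only one component of the total gradient. Because the total gradient points along the path toward the optimum, in most cases each component also moves the result in a good direction. In effect, we advance along a winding path rather than a straight one.
In the end we oscillate around the optimum, so we take the average over the last portion of the iterations as the final coefficients.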
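The averaging step can be sketched as below. This is a minimal illustration under stated assumptions, not the course's implementation: the function name sg_with_tail_average, the tail_fraction parameter, and the synthetic data are all hypothetical, and labels are assumed to be in {-1, +1} as in the code in Section 7.

```python
import numpy as np


def predict_probability(features, coefficients):
    """P(y = +1 | x, w) under the logistic model."""
    return 1.0 / (1.0 + np.exp(-np.dot(features, coefficients)))


def sg_with_tail_average(X, y, step_size=0.1, n_passes=20,
                         tail_fraction=0.25, seed=0):
    """Stochastic gradient ascent, one data point per update.

    Returns (last_iterate, tail_average), where tail_average is the mean of
    the coefficients over the final `tail_fraction` of all updates.
    """
    rng = np.random.RandomState(seed)
    n, d = X.shape
    w = np.zeros(d)
    total_updates = n_passes * n
    tail_start = int(total_updates * (1.0 - tail_fraction))
    tail_sum = np.zeros(d)
    tail_count = 0
    t = 0
    for _ in range(n_passes):
        for i in rng.permutation(n):  # reshuffle on every pass
            error = float(y[i] == 1) - predict_probability(X[i], w)
            w = w + step_size * error * X[i]  # update from a single point
            if t >= tail_start:  # accumulate only the last iterates
                tail_sum += w
                tail_count += 1
            t += 1
    return w, tail_sum / tail_count


# Hypothetical synthetic data: intercept column plus two features.
rng = np.random.RandomState(1)
X = np.hstack([np.ones((200, 1)), rng.randn(200, 2)])
true_w = np.array([0.5, 2.0, -1.0])
y = np.where(rng.rand(200) < predict_probability(X, true_w), 1, -1)

w_last, w_avg = sg_with_tail_average(X, y)
print("last iterate:", w_last)
print("tail average:", w_avg)
```

The last iterate depends on whichever data point happened to be visited last, while the tail average smooths out that oscillation.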
5 Choosing the step size η
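6 Improving the algorithm: one small batch of data at a time
Rather than one data point per update, each update uses a small batch of data points.
Online learning can also update the model at regular intervals instead of after every single data point!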
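A mini-batch update can be sketched as follows. This is an illustrative sketch rather than the post's own code: the helper names minibatch_step and iterate_minibatches and the toy data are hypothetical, and labels are assumed to be in {-1, +1}.

```python
import numpy as np


def minibatch_step(coefficients, X_batch, y_batch, step_size):
    """One stochastic gradient ascent step on a mini-batch."""
    predictions = 1.0 / (1.0 + np.exp(-np.dot(X_batch, coefficients)))
    errors = (y_batch == 1).astype(float) - predictions
    # Gradient for all coefficients at once, normalized by the batch size.
    gradient = np.dot(X_batch.T, errors) / len(X_batch)
    return coefficients + step_size * gradient


def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled mini-batches covering one pass over the data."""
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]


# Hypothetical toy data: an intercept column plus one feature whose sign
# decides the label.
rng = np.random.RandomState(0)
feature = rng.randn(100)
X = np.column_stack([np.ones(100), feature])
y = np.where(feature > 0, 1, -1)

w = np.zeros(2)
for _ in range(50):  # 50 passes over the data
    for X_batch, y_batch in iterate_minibatches(X, y, batch_size=10, rng=rng):
        w = minibatch_step(w, X_batch, y_batch, step_size=0.5)
print(w)
```

Dividing by the batch size keeps the magnitude of the step comparable across different batch sizes, so the same step size remains usable when batch_size changes. With batch_size=1 this reduces to the one-point-at-a-time update, and with batch_size=len(X) it becomes full gradient ascent.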
7 Code implementation
Click here to download the data files and code.
# Python 2 code; requires GraphLab Create (graphlab) and NumPy.
from __future__ import division
import graphlab

products = graphlab.SFrame('amazon_baby_subset.gl/')

import json
with open('important_words.json', 'r') as f:
    important_words = json.load(f)
important_words = [str(s) for s in important_words]

# Remove punctuation
def remove_punctuation(text):
    import string
    return text.translate(None, string.punctuation)

products['review_clean'] = products['review'].apply(remove_punctuation)

# Split out the words into individual columns
for word in important_words:
    products[word] = products['review_clean'].apply(lambda s : s.split().count(word))

train_data, validation_data = products.random_split(.9, seed=1)

import numpy as np

def get_numpy_data(data_sframe, features, label):
    # Add a constant column for the intercept and convert to NumPy arrays
    data_sframe['intercept'] = 1
    features = ['intercept'] + features
    features_sframe = data_sframe[features]
    feature_matrix = features_sframe.to_numpy()
    label_sarray = data_sframe[label]
    label_array = label_sarray.to_numpy()
    return(feature_matrix, label_array)

feature_matrix_train, sentiment_train = get_numpy_data(train_data, important_words, 'sentiment')
feature_matrix_valid, sentiment_valid = get_numpy_data(validation_data, important_words, 'sentiment')

def predict_probability(feature_matrix, coefficients):
    '''
    Produces probabilistic estimate for P(y_i = +1 | x_i, w).
    Estimate ranges between 0 and 1.
    '''
    # Take dot product of feature_matrix and coefficients
    score = np.dot(feature_matrix, coefficients)
    # Compute P(y_i = +1 | x_i, w) using the link function
    predictions = 1. / (1.+np.exp(-score))
    return predictions

def feature_derivative(errors, feature):
    # Compute the dot product of errors and feature
    derivative = np.dot(errors, feature)
    return derivative

def compute_avg_log_likelihood(feature_matrix, sentiment, coefficients):
    indicator = (sentiment==+1)
    scores = np.dot(feature_matrix, coefficients)
    logexp = np.log(1. + np.exp(-scores))

    # Simple check to prevent overflow
    mask = np.isinf(logexp)
    logexp[mask] = -scores[mask]

    lp = np.sum((indicator-1)*scores - logexp)/len(feature_matrix)
    return lp

from math import sqrt

def logistic_regression_SG(feature_matrix, sentiment, initial_coefficients, step_size, batch_size, max_iter):
    log_likelihood_all = []

    # make sure it's a numpy array
    coefficients = np.array(initial_coefficients)
    # set seed=1 to produce consistent results
    np.random.seed(seed=1)
    # Shuffle the data before starting
    permutation = np.random.permutation(len(feature_matrix))
    feature_matrix = feature_matrix[permutation,:]
    sentiment = sentiment[permutation]

    i = 0 # index of current batch
    # Do a linear scan over data
    for itr in xrange(max_iter):
        # Predict P(y_i = +1|x_i,w) on the current batch,
        # i.e. rows [i:i+batch_size,:] of feature_matrix
        predictions = predict_probability(feature_matrix[i:i+batch_size,:], coefficients)

        # Compute indicator value for (y_i = +1) on entries [i:i+batch_size]
        indicator = (sentiment[i:i+batch_size]==+1)

        # Compute the errors as indicator - predictions
        errors = indicator - predictions
        for j in xrange(len(coefficients)): # loop over each coefficient
            # feature_matrix[:,j] is the feature column associated with coefficients[j].
            # Compute the derivative for coefficients[j] on rows [i:i+batch_size]
            derivative = feature_derivative(errors, feature_matrix[i:i+batch_size,j])

            # Update with the product of the step size, the derivative,
            # and the normalization constant (1./batch_size)
            coefficients[j] += step_size*derivative*(1./batch_size)

        # Checking whether log likelihood is increasing
        # Print the log likelihood over the *current batch*
        lp = compute_avg_log_likelihood(feature_matrix[i:i+batch_size,:], sentiment[i:i+batch_size],
                                        coefficients)
        log_likelihood_all.append(lp)
        if itr <= 15 or (itr <= 1000 and itr % 100 == 0) or (itr <= 10000 and itr % 1000 == 0) \
                or itr % 10000 == 0 or itr == max_iter-1:
            data_size = len(feature_matrix)
            print 'Iteration %*d: Average log likelihood (of data points in batch [%0*d:%0*d]) = %.8f' % \
                (int(np.ceil(np.log10(max_iter))), itr, \
                 int(np.ceil(np.log10(data_size))), i, \
                 int(np.ceil(np.log10(data_size))), i+batch_size, lp)

        # if we made a complete pass over data, shuffle and restart
        i += batch_size
        if i+batch_size > len(feature_matrix):
            permutation = np.random.permutation(len(feature_matrix))
            feature_matrix = feature_matrix[permutation,:]
            sentiment = sentiment[permutation]
            i = 0

    # We return the list of log likelihoods for plotting purposes.
    return coefficients, log_likelihood_all