Classification and Regression (4): Linear Regression
The prerequisites of a regression problem are:

- 1) Collected data
- 2) A hypothesized model, i.e., a function containing unknown parameters that can be estimated through learning. The model is then used to predict/classify new data.
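For linear regression in particular, the hypothesized model is linear in its parameters (standard background, stated here for concreteness): given a feature vector $x \in \mathbb{R}^d$, the prediction is

$$ f(x) = \sum_i w_i x_i + b $$

and learning means estimating the coefficients $w_i$ and the intercept $b$ by minimizing the (possibly regularized) squared error over the training data.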
An example of using Spark ML's `LinearRegression`:

```scala
import org.apache.spark.ml.regression.LinearRegression

// Load the data
val training = spark.read.format("libsvm")
  .load("data/mllib/sample_linear_regression_data.txt")

val lr = new LinearRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)

// Fit the model
val lrModel = lr.fit(training)

// Print the coefficients and intercept for linear regression
println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")

// Print some training statistics
val trainingSummary = lrModel.summary
println(s"numIterations: ${trainingSummary.totalIterations}")
println(s"objectiveHistory: [${trainingSummary.objectiveHistory.mkString(",")}]")
trainingSummary.residuals.show()
println(s"RMSE: ${trainingSummary.rootMeanSquaredError}")
println(s"r2: ${trainingSummary.r2}")
```
`LinearRegression` exposes the following parameters:

```scala
// Regularization parameter; defaults to 0. Corresponds to lambda in the optimization algorithm.
def setRegParam(value: Double): this.type = set(regParam, value)
setDefault(regParam -> 0.0)

// Whether to fit an intercept; enabled by default.
def setFitIntercept(value: Boolean): this.type = set(fitIntercept, value)
setDefault(fitIntercept -> true)

// Whether to standardize the training features before fitting the model; enabled by default.
// The model's coefficients are always returned in the original (not the standardized) space,
// so this step is transparent to the user.
def setStandardization(value: Boolean): this.type = set(standardization, value)
setDefault(standardization -> true)

// ElasticNet mixing parameter.
// 0 means an L2 penalty, 1 means an L1 penalty, and values in (0, 1) mix the two.
def setElasticNetParam(value: Double): this.type = set(elasticNetParam, value)
setDefault(elasticNetParam -> 0.0)

// Maximum number of iterations; defaults to 100.
def setMaxIter(value: Int): this.type = set(maxIter, value)
setDefault(maxIter -> 100)

// Convergence tolerance.
def setTol(value: Double): this.type = set(tol, value)
setDefault(tol -> 1E-6)

// Column name for sample weights; unset by default, in which case every weight is 1.
def setWeightCol(value: String): this.type = set(weightCol, value)

// Solver. Two concrete solvers are available: l-bfgs and weighted least squares ("normal").
// Under "auto", l-bfgs is used when the number of features exceeds 4096;
// otherwise weighted least squares is used.
def setSolver(value: String): this.type = {
  require(Set("auto", "l-bfgs", "normal").contains(value),
    s"Solver $value was not supported. Supported options: auto, l-bfgs, normal")
  set(solver, value)
}
setDefault(solver -> "auto")

// Depth of treeAggregate; defaults to 2.
// Increase it when the feature dimension or the number of partitions is large.
def setAggregationDepth(value: Int): this.type = set(aggregationDepth, value)
setDefault(aggregationDepth -> 2)
```
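For reference, a minimal, hypothetical configuration exercising each setter above (the `weight` column name is illustrative and assumes such a column exists in the training DataFrame):

```scala
import org.apache.spark.ml.regression.LinearRegression

val lr = new LinearRegression()
  .setRegParam(0.1)          // lambda
  .setElasticNetParam(0.5)   // mix L1 and L2 equally
  .setFitIntercept(true)
  .setStandardization(true)
  .setMaxIter(50)
  .setTol(1e-6)
  .setWeightCol("weight")    // hypothetical sample-weight column
  .setSolver("auto")
  .setAggregationDepth(2)
```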
Next we look at the implementation of the `train` method. It first extracts the number of features and converts each row of the dataset into an `Instance`:

```scala
// Extract the number of features before deciding on the optimization solver.
val numFeatures = dataset.select(col($(featuresCol))).first().getAs[Vector](0).size
val w = if (!isDefined(weightCol) || $(weightCol).isEmpty) lit(1.0) else col($(weightCol))

val instances: RDD[Instance] = dataset.select(
  col($(labelCol)), w, col($(featuresCol))).rdd.map {
  case Row(label: Double, weight: Double, features: Vector) =>
    Instance(label, weight, features) // label, weight, feature vector
}
```
When the number of features does not exceed `WeightedLeastSquares.MAX_NUM_FEATURES` (4096), or the solver is explicitly set to `normal`, the model is trained by weighted least squares:

```scala
if (($(solver) == "auto" && numFeatures <= WeightedLeastSquares.MAX_NUM_FEATURES) ||
    $(solver) == "normal") {
  val optimizer = new WeightedLeastSquares($(fitIntercept), $(regParam),
    elasticNetParam = $(elasticNetParam), $(standardization), true,
    solverType = WeightedLeastSquares.Auto, maxIter = $(maxIter), tol = $(tol))
  val model = optimizer.fit(instances)
  // When it is trained by WeightedLeastSquares, the training summary does not
  // attach the returned model.
  val lrModel = copyValues(new LinearRegressionModel(uid, model.coefficients, model.intercept))
  val (summaryModel, predictionColName) = lrModel.findSummaryModelAndPredictionCol()
  val trainingSummary = new LinearRegressionTrainingSummary(
    summaryModel.transform(dataset), predictionColName, $(labelCol), $(featuresCol),
    summaryModel, model.diagInvAtWA.toArray, model.objectiveHistory)
  return lrModel.setSummary(Some(trainingSummary))
}
```
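For intuition, weighted least squares solves a ridge-type problem in closed form via the normal equations. This is only a sketch of the underlying math; the actual `WeightedLeastSquares` implementation additionally handles the intercept, standardization, and the elastic-net penalty:

$$ w = (X^T \Lambda X + \lambda I)^{-1} X^T \Lambda y $$

where $X$ is the feature matrix, $\Lambda$ the diagonal matrix of instance weights, and $\lambda$ the regularization parameter. Because this requires materializing the $d \times d$ matrix $X^T \Lambda X$, it is only used when the feature count $d$ is small (at most 4096).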
Otherwise (more than 4096 features, or the solver explicitly set to `l-bfgs`), the model is trained iteratively, in the following steps.

- 1 Compute sample statistics
The feature and label statistics (mean, variance, count) are computed in a single `treeAggregate` pass over the instances:

```scala
val (featuresSummarizer, ySummarizer) = {
  val seqOp = (c: (MultivariateOnlineSummarizer, MultivariateOnlineSummarizer),
               instance: Instance) =>
    (c._1.add(instance.features, instance.weight),
     c._2.add(Vectors.dense(instance.label), instance.weight))

  val combOp = (c1: (MultivariateOnlineSummarizer, MultivariateOnlineSummarizer),
                c2: (MultivariateOnlineSummarizer, MultivariateOnlineSummarizer)) =>
    (c1._1.merge(c2._1), c1._2.merge(c2._2))

  instances.treeAggregate(
    (new MultivariateOnlineSummarizer, new MultivariateOnlineSummarizer)
  )(seqOp, combOp, $(aggregationDepth))
}
```
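A minimal sketch of what `MultivariateOnlineSummarizer` provides, using the public `org.apache.spark.mllib.stat` class directly (the input vectors are illustrative):

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer

val summarizer = new MultivariateOnlineSummarizer()
summarizer.add(Vectors.dense(1.0, 10.0))
summarizer.add(Vectors.dense(3.0, 30.0))

println(summarizer.mean)     // [2.0,20.0]
println(summarizer.variance) // [2.0,200.0] (unbiased sample variance per dimension)
println(summarizer.count)    // 2
```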
If the raw standard deviation of the label is zero (a constant label) and an intercept is fitted, no optimization is needed: the coefficients are all zero and the intercept is simply the label mean.

```scala
val coefficients = Vectors.sparse(numFeatures, Seq()) // all-zero coefficients
val intercept = yMean
val model = copyValues(new LinearRegressionModel(uid, coefficients, intercept))
```
Otherwise, the statistics are used to standardize the problem and to derive the effective regularization parameters:

```scala
// If y is constant (rawYStd is zero), then y cannot be scaled. In this case,
// setting yStd = abs(yMean) ensures that y is no longer scaled in the l-bfgs algorithm.
val yStd = if (rawYStd > 0) rawYStd else math.abs(yMean)

val featuresMean = featuresSummarizer.mean.toArray
val featuresStd = featuresSummarizer.variance.toArray.map(math.sqrt)
val bcFeaturesMean = instances.context.broadcast(featuresMean)
val bcFeaturesStd = instances.context.broadcast(featuresStd)

val effectiveRegParam = $(regParam) / yStd
val effectiveL1RegParam = $(elasticNetParam) * effectiveRegParam
val effectiveL2RegParam = (1.0 - $(elasticNetParam)) * effectiveRegParam
```
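In symbols (a restatement of the three assignments above, with $\lambda$ = `regParam` and $\alpha$ = `elasticNetParam`): the regularization parameter is rescaled by the label's standard deviation because the optimization runs in the standardized space, and is then split into its L1 and L2 parts.

$$ \begin{align} \lambda_{\mathrm{eff}} &= \lambda / \hat{y} \\ \lambda_{L1} &= \alpha \, \lambda_{\mathrm{eff}} \\ \lambda_{L2} &= (1 - \alpha) \, \lambda_{\mathrm{eff}} \end{align} $$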
- 2 Define the loss function
```scala
val costFun = new LeastSquaresCostFun(instances, yStd, yMean, $(fitIntercept),
  $(standardization), bcFeaturesStd, bcFeaturesMean, effectiveL2RegParam,
  $(aggregationDepth))
```
`LeastSquaresCostFun` extends `DiffFunction[T]` and represents the least-squares loss. It returns the L2-regularized loss and gradient at a point. The method `def calculate(coefficients: BDV[Double]): (Double, BDV[Double])` computes the loss and gradient, where `coefficients` is the point at which they are evaluated.

```scala
override def calculate(coefficients: BDV[Double]): (Double, BDV[Double]) = {
  val coeffs = Vectors.fromBreeze(coefficients)
  val bcCoeffs = instances.context.broadcast(coeffs)
  val localFeaturesStd = bcFeaturesStd.value

  val leastSquaresAggregator = {
    val seqOp = (c: LeastSquaresAggregator, instance: Instance) => c.add(instance)
    val combOp = (c1: LeastSquaresAggregator, c2: LeastSquaresAggregator) => c1.merge(c2)

    instances.treeAggregate(
      new LeastSquaresAggregator(bcCoeffs, labelStd, labelMean, fitIntercept,
        bcFeaturesStd, bcFeaturesMean))(seqOp, combOp, aggregationDepth)
  }

  val totalGradientArray = leastSquaresAggregator.gradient.toArray // gradient
  bcCoeffs.destroy(blocking = false)

  val regVal = if (effectiveL2regParam == 0.0) {
    0.0
  } else {
    var sum = 0.0
    coeffs.foreachActive { (index, value) =>
      // The code below computes the loss and gradient of the regularization term
      // and adds the gradient to totalGradientArray.
      sum += {
        if (standardization) {
          totalGradientArray(index) += effectiveL2regParam * value
          value * value
        } else {
          if (localFeaturesStd(index) != 0.0) {
            // If `standardization` is false, we still standardize the data
            // to improve the rate of convergence; as a result, we have to
            // perform this reverse standardization to obtain the correct
            // objective function.
            val temp = value / (localFeaturesStd(index) * localFeaturesStd(index))
            totalGradientArray(index) += effectiveL2regParam * temp
            value * temp
          } else {
            0.0
          }
        }
      }
    }
    0.5 * effectiveL2regParam * sum
  }

  (leastSquaresAggregator.loss + regVal, new BDV(totalGradientArray))
}
```
Here `LeastSquaresAggregator` computes the gradient and loss of the least-squares loss function. To speed up convergence during optimization and to keep high-variance features from dominating training, the features are scaled to unit variance and have their means subtracted, which reduces the condition number. When training with an intercept, the objective function in the scaled space is:
$$ \begin{align} L &= \frac{1}{2N} \left\| \sum_i w_i \frac{x_i - \bar{x_i}}{\hat{x_i}} - \frac{y - \bar{y}}{\hat{y}} \right\|^2 \end{align} $$
In this formula, $\bar{x_i}$ is the mean of $x_i$, $\hat{x_i}$ is the standard deviation of $x_i$, $\bar{y}$ is the mean of the label, and $\hat{y}$ is its standard deviation.
If the intercept is not used, the same formula applies, with $\bar{y}$ and $\bar{x_i}$ replaced by 0. The formula can then be rewritten as follows.
$$ \begin{align} L &= \frac{1}{2N} \left\| \sum_i \frac{w_i}{\hat{x_i}} x_i - \sum_i \frac{w_i}{\hat{x_i}} \bar{x_i} - \frac{y}{\hat{y}} + \frac{\bar{y}}{\hat{y}} \right\|^2 \\ &= \frac{1}{2N} \left\| \sum_i w_i^\prime x_i - \frac{y}{\hat{y}} + \mathrm{offset} \right\|^2 = \frac{1}{2N} \, \mathrm{diff}^2 \end{align} $$
In this formula, $w_i^\prime = w_i/\hat{x_i}$ are the effective coefficients, $\mathrm{offset}$ is $- \sum_i (w_i/\hat{x_i})\bar{x_i} + \bar{y} / \hat{y}$, and $\mathrm{diff}$ is $\sum_i w_i^\prime x_i - y / \hat{y} + \mathrm{offset}$.
Note that the effective coefficients and the offset do not depend on the training dataset, so they can be precomputed.
Now, the first derivative of the objective function is:
$$ \begin{align} \frac{\partial L}{\partial w_i} &= \frac{\mathrm{diff}}{N} \cdot \frac{x_i - \bar{x_i}}{\hat{x_i}} \end{align} $$
However, the term $(x_i - \bar{x_i})$ makes the computation dense; when the training data is in a sparse format, this formula is not ideal. The problem can be solved by adding the dense term $\bar{x_i} / \hat{x_i}$ to the end of the formula. The first derivative of the objective then becomes:
$$ \begin{align} \frac{\partial L}{\partial w_i} &= \frac{1}{N} \sum_j \mathrm{diff}_j \frac{x_{ij} - \bar{x_i}}{\hat{x_i}} \\ &= \frac{1}{N} \left( \sum_j \frac{\mathrm{diff}_j \, x_{ij}}{\hat{x_i}} - \mathrm{diffSum} \frac{\bar{x_i}}{\hat{x_i}} \right) \\ &= \frac{1}{N} \left( \sum_j \frac{\mathrm{diff}_j \, x_{ij}}{\hat{x_i}} + \mathrm{correction}_i \right) \end{align} $$
Here $\mathrm{correction}_i = - \mathrm{diffSum} \, \bar{x_i} / \hat{x_i}$. A short derivation shows that $\mathrm{diffSum}$ is in fact 0 (summing over $j$ turns each $x_{ij}$ into $N\bar{x_i}$ and each $y_j$ into $N\bar{y}$):
$$ \begin{align} \mathrm{diffSum} &= \sum_j \left( \sum_i w_i \frac{x_{ij} - \bar{x_i}}{\hat{x_i}} - \frac{y_j - \bar{y}}{\hat{y}} \right) \\ &= N \left( \sum_i w_i \frac{\bar{x_i} - \bar{x_i}}{\hat{x_i}} - \frac{\bar{y} - \bar{y}}{\hat{y}} \right) \\ &= 0 \end{align} $$
Therefore, the first derivative of the objective function depends only on the training dataset, can easily be computed in a distributed fashion, and is friendly to sparse formats:
$$ \begin{align} \frac{\partial L}{\partial w_i} &= \frac{1}{N} \sum_j \frac{\mathrm{diff}_j \, x_{ij}}{\hat{x_i}} \end{align} $$
Let us first look at the implementation of the effective coefficients $w_i/\hat{x_i}$ and the offset:
```scala
@transient private lazy val effectiveCoefAndOffset = {
  val coefficientsArray = bcCoefficients.value.toArray.clone() // the w in the formula
  val featuresMean = bcFeaturesMean.value
  var sum = 0.0
  var i = 0
  val len = coefficientsArray.length
  while (i < len) {
    if (featuresStd(i) != 0.0) {
      coefficientsArray(i) /= featuresStd(i)
      sum += coefficientsArray(i) * featuresMean(i)
    } else {
      coefficientsArray(i) = 0.0
    }
    i += 1
  }
  val offset = if (fitIntercept) labelMean / labelStd - sum else 0.0
  (Vectors.dense(coefficientsArray), offset)
}
```
The `add` method accumulates the loss and gradient contribution of a single instance; `merge` combines two aggregators.

```scala
def add(instance: Instance): this.type = {
  instance match {
    case Instance(label, weight, features) =>
      if (weight == 0.0) return this
      // Compute diff
      val diff = dot(features, effectiveCoefficientsVector) - label / labelStd + offset
      if (diff != 0) {
        val localGradientSumArray = gradientSumArray
        val localFeaturesStd = featuresStd
        features.foreachActive { (index, value) =>
          if (localFeaturesStd(index) != 0.0 && value != 0.0) {
            // see the sparse-friendly derivative formula above
            localGradientSumArray(index) += weight * diff * value / localFeaturesStd(index)
          }
        }
        lossSum += weight * diff * diff / 2.0 // see the loss formula above
      }
      totalCnt += 1
      weightSum += weight
      this
  }
}

def merge(other: LeastSquaresAggregator): this.type = {
  if (other.weightSum != 0) {
    totalCnt += other.totalCnt
    weightSum += other.weightSum
    lossSum += other.lossSum

    var i = 0
    val localThisGradientSumArray = this.gradientSumArray
    val localOtherGradientSumArray = other.gradientSumArray
    while (i < dim) {
      localThisGradientSumArray(i) += localOtherGradientSumArray(i)
      i += 1
    }
  }
  this
}
```
Finally, the loss and gradient are obtained from the accumulated sums:
```scala
def loss: Double = {
  lossSum / weightSum
}

def gradient: Vector = {
  val result = Vectors.dense(gradientSumArray.clone())
  scal(1.0 / weightSum, result)
  result
}
```
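Written out (a restatement of the code above, using $a_j$ for the instance weights to avoid a clash with the coefficients $w_i$):

$$ \begin{align} \mathrm{loss} &= \frac{1}{\sum_j a_j} \sum_j \frac{a_j \, \mathrm{diff}_j^2}{2} \\ \frac{\partial L}{\partial w_i} &= \frac{1}{\sum_j a_j} \sum_j \frac{a_j \, \mathrm{diff}_j \, x_{ij}}{\hat{x_i}} \end{align} $$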
- 3 Choose the optimization method
```scala
val optimizer = if ($(elasticNetParam) == 0.0 || effectiveRegParam == 0.0) {
  // No L1 component: plain L-BFGS suffices.
  new BreezeLBFGS[BDV[Double]]($(maxIter), 10, $(tol))
} else {
  val standardizationParam = $(standardization)
  def effectiveL1RegFun = (index: Int) => {
    if (standardizationParam) {
      effectiveL1RegParam
    } else {
      // If `standardization` is false, we still standardize the data
      // to improve the rate of convergence; as a result, we have to
      // perform this reverse standardization by penalizing each component
      // differently to get effectively the same objective function when
      // the training dataset is not standardized.
      if (featuresStd(index) != 0.0) effectiveL1RegParam / featuresStd(index) else 0.0
    }
  }
  new BreezeOWLQN[Int, BDV[Double]]($(maxIter), 10, effectiveL1RegFun, $(tol))
}
```
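For context, this matches the elastic-net objective documented for Spark ML, whose penalty has the form

$$ \lambda \left( \alpha \|w\|_1 + \frac{1 - \alpha}{2} \|w\|_2^2 \right) $$

with $\alpha$ = `elasticNetParam` and $\lambda$ = `regParam`. The L1 term is not differentiable at zero, so plain L-BFGS only applies when the L1 part vanishes ($\alpha = 0$ or $\lambda = 0$); otherwise OWLQN, a quasi-Newton variant designed for L1-regularized objectives, is used.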
- 4 Obtain the results and convert them back
```scala
val initialCoefficients = Vectors.zeros(numFeatures)
val states = optimizer.iterations(new CachedDiffFunction(costFun),
  initialCoefficients.asBreeze.toDenseVector)

val (coefficients, objectiveHistory) = {
  val arrayBuilder = mutable.ArrayBuilder.make[Double]
  var state: optimizer.State = null
  while (states.hasNext) {
    state = states.next()
    arrayBuilder += state.adjustedValue
  }

  // Convert from the standardized space back to the original space.
  val rawCoefficients = state.x.toArray.clone()
  var i = 0
  val len = rawCoefficients.length
  while (i < len) {
    rawCoefficients(i) *= { if (featuresStd(i) != 0.0) yStd / featuresStd(i) else 0.0 }
    i += 1
  }
  (Vectors.dense(rawCoefficients).compressed, arrayBuilder.result())
}

// Once the coefficients have converged, the intercept can be computed in closed form.
// See the detailed discussion at:
// http://stats.stackexchange.com/questions/13617/how-is-the-intercept-computed-in-glmnet
val intercept = if ($(fitIntercept)) {
  yMean - dot(coefficients, Vectors.dense(featuresMean))
} else {
  0.0
}
```
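In formula form (matching the loop and the closed-form intercept above): each coefficient $w_i^\prime$ found in the standardized space is mapped back to the original space, and the intercept then follows from the means:

$$ w_i = w_i^\prime \cdot \frac{\hat{y}}{\hat{x_i}}, \qquad b = \bar{y} - \sum_i w_i \bar{x_i} $$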