Overfitting and underfitting in regression: regularized regression in Python

Source: Internet | Published by: 网红经济学数据 | Editor: 程序博客网 | Time: 2024/05/21 19:27

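Addressing overfitting:

Reduce the number of features
Model selection: choose the variables automatically
However, discarding features also discards the information they carry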

Regularization:

Keep all the features, but shrink the values of the parameters theta
Works well when there are many features

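Cost function

Penalizing the parameters keeps them small, which prevents overfitting. The cost function therefore has two goals:
1. fit the training data well
2. keep theta small

(formula image not preserved)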
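As a sketch of the two goals above, the regularized cost function for linear regression is usually written in the following form. The symbols m (number of training examples), n (number of features) and h_theta (the hypothesis) are assumptions here, since the text itself only names theta and lambda.

```latex
J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]
```

The first sum measures how well the model fits the data and the second sum penalizes large parameters. The index j starts at 1, so the constant term theta_0 is not penalized.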

If the parameter lambda is set too large, the model will underfit.
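The effect of lambda can be seen in a small experiment. This is a minimal sketch, assuming a closed-form regularized least-squares fit; the helper ridge_fit and the synthetic data are hypothetical and not part of the original post.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form regularized least squares; the intercept (column 0) is not penalized."""
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

# Hypothetical data: y = 1 + 2*x plus a little noise
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = 1 + 2 * x + 0.05 * rng.standard_normal(20)
X = np.column_stack((np.ones_like(x), x))

results = {}
for lam in (0.0, 1.0, 1e4):
    theta = ridge_fit(X, y, lam)
    mse = np.mean((X @ theta - y) ** 2)   # training error for this lambda
    results[lam] = (theta, mse)
    print(f"lambda={lam:g}  theta={theta}  training MSE={mse:.4f}")
```

With a very large lambda the slope is pushed toward zero, the model degenerates to a nearly constant prediction, and the training error rises, which is the underfitting described above.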

Regularized linear regression

(formula image not preserved)

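Regularized regression penalizes only the non-constant terms (theta_1 ... theta_n, not theta_0), so the gradient descent update is split into two cases:
(formula image not preserved)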
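The split update can be sketched as a single step function. This is an illustration under the usual conventions, assuming X already has a leading column of ones; the function name gradient_step and the toy data are hypothetical.

```python
import numpy as np

def gradient_step(theta, X, y, alpha, lam):
    """One regularized gradient descent update; X has a leading column of ones."""
    m = X.shape[0]
    error = X @ theta - y            # h_theta(x) - y for every sample
    grad = X.T @ error / m           # unregularized gradient
    new_theta = theta - alpha * grad
    # theta_0 gets the plain update; theta_1..theta_n are also shrunk by the penalty
    new_theta[1:] -= alpha * lam / m * theta[1:]
    return new_theta

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
theta = np.array([0.5, 0.5])
plain = gradient_step(theta, X, y, alpha=0.1, lam=0.0)
shrunk = gradient_step(theta, X, y, alpha=0.1, lam=3.0)
print(plain, shrunk)
```

The first component is the same in both results because theta_0 is never penalized, while the second component is smaller by alpha * lambda / m * theta_1 when lambda is nonzero.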

Normal equation

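Regularization adds lambda to the diagonal of X^T X (except the entry for theta_0), which also fixes the case where X^T X is not invertible.
(formula image not preserved)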
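A minimal sketch of the regularized normal equation, theta = (X^T X + lambda * L)^(-1) X^T y, where L is assumed to be the identity matrix with its top-left entry set to 0. The function name normal_equation and the example matrix are hypothetical; the example has more columns than samples, so X^T X alone is singular.

```python
import numpy as np

def normal_equation(X, y, lam=0.0):
    """Solve (X^T X + lam * L) theta = X^T y, with L = identity except L[0, 0] = 0."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0                    # do not penalize the intercept
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

# Hypothetical case: 2 samples, 4 columns (including the column of ones)
X = np.array([[1.0, 1.0, 2.0, 3.0],
              [1.0, 2.0, 0.0, 1.0]])
y = np.array([1.0, 2.0])

rank = np.linalg.matrix_rank(X.T @ X)      # smaller than 4, so X^T X is singular
theta = normal_equation(X, y, lam=0.1)     # solvable once lam > 0
print(rank, theta)
```

Solving the linear system with np.linalg.solve avoids forming the inverse explicitly, which is generally the more stable choice.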

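Regularized logistic regression

Cost function of logistic regression without regularization
(formula image not preserved)

Regularized cost
(formula image not preserved)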
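For reference, the two logistic regression cost functions are commonly written as follows. This is a sketch of the standard forms, assuming the sigmoid hypothesis and the same symbols m and n as in linear regression; none of this notation is spelled out in the text itself.

```latex
h_\theta(x) = \frac{1}{1 + e^{-\theta^{T} x}}

J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Bigl[y^{(i)}\log h_\theta(x^{(i)}) + \bigl(1 - y^{(i)}\bigr)\log\bigl(1 - h_\theta(x^{(i)})\bigr)\Bigr]

J_{\mathrm{reg}}(\theta) = J(\theta) + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2
```

As in the linear case, the penalty sum starts at j = 1, so theta_0 is left unpenalized.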

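The gradient descent update has the same form as in linear regression; the only difference is the hypothesis function h_theta(x).

Its loss function is:
(formula image not preserved)

The whole iteration is:
(formula image not preserved)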
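The iteration can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the author's implementation: the names sigmoid, logistic_cost and fit_logistic, the step size, and the toy data are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, lam):
    """Cross-entropy loss plus an L2 penalty on theta_1..theta_n."""
    m = X.shape[0]
    h = sigmoid(X @ theta)
    data_term = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    penalty = lam / (2 * m) * np.sum(theta[1:] ** 2)
    return data_term + penalty

def fit_logistic(X, y, alpha=0.5, lam=0.1, iterations=2000):
    """Gradient descent; same update form as linear regression, but h is the sigmoid."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        grad = X.T @ (sigmoid(X @ theta) - y) / m
        grad[1:] += lam / m * theta[1:]    # theta_0 is not penalized
        theta -= alpha * grad
    return theta

# Hypothetical 1-D data: larger x tends to belong to class 1
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1], dtype=float)
X = np.column_stack((np.ones_like(x), x))

theta = fit_logistic(X, y)
final_cost = logistic_cost(theta, X, y, lam=0.1)
print(theta, final_cost)
```

The toy labels overlap on purpose, so the predicted probabilities stay away from 0 and 1 and the logarithms in the cost remain finite.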

__author__ = 'Chen'
import numpy as np


# calculate the cost (sum of squared errors)
def costFunction(X, Y, theta):
    err = X @ theta - Y                  # residuals, shape (m, 1)
    return float(np.sum(err ** 2))


# linear regression with L2 regularization
def linearRegresion(x, y, use_normal_equation=True, alpha=0.01, lambdas=0.01):
    x = np.asarray(x, dtype=float)
    Y = np.asarray(y, dtype=float)
    xrow, xcol = x.shape
    # prepend a column of ones for the constant term theta_0
    X = np.hstack((np.ones((xrow, 1)), x))
    if use_normal_equation:
        # normal equation: add lambda to the diagonal of X^T X (not to X itself),
        # skipping the entry that belongs to theta_0
        reg = lambdas * np.eye(xcol + 1)
        reg[0, 0] = 0
        return np.linalg.solve(X.T @ X + reg, X.T @ Y)
    # gradient descent
    theta = np.random.random((xcol + 1, 1))
    for iteration in range(10000):
        # report the cost every 1000 iterations
        if iteration % 1000 == 0:
            print(costFunction(X, Y, theta))
        # copy theta so that zeroing the constant term does not change theta itself
        temptheta = theta.copy()
        temptheta[0, 0] = 0
        # gradient summed over all samples, including the first one
        grad = X.T @ (X @ theta - Y)
        theta -= alpha * (grad + lambdas * temptheta) / xrow
    return theta


x = [[0, 1, 0], [0, 0, 1], [0, 1, 1], [1, 1, 1]]
y = [[1], [2], [3], [4]]
# calculate linear regression by the normal equation
theta1 = linearRegresion(x, y)
print(theta1)
# gradient descent
theta2 = linearRegresion(x, y, False)
print(theta2)

0 0