Python Logistic Regression: A Program Walkthrough

Source: Internet  Editor: 程序博客网  Date: 2024/04/30 00:52

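The logistic regression chapter of Machine Learning in Action (《机器学习实战》) includes the following Python program, which runs gradient ascent over the full sample set for a fixed number of iterations: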

    # coding=utf-8
    __author__ = 'Administrator'
    from numpy import *


    # Load the data from a text file; the file holds 100 samples with coordinates X, Y
    def loadDataSet():
        dataMat = []; labelMat = []
        with open('testSet.txt') as fr:
            for line in fr.readlines():
                lineArr = line.strip().split()
                # Extend each sample by one dimension: the first component is always 1.0,
                # the second and third are the values read from the text file
                dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])
                labelMat.append(int(lineArr[2]))  # class label
        return dataMat, labelMat


    # Sigmoid function
    def sigmoid(inX):
        return 1.0 / (1 + exp(-inX))


    # Gradient ascent over the whole data set
    def gradAscent(dataMatIn, classLabels):
        dataMatrix = mat(dataMatIn)              # convert to NumPy matrix
        labelMat = mat(classLabels).transpose()  # convert to NumPy matrix (column vector)
        m, n = shape(dataMatrix)   # number of rows (samples) and columns (features)
        alpha = 0.001              # learning rate
        maxCycles = 500            # number of iterations
        weights = ones((n, 1))
        for k in range(maxCycles):               # heavy on matrix operations
            h = sigmoid(dataMatrix * weights)    # matrix multiply
            error = (labelMat - h)               # vector subtraction
            weights = weights + alpha * dataMatrix.transpose() * error  # matrix multiply
        return weights


    # Plot the best-fit line
    def plotBestFit(weights):
        import matplotlib.pyplot as plt
        dataMat, labelMat = loadDataSet()
        dataArr = array(dataMat)
        n = shape(dataArr)[0]    # number of rows, i.e. the number of samples
        xcord1 = []; ycord1 = []
        xcord2 = []; ycord2 = []
        for i in range(n):       # split the samples by class for plotting
            if int(labelMat[i]) == 1:
                xcord1.append(dataArr[i, 1]); ycord1.append(dataArr[i, 2])
            else:
                xcord2.append(dataArr[i, 1]); ycord2.append(dataArr[i, 2])
        fig = plt.figure()
        ax = fig.add_subplot(111)
        ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
        ax.scatter(xcord2, ycord2, s=30, c='green')
        # x range of the line
        x = arange(-3.0, 3.0, 0.1)
        # Draw the line weights[0]*1.0 + weights[1]*x + weights[2]*y = 0.
        # The raw data was extended from two dimensions to three earlier,
        # with the first component always set to 1.0.
        y = (-weights[0] - weights[1] * x) / weights[2]
        ax.plot(x, y)
        plt.xlabel('X1')
        plt.ylabel('X2')
        plt.show()
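For reference, the loop in gradAscent matches the standard batch gradient-ascent update for the logistic log-likelihood. The symbols below are introduced here only for illustration: X, y, w, α and σ stand for dataMatrix, labelMat, weights, alpha and sigmoid in the code.

```latex
\begin{aligned}
h &= \sigma(Xw), \qquad \sigma(z) = \frac{1}{1 + e^{-z}} \\
w &\leftarrow w + \alpha \, X^{\top} (y - h)
\end{aligned}
```

Each of the maxCycles iterations evaluates h on all samples at once, which is why the code is written as matrix products rather than a per-sample loop.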
Running the following code directly plots the result:

    import logRegres
    dataArr, labelMat = logRegres.loadDataSet()
    weights = logRegres.gradAscent(dataArr, labelMat)
    logRegres.plotBestFit(weights.getA())
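Many readers are unsure what weights.getA() does in the code above. When debugging, printing weights and printing weights.getA() gives output that looks the same, yet changing the call to logRegres.plotBestFit(weights) makes the program fail.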
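The failure can be reproduced without the data file. The sketch below uses made-up weights (the values are hypothetical, chosen only for illustration) and evaluates the same boundary-line expression as plotBestFit, once with a numpy.matrix and once with the ndarray returned by getA():

```python
import numpy as np

# Hypothetical weights standing in for the output of gradAscent
w_mat = np.matrix([[4.0], [0.5], [-0.6]])   # numpy.matrix, shape (3, 1)
w_arr = w_mat.getA()                         # plain ndarray, shape (3, 1)

x = np.arange(-3.0, 3.0, 0.1)                # 1-D array of x values

# Same expression as in plotBestFit, evaluated with each type
y_mat = (-w_mat[0] - w_mat[1] * x) / w_mat[2]
y_arr = (-w_arr[0] - w_arr[1] * x) / w_arr[2]

print(type(y_mat).__name__, y_mat.shape)     # a 2-D matrix with a single row
print(type(y_arr).__name__, y_arr.shape)     # a 1-D array with the same length as x
```

With the matrix input, y comes out as a matrix with one row and as many columns as x has elements, so its first dimension no longer matches x when ax.plot(x, y) is called. With the ndarray input, y is one-dimensional and lines up with x.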

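The reason is that NumPy, Python's scientific computing library, has two related types: the general-purpose ndarray, and numpy.matrix, a subclass of ndarray that is always two-dimensional and treats * as matrix multiplication. gradAscent returns a numpy.matrix. For example: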

    >>> import numpy as np
    >>> x = np.matrix(np.arange(12).reshape((3,4))); x
    matrix([[ 0,  1,  2,  3],
            [ 4,  5,  6,  7],
            [ 8,  9, 10, 11]])
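Calling getA() on such a matrix returns the same data as a plain ndarray. The values are identical, but indexing and arithmetic on the two types behave differently when the program runs: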

    >>> x.getA()
    array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11]])
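To make the difference in behavior concrete, here is a small sketch (the 2×2 values are arbitrary) comparing the * operator and row indexing on a matrix and on the ndarray obtained from it:

```python
import numpy as np

m = np.matrix([[1, 2], [3, 4]])   # numpy.matrix
a = m.getA()                      # same values as a plain ndarray

prod_m = m * m    # matrix product for numpy.matrix
prod_a = a * a    # elementwise product for ndarray
prod_at = a @ a   # explicit matrix product for ndarray

row_m = m[0]      # indexing a matrix row keeps two dimensions
row_a = a[0]      # indexing an ndarray row drops to one dimension

print(prod_m)
print(prod_a)
print(row_m.shape, row_a.shape)
```

Here prod_m equals prod_at, while prod_a holds the squares of the individual entries. The indexing difference is the one that matters for plotBestFit: weights[0] stays two-dimensional on a matrix.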
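In gradAscent, weights starts out as ones((n,1)), but each update adds a product involving dataMatrix, which is a numpy.matrix, so the value that is returned is a numpy.matrix as well. That is why getA() has to be applied before passing weights to plotBestFit, to avoid the error.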
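One way to avoid the getA() call altogether, offered here as an assumption rather than as the book's approach, is to keep every value a plain ndarray and use the @ operator for matrix products. The function name grad_ascent_nd and the tiny data set below are hypothetical and serve only as an illustration:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def grad_ascent_nd(data, labels, alpha=0.001, max_cycles=500):
    """Batch gradient ascent using plain ndarrays (hypothetical variant)."""
    X = np.asarray(data, dtype=float)     # shape (m, n)
    y = np.asarray(labels, dtype=float)   # shape (m,)
    w = np.ones(X.shape[1])               # shape (n,), 1-D weights
    for _ in range(max_cycles):
        h = sigmoid(X @ w)                # predictions, shape (m,)
        w = w + alpha * (X.T @ (y - h))   # same update rule as gradAscent
    return w


# Tiny made-up data set in the same [1.0, x1, x2] layout as loadDataSet
X_demo = [[1.0, -2.0, 1.0], [1.0, -1.0, 2.0], [1.0, 1.0, -1.0], [1.0, 2.0, -2.0]]
y_demo = [0, 0, 1, 1]

w = grad_ascent_nd(X_demo, y_demo)

# The boundary-line expression from plotBestFit now yields a 1-D array directly
x = np.arange(-3.0, 3.0, 0.1)
y_line = (-w[0] - w[1] * x) / w[2]
print(type(w).__name__, w.shape, y_line.shape)
```

Because w is one-dimensional here, w[0], w[1] and w[2] are scalars and the line can be plotted without any conversion step.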




