Machine Learning in Action: Logistic Regression (Errata Corrected)
Source: Internet. Edited by: 程序博客网. Time: 2024/05/20 13:10
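I have recently been teaching myself machine learning. At my advisor's request I first worked through "Machine Learning with R" hands-on. To me R hardly feels like a programming language; it is more like a calculator with a lot of features. So this time I decided to use the classic textbook "Machine Learning in Action". I have to switch workstations when the term starts and I am afraid of losing the code again, so I started this blog to upload it.
The book's original code contains many errors, and many blog posts online reproduce that code without fixing it, so here I am posting my corrected version.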
Version: Python 3.5
Talk is cheap. Show me the code.
Function definitions
# coding=utf-8
from numpy import *


def loadDataSet():
    dataMat = []
    labelMat = []
    with open("testSet.txt") as fr:
        for line in fr:
            lineArr = line.strip().split()
            # The first feature is fixed at 1.0 (the intercept term)
            dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])
            labelMat.append(int(lineArr[2]))
    return dataMat, labelMat


def sigmoid(inX):
    sig = 1.0 / (1 + exp(-inX))
    return sig


def gradAscent(dataMatIn, classMatIn):
    # asmatrix behaves like the older alias mat
    dataMatrix = asmatrix(dataMatIn)
    labelMat = asmatrix(classMatIn).transpose()
    m, n = shape(dataMatrix)
    alpha = 0.01
    maxCycle = 500
    weights = ones((n, 1))
    for k in range(maxCycle):
        h = sigmoid(dataMatrix * weights)
        error = (labelMat - h)
        weights = weights + alpha * dataMatrix.transpose() * error
    return weights


def plotBestFit(wei):
    import matplotlib.pyplot as plt
    # Flatten to 1-D so that both the (n, 1) matrix returned by gradAscent
    # and the 1-D arrays returned by the stochastic versions can be plotted
    weights = asarray(wei).flatten()
    dataMat, labelMat = loadDataSet()
    dataArr = array(dataMat)
    n = shape(dataArr)[0]
    xcord1 = []; ycord1 = []
    xcord2 = []; ycord2 = []
    for i in range(n):
        if int(labelMat[i]) == 1:
            xcord1.append(dataArr[i, 1])
            ycord1.append(dataArr[i, 2])
        else:
            xcord2.append(dataArr[i, 1])
            ycord2.append(dataArr[i, 2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    x = arange(-3.0, 3.0, 0.1)
    # Best-fit line: w0 + w1 * x1 + w2 * x2 = 0, solved for x2
    y = (-weights[0] - weights[1] * x) / weights[2]
    ax.plot(x, y)
    plt.xlabel("X1"); plt.ylabel("X2")
    plt.show()


def stocGradAscent0(dataMatrix, classLabels):
    m, n = shape(dataMatrix)
    alpha = 0.01
    weights = ones(n)
    for i in range(m):
        h = sigmoid(sum(dataMatrix[i] * weights))
        error = classLabels[i] - h
        weights = weights + alpha * error * dataMatrix[i]
    return weights


def stocGradAscent1(dataMatrix, classLabels, numIter=150):
    m, n = shape(dataMatrix)
    weights = ones(n)
    for j in range(numIter):
        dataIndex = list(range(m))
        for i in range(m):
            # The step size shrinks over time but never drops below 0.01
            alpha = 4 / (10. + i + j) + 0.01
            randIndex = int(random.uniform(0, len(dataIndex)))
            # Map the random position to the actual sample index, so the
            # prediction, the label and the update all use the same sample
            sampleIndex = dataIndex[randIndex]
            h = sigmoid(sum(dataMatrix[sampleIndex] * weights))
            error = float(classLabels[sampleIndex]) - h
            weights = weights + alpha * error * dataMatrix[sampleIndex]
            # Each sample is used at most once per pass
            del(dataIndex[randIndex])
    return weights


def classifyVector(inX, weights):
    prob = sigmoid(sum(inX * weights))
    if prob > 0.5:
        return 1.0
    else:
        return 0.0


def colicTest():
    trainingSet = []; trainingLabels = []
    with open('horseColicTraining.txt') as frTrain:
        for line in frTrain:
            currLine = line.strip().split('\t')
            lineArr = []
            for i in range(21):
                lineArr.append(float(currLine[i]))
            trainingSet.append(lineArr)
            trainingLabels.append(float(currLine[21]))
    trainWeights = stocGradAscent1(array(trainingSet), trainingLabels, 500)
    errorCount = 0
    numTestVec = 0.0
    with open('horseColicTest.txt') as frTest:
        for line in frTest:
            numTestVec += 1.0
            currLine = line.strip().split('\t')
            lineArr = []
            for i in range(21):
                lineArr.append(float(currLine[i]))
            # int(float(...)) accepts a label written as "1" or "1.000000"
            if int(classifyVector(array(lineArr), trainWeights)) != int(float(currLine[21])):
                errorCount += 1
    errorRate = float(errorCount) / numTestVec
    print("the error rate of this test is: %f" % errorRate)
    return errorRate


def multiTest():
    numTests = 10
    errorSum = 0.0
    for k in range(numTests):
        errorSum += colicTest()
    print("after %d iterations the average error rate is: %f" % (numTests, errorSum / float(numTests)))
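The update line in gradAscent can be read as one step of gradient ascent on the log-likelihood of the logistic model. The post does not spell this out, so the following is a sketch under assumed notation: X stands for dataMatrix, y for labelMat, w for weights, α for alpha, and h for the vector of sigmoid outputs.

```latex
\begin{aligned}
h_i &= \sigma(x_i^{\top} w) = \frac{1}{1 + e^{-x_i^{\top} w}} \\
\ell(w) &= \sum_{i=1}^{m} \bigl[\, y_i \log h_i + (1 - y_i) \log (1 - h_i) \,\bigr] \\
\nabla_w \ell(w) &= \sum_{i=1}^{m} (y_i - h_i)\, x_i = X^{\top} (y - h) \\
w &\leftarrow w + \alpha\, X^{\top} (y - h)
\end{aligned}
```

The last line corresponds to `weights = weights + alpha * dataMatrix.transpose() * error`, with `error = labelMat - h`. The two stochastic versions apply the same update with a single sample x_i in place of the full matrix X.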
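The function-definition code above only defines the main functions; it is not yet enough to run anything. The book runs everything from the IPython command line, but I am lazy, so I simply dropped that approach and call the functions from a `__main__` block instead.
Without further ado, here is the code.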
Experiment 1:
if __name__ == "__main__":
    dataArr, labelMat = loadDataSet()
    #gradAscent(dataArr, labelMat)
    print(gradAscent(dataArr, labelMat))
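If testSet.txt is not at hand, the same batch update can be tried on synthetic data. The following is only a minimal sketch, not the author's code: the names `grad_ascent`, `rng`, `pos` and `neg`, the cluster centers and the seed are all assumptions made for illustration, and plain NumPy arrays are used instead of matrices.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def grad_ascent(X, y, alpha=0.01, max_cycles=500):
    """Batch gradient ascent on the logistic log-likelihood (hypothetical helper)."""
    w = np.ones(X.shape[1])
    for _ in range(max_cycles):
        h = sigmoid(X @ w)             # predicted probabilities, shape (m,)
        w = w + alpha * X.T @ (y - h)  # same update rule as gradAscent
    return w


# Two well-separated Gaussian clusters (assumed toy data, not testSet.txt)
rng = np.random.default_rng(0)
n = 100
pos = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(n, 2))
neg = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(n, 2))

# Prepend the constant feature 1.0, as loadDataSet does
X = np.hstack([np.ones((2 * n, 1)), np.vstack([pos, neg])])
y = np.concatenate([np.ones(n), np.zeros(n)])

w = grad_ascent(X, y)
pred = (sigmoid(X @ w) > 0.5).astype(float)
accuracy = (pred == y).mean()
print("weights:", w)
print("training accuracy:", accuracy)
```

Because the clusters are far apart, the training accuracy should be close to 1; the exact weights depend on the seed and the number of cycles.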
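Experiment 2:
if __name__ == "__main__":
    dataArr, labelMat = loadDataSet()
    # gradAscent returns a NumPy matrix; getA() converts it to a plain array for plotting
    plotBestFit(gradAscent(dataArr, labelMat).getA())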
Experiment 3:
if __name__ == "__main__":
    dataArr, labelMat = loadDataSet()
    weights = stocGradAscent0(array(dataArr), labelMat)
    plotBestFit(weights)
Experiment 4:
if __name__ == "__main__":
    dataArr, labelMat = loadDataSet()
    weights = stocGradAscent1(array(dataArr), labelMat)
    plotBestFit(weights)
Experiment 5:
if __name__ == "__main__":
    multiTest()
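When the weighted sum passed to sigmoid is a large negative number, `exp(-inX)` can overflow and NumPy may print a RuntimeWarning; this can happen on data with unscaled features such as the horse-colic set. One possible workaround, sketched here under the assumption that a drop-in replacement is acceptable, is a sigmoid that only ever exponentiates non-positive numbers. The name `stable_sigmoid` is hypothetical and does not appear in the book or in the code above.

```python
import numpy as np


def stable_sigmoid(x):
    """Sigmoid that avoids overflow by only computing exp of values <= 0."""
    x = np.asarray(x, dtype=float)
    e = np.exp(-np.abs(x))  # always in (0, 1], so it cannot overflow
    # x >= 0: 1 / (1 + exp(-x));  x < 0: exp(x) / (1 + exp(x))
    return np.where(x >= 0, 1.0 / (1.0 + e), e / (1.0 + e))


print(stable_sigmoid(np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])))
```

For moderate inputs it returns the same values as `1.0 / (1 + exp(-inX))`, so it could be swapped into sigmoid without changing the rest of the code.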
For more, see GitHub:
https://github.com/Edgis/Machine-learning-in-action/blob/master/logRegres.py