Python Machine Learning, Listing 4-2: Code Walkthrough and Modifications
Source: Internet  Edited by: 程序博客网  Time: 2024/06/03 14:43
import urllib.request
import numpy                                 # not used below
from sklearn import datasets, linear_model   # not used below
from math import sqrt
import matplotlib.pyplot as plot

# Read the data from the web
target_url = ("http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv")
data = urllib.request.urlopen(target_url)

# The first line holds the attribute names and is stored in names. Every other
# line is split into row; its last column is appended to labels and removed
# from row with pop(), and the remaining attribute values are converted to
# float and stored in xList.
xList = []
labels = []
names = []
firstLine = True
for line in data:
    # urlopen yields bytes, so the ";" separator is encoded to bytes as well
    if firstLine:
        names = line.strip().split(";".encode(encoding='utf-8'))
        firstLine = False
    else:
        row = line.strip().split(";".encode(encoding='utf-8'))
        labels.append(float(row[-1]))
        row.pop()
        floatRow = [float(num) for num in row]
        xList.append(floatRow)

# Compute the mean and standard deviation of each attribute column
nrows = len(xList)
ncols = len(xList[0])
xMeans = []
xSD = []
for i in range(ncols):
    col = [xList[j][i] for j in range(nrows)]
    mean = sum(col) / nrows
    xMeans.append(mean)
    colDiff = [(xList[j][i] - mean) for j in range(nrows)]
    sumSq = sum([colDiff[j] * colDiff[j] for j in range(nrows)])
    stdDev = sqrt(sumSq / nrows)
    xSD.append(stdDev)

# With the per-column means and standard deviations computed above,
# standardize every element of xList
xNormalized = []
for i in range(nrows):
    rowNormalized = [(xList[i][j] - xMeans[j]) / xSD[j] for j in range(ncols)]
    xNormalized.append(rowNormalized)

# The labels need to be standardized in the same way
meanLabel = sum(labels) / nrows
sdLabel = sqrt(sum((labels[i] - meanLabel) * (labels[i] - meanLabel) for i in range(nrows)) / nrows)
labelNormalized = [(labels[i] - meanLabel) / sdLabel for i in range(nrows)]

# Initialization: one (initially empty) list of test errors per step
nxval = 10
nSteps = 350
stepSize = 0.004
errors = []
for i in range(nSteps):
    b = []
    errors.append(b)

# 10-fold cross-validation
for ixval in range(nxval):
    # The original code was
    #     idxTest = [a for a in range(nrows) if a%nxval == ixval*nxval]
    # Since a%nxval can never equal ixval*nxval once ixval >= 1, only the
    # first fold ever had a test set, so the loop just repeated a single-fold
    # validation. I changed the condition so that the code really performs
    # 10-fold cross-validation.
    idxTest = [a for a in range(nrows) if a%nxval == ixval]
    idxTrain = [a for a in range(nrows) if a%nxval != ixval]  # same change as above

    xTrain = [xNormalized[r] for r in idxTrain]
    xTest = [xNormalized[r] for r in idxTest]
    labelTrain = [labelNormalized[r] for r in idxTrain]
    labelTest = [labelNormalized[r] for r in idxTest]
    nrowsTrain = len(idxTrain)
    nrowsTest = len(idxTest)

    beta = [0.0] * ncols
    betaMat = []
    betaMat.append(list(beta))

    # Run 350 iterations. Every fold runs its own 350 iterations, so all of
    # the code below sits inside the 10-fold loop. This gives a minimum mean
    # square error and a plot after each of the 10 folds.
    for iStep in range(nSteps):
        # Residuals of the current model on the training rows
        residuals = [0.0] * nrows
        for j in range(nrowsTrain):
            labelsHat = sum([xTrain[j][k] * beta[k] for k in range(ncols)])
            residuals[j] = labelTrain[j] - labelsHat

        # For each attribute column, multiply every row element by the
        # residual of the same row, sum the products and divide by the number
        # of rows. The result is the correlation between that attribute and
        # the residuals; with ncols attributes there are ncols correlation
        # values.
        corr = [0.0] * ncols
        for j in range(ncols):
            corr[j] = sum([xTrain[k][j] * residuals[k] for k in range(nrowsTrain)]) / nrowsTrain

        # Pick the attribute with the largest absolute correlation
        iStar = 0
        corrStar = corr[0]
        for j in range(1, ncols):
            if abs(corrStar) < abs(corr[j]):
                iStar = j
                corrStar = corr[j]

        # Move its coefficient one step in the direction of the correlation
        beta[iStar] += stepSize * corrStar / abs(corrStar)
        betaMat.append(list(beta))

        # Record the test-set errors of this fold at this step
        for j in range(nrowsTest):
            labelsHat = sum([xTest[j][k] * beta[k] for k in range(ncols)])
            err = labelTest[j] - labelsHat
            errors[iStep].append(err)

    # errors is not reset between folds, so the curve computed here pools the
    # test errors of all folds processed so far.
    #print("errors= ", errors)
    cvCurve = []
    for errVect in errors:
        mse = sum([x*x for x in errVect]) / len(errVect)
        cvCurve.append(mse)

    mineMse = min(cvCurve)
    minPt = [i for i in range(len(cvCurve)) if cvCurve[i] == mineMse][0]
    print("Minimum Mean Square Error", mineMse)
    print("Index of Minimum Mean Square Error", minPt)

    xaxis = range(len(cvCurve))
    plot.plot(xaxis, cvCurve)
    plot.xlabel("Steps Taken")
    plot.ylabel("Mean Square Error")
    plot.show()
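To see why the change to the fold condition matters, here is a minimal sketch that counts how many rows land in each test set under both conditions. The helper name fold_indices, its buggy flag, and the toy sizes (25 rows, 10 folds) are assumptions made for illustration and are not part of the listing.

```python
def fold_indices(nrows, nxval, ixval, buggy=False):
    """Split row indices into (idxTest, idxTrain) by remainder modulo nxval.

    buggy=True reproduces the original condition a % nxval == ixval * nxval.
    """
    target = ixval * nxval if buggy else ixval
    idxTest = [a for a in range(nrows) if a % nxval == target]
    idxTrain = [a for a in range(nrows) if a % nxval != target]
    return idxTest, idxTrain


nrows, nxval = 25, 10  # toy sizes, assumed for illustration only

# Number of test rows in each fold under the corrected and original conditions
fixed_sizes = [len(fold_indices(nrows, nxval, k)[0]) for k in range(nxval)]
buggy_sizes = [len(fold_indices(nrows, nxval, k, buggy=True)[0]) for k in range(nxval)]

print(fixed_sizes)  # → [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
print(buggy_sizes)  # → [3, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

With the corrected condition every row is used for testing exactly once. With the original condition only fold 0 has any test rows, because a remainder modulo 10 is never 10 or larger.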
Output of the cross-validation script (one pair of lines per fold):
Minimum Mean Square Error 0.5873018933136459
Index of Minimum Mean Square Error 311
Minimum Mean Square Error 0.5534955247726759
Index of Minimum Mean Square Error 289
Minimum Mean Square Error 0.5957385843236068
Index of Minimum Mean Square Error 244
Minimum Mean Square Error 0.6163846701751715
Index of Minimum Mean Square Error 265
Minimum Mean Square Error 0.6205467405536572
Index of Minimum Mean Square Error 289
Minimum Mean Square Error 0.6273690438035697
Index of Minimum Mean Square Error 312
Minimum Mean Square Error 0.6214330728517901
Index of Minimum Mean Square Error 285
Minimum Mean Square Error 0.6180113626794431
Index of Minimum Mean Square Error 285
Minimum Mean Square Error 0.6295047735731523
Index of Minimum Mean Square Error 280
Minimum Mean Square Error 0.6494495844086484
Index of Minimum Mean Square Error 285
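A note on reading these numbers: as far as I can tell from the script, the errors list is created once before the fold loop and never cleared. If so, each printed value is the mean square error over the test rows of all folds processed so far, not of one fold alone, and the last pair is the figure for the full 10 folds. The sketch below shows the difference between the two ways of averaging. The three small error lists are made-up numbers and the helper mse is a hypothetical name.

```python
def mse(values):
    """Mean of the squared values."""
    return sum(v * v for v in values) / len(values)


# Hypothetical test-set errors of three folds at one fixed step
fold_errors = [[1.0, -1.0], [2.0, 0.0], [0.0, 0.0]]

pooled = []     # grows across folds, like errors[iStep] in the script
running = []    # MSE over everything pooled so far
per_fold = []   # MSE of each fold on its own

for errs in fold_errors:
    pooled.extend(errs)
    running.append(mse(pooled))
    per_fold.append(mse(errs))

print(running)   # → [1.0, 1.5, 1.0]
print(per_fold)  # → [1.0, 2.0, 0.0]
```

To report each fold separately in the script, one option would be to collect the test errors in a fresh list at the start of every fold and compute that fold's curve from it.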