Machine Learning in Action: k-Nearest Neighbors (5) --- Complete Dating-Site Data Classification
Source: Internet | Editor: 程序博客网 | Time: 2024/05/20 09:08
from numpy import *
import operator


# Create a small sample data set
def createDataSet():
    group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels


# Classify the test instance inX with k-nearest neighbors
def classify0(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances ** 0.5
    sortedDistIndicies = distances.argsort()
    classCount = {}
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # items() works on Python 2 and 3; iteritems() exists only on Python 2
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]


# Handle the input format: read the data from a tab-separated txt file
def file2matrix(filename, dim2):
    with open(filename) as fr:
        arrayOLines = fr.readlines()
    numberOfLines = len(arrayOLines)
    returnMat = zeros((numberOfLines, dim2))
    classLabelVector = []
    index = 0
    for line in arrayOLines:
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index, :] = listFromLine[0:dim2]
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    return returnMat, classLabelVector


# Normalize the feature values to the range [0, 1]
def autoNorm(dataSet):
    minVals = dataSet.min(0)
    maxVals = dataSet.max(0)
    ranges = maxVals - minVals
    m = dataSet.shape[0]
    normDataSet = dataSet - tile(minVals, (m, 1))
    normDataSet = normDataSet / tile(ranges, (m, 1))
    return normDataSet, ranges, minVals


# Test the classifier on the dating-site data;
# hoRatio is the share of all samples used as test samples
def datingClassTest(hoRatio):
    datingDataMat, datingLabels = file2matrix('datingTestSet2.txt', 3)
    normMat, ranges, minVals = autoNorm(datingDataMat)
    m = normMat.shape[0]
    numTestVecs = int(m * hoRatio)
    errorCount = 0.0
    for i in range(numTestVecs):
        classifierResult = classify0(normMat[i, :], normMat[numTestVecs:m, :], datingLabels[numTestVecs:m], 3)
        print("the classifier came back with: %d, the real answer is: %d" % (classifierResult, datingLabels[i]))
        if classifierResult != datingLabels[i]:
            errorCount += 1.0
    print("the total error rate is: %f" % (errorCount / float(numTestVecs)))
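The listing builds its difference matrices with tile(). As a rough sketch of the same two steps (min-max normalization and the distance vote), the version below relies on NumPy broadcasting and collections.Counter instead. The names auto_norm and classify are illustrative and not part of the original kNN module, and the guard for a zero-range column is an added assumption that the original autoNorm does not have.

```python
from collections import Counter

import numpy as np


def auto_norm(data):
    """Min-max scale every column to [0, 1]; returns (scaled, ranges, mins)."""
    mins = data.min(axis=0)
    ranges = data.max(axis=0) - mins
    # Assumption: a constant column has range 0; divide by 1 there to avoid 0/0.
    safe_ranges = np.where(ranges == 0, 1.0, ranges)
    return (data - mins) / safe_ranges, ranges, mins


def classify(in_x, data, labels, k):
    """Majority vote among the k rows of `data` nearest to in_x (Euclidean)."""
    diffs = data - in_x                      # broadcasting replaces tile()
    distances = np.sqrt((diffs ** 2).sum(axis=1))
    nearest = distances.argsort()[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data taken from createDataSet() in the listing above.
group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']

print(classify(np.array([0.0, 0.0]), group, labels, 3))   # B
print(classify(np.array([1.0, 1.2]), group, labels, 3))   # A
```

For the point [0, 0] the three nearest rows are labelled B, B, A, so the vote returns B; for [1.0, 1.2] they are A, A, B, so it returns A.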
With the share of test data set to 0.1, the error rate is only 0.05:
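The hold-out idea behind datingClassTest can be tried without datingTestSet2.txt. The sketch below is only an illustration under stated assumptions: the data are synthetic (three well-separated Gaussian clusters generated here, not the dating data), the function name holdout_error_rate is hypothetical, and normalization is skipped because all synthetic features share one scale. As in the original, the first ho_ratio share of the rows is tested against the remaining rows.

```python
from collections import Counter

import numpy as np


def holdout_error_rate(data, labels, ho_ratio, k=3):
    """Classify the first ho_ratio share of rows using the remaining rows."""
    m = data.shape[0]
    num_test = int(m * ho_ratio)
    if num_test == 0:
        raise ValueError("ho_ratio leaves no test rows")
    train_x, train_y = data[num_test:], labels[num_test:]
    errors = 0
    for i in range(num_test):
        dists = np.sqrt(((train_x - data[i]) ** 2).sum(axis=1))
        nearest = dists.argsort()[:k]
        predicted = Counter(train_y[j] for j in nearest).most_common(1)[0][0]
        errors += int(predicted != labels[i])
    return errors / num_test


# Hypothetical stand-in for datingTestSet2.txt: three well-separated clusters.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0], [20.0, 20.0, 20.0]])
data = np.vstack([c + rng.normal(scale=0.5, size=(100, 3)) for c in centers])
labels = np.repeat([1, 2, 3], 100)

# Shuffle so that the test slice at the front contains a mix of classes.
order = rng.permutation(len(labels))
data, labels = data[order], labels[order]

print("the total error rate is: %f" % holdout_error_rate(data, labels, 0.1))
```

Because the synthetic clusters are far apart, this run should misclassify nothing; the real dating data overlap more, which is where the 0.05 reported here comes from.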
>>> import kNN>>> reload(kNN)
>>> kNN.datingClassTest(0.1)the classifier came back with: 3, the real answer is: 3the classifier came back with: 2, the real answer is: 2the classifier came back with: 1, the real answer is: 1the classifier came back with: 1, the real answer is: 1the classifier came back with: 1, the real answer is: 1the classifier came back with: 1, the real answer is: 1the classifier came back with: 3, the real answer is: 3the classifier came back with: 3, the real answer is: 3the classifier came back with: 1, the real answer is: 1the classifier came back with: 3, the real answer is: 3the classifier came back with: 1, the real answer is: 1the classifier came back with: 1, the real answer is: 1the classifier came back with: 2, the real answer is: 2the classifier came back with: 1, the real answer is: 1the classifier came back with: 1, the real answer is: 1the classifier came back with: 1, the real answer is: 1the classifier came back with: 1, the real answer is: 1the classifier came back with: 1, the real answer is: 1the classifier came back with: 2, the real answer is: 2the classifier came back with: 3, the real answer is: 3the classifier came back with: 2, the real answer is: 2the classifier came back with: 1, the real answer is: 1the classifier came back with: 1, the real answer is: 2the classifier came back with: 3, the real answer is: 3the classifier came back with: 2, the real answer is: 2the classifier came back with: 3, the real answer is: 3the classifier came back with: 2, the real answer is: 2the classifier came back with: 3, the real answer is: 3the classifier came back with: 2, the real answer is: 2the classifier came back with: 1, the real answer is: 1the classifier came back with: 3, the real answer is: 3the classifier came back with: 1, the real answer is: 1the classifier came back with: 3, the real answer is: 3the classifier came back with: 1, the real answer is: 1the classifier came back with: 2, the real answer is: 2the classifier came back with: 1, the real 
answer is: 1the classifier came back with: 1, the real answer is: 1the classifier came back with: 2, the real answer is: 2the classifier came back with: 3, the real answer is: 3the classifier came back with: 3, the real answer is: 3the classifier came back with: 1, the real answer is: 1the classifier came back with: 2, the real answer is: 2the classifier came back with: 3, the real answer is: 3the classifier came back with: 3, the real answer is: 3the classifier came back with: 3, the real answer is: 3the classifier came back with: 1, the real answer is: 1the classifier came back with: 1, the real answer is: 1the classifier came back with: 1, the real answer is: 1the classifier came back with: 1, the real answer is: 1the classifier came back with: 2, the real answer is: 2the classifier came back with: 2, the real answer is: 2the classifier came back with: 1, the real answer is: 1the classifier came back with: 3, the real answer is: 3the classifier came back with: 2, the real answer is: 2the classifier came back with: 2, the real answer is: 2the classifier came back with: 2, the real answer is: 2the classifier came back with: 2, the real answer is: 2the classifier came back with: 3, the real answer is: 3the classifier came back with: 1, the real answer is: 1the classifier came back with: 2, the real answer is: 2the classifier came back with: 1, the real answer is: 1the classifier came back with: 2, the real answer is: 2the classifier came back with: 2, the real answer is: 2the classifier came back with: 2, the real answer is: 2the classifier came back with: 2, the real answer is: 2the classifier came back with: 2, the real answer is: 2the classifier came back with: 3, the real answer is: 3the classifier came back with: 2, the real answer is: 2the classifier came back with: 3, the real answer is: 3the classifier came back with: 1, the real answer is: 1the classifier came back with: 2, the real answer is: 2the classifier came back with: 3, the real answer is: 3the 
classifier came back with: 2, the real answer is: 2the classifier came back with: 2, the real answer is: 2the classifier came back with: 3, the real answer is: 1the classifier came back with: 3, the real answer is: 3the classifier came back with: 1, the real answer is: 1the classifier came back with: 1, the real answer is: 1the classifier came back with: 3, the real answer is: 3the classifier came back with: 3, the real answer is: 3the classifier came back with: 1, the real answer is: 1the classifier came back with: 2, the real answer is: 2the classifier came back with: 3, the real answer is: 3the classifier came back with: 3, the real answer is: 1the classifier came back with: 3, the real answer is: 3the classifier came back with: 1, the real answer is: 1the classifier came back with: 2, the real answer is: 2the classifier came back with: 2, the real answer is: 2the classifier came back with: 1, the real answer is: 1the classifier came back with: 1, the real answer is: 1the classifier came back with: 3, the real answer is: 3the classifier came back with: 2, the real answer is: 3the classifier came back with: 1, the real answer is: 1the classifier came back with: 2, the real answer is: 2the classifier came back with: 1, the real answer is: 1the classifier came back with: 3, the real answer is: 3the classifier came back with: 3, the real answer is: 3the classifier came back with: 2, the real answer is: 2the classifier came back with: 1, the real answer is: 1the classifier came back with: 3, the real answer is: 1the total error rate is: 0.050000
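In the log above, 5 of the 100 predictions disagree with the true label (5/100 = 0.05), and three of those five predict class 3 for a class-1 sample. A small helper can tally this from the printed text. It is a sketch only: tally_errors is a hypothetical name, and the sample string holds just four lines copied from the run.

```python
import re
from collections import Counter

LINE = re.compile(r"came back with: (\d+), the real answer is: (\d+)")


def tally_errors(output_text):
    """Return (number of predictions, Counter of mismatched (predicted, actual))."""
    pairs = [(int(p), int(a)) for p, a in LINE.findall(output_text)]
    wrong = Counter(pair for pair in pairs if pair[0] != pair[1])
    return len(pairs), wrong


# Four lines copied from the run above (the full log has 100).
sample = """the classifier came back with: 1, the real answer is: 1
the classifier came back with: 1, the real answer is: 2
the classifier came back with: 3, the real answer is: 1
the classifier came back with: 3, the real answer is: 3"""

total, wrong = tally_errors(sample)
print(total, dict(wrong))
print("error rate: %f" % (sum(wrong.values()) / total))
```

Grouping the mismatches by (predicted, actual) shows which classes the classifier confuses, which the single error-rate figure hides.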