A Summary of the kNN Algorithm
Source: Internet · Editor: 程序博客网 · Date: 2024/05/23 19:39
kNN is a simple yet very effective classification algorithm. I spent three hours working through both its theory and a hands-on implementation, and this post summarizes what I learned.
I. General Workflow of the kNN Algorithm
(1) Collect data: by any means available; this is the first step.
(2) Prepare data: distance computation needs numeric values, ideally in a structured format.
(3) Analyze data: using any method.
(4) Train the algorithm: kNN has no training step.
(5) Test the algorithm: compute the error rate.
(6) Use the algorithm: classify your own data.
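Step (5) above deserves a concrete illustration. The helper below is a minimal sketch (not from the original post) of how an error rate is computed: run a classifier over a labeled test set and count the fraction of mismatches. The `classify` function and the toy 1-D data are hypothetical stand-ins.

```python
def error_rate(classify, samples, labels):
    """Fraction of labeled test samples the classifier gets wrong (step 5)."""
    errors = sum(1 for x, y in zip(samples, labels) if classify(x) != y)
    return errors / len(labels)

# Toy illustration: a 1-D threshold "classifier" and a small test set.
# The last label disagrees with the rule, so one of four samples is wrong.
classify = lambda x: 'A' if x > 0.5 else 'B'
samples = [0.9, 1.0, 0.2, 0.4]
labels = ['A', 'A', 'B', 'A']
print(error_rate(classify, samples, labels))  # 0.25
```

The same loop works unchanged for the kNN classifier built below, with `classify` wrapping a call to it.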
II. kNN in Practice
1. First, create a script file named kNN.py. It consists of two functions: one builds a small sample dataset, the other performs the classification.
```python
#########################################
# kNN: k Nearest Neighbors
#
# Input:  newInput: vector to compare to the existing dataset (1xN)
#         dataSet:  known vectors, one sample per row (MxN)
#         labels:   dataset labels (vector of length M)
#         k:        number of neighbors to use in the vote
# Output: the most common class label among the k nearest neighbors
#########################################
from numpy import *

# create a dataset containing 4 samples from 2 classes
def createDataSet():
    # create a matrix: each row is a sample
    group = array([[1.0, 0.9], [1.0, 1.0], [0.1, 0.2], [0.0, 0.1]])
    labels = ['A', 'A', 'B', 'B']  # four samples, two classes
    return group, labels

# classify using kNN
def kNNClassify(newInput, dataSet, labels, k):
    numSamples = dataSet.shape[0]  # shape[0] is the number of rows (samples)

    ## step 1: calculate the Euclidean distance
    # tile(A, reps): construct an array by repeating A reps times;
    # here it stacks numSamples copies of newInput to match dataSet
    diff = tile(newInput, (numSamples, 1)) - dataSet  # element-wise subtraction
    squaredDiff = diff ** 2                 # square the differences
    squaredDist = sum(squaredDiff, axis=1)  # sum along each row
    distance = squaredDist ** 0.5

    ## step 2: sort the distances
    # argsort() returns the indices that would sort the array in ascending order
    sortedDistIndices = argsort(distance)

    classCount = {}  # dictionary mapping label -> vote count
    for i in range(k):  # range(), not the Python 2 xrange()
        ## step 3: take the k smallest distances
        voteLabel = labels[sortedDistIndices[i]]

        ## step 4: count how often each label occurs
        # get() returns 0 when voteLabel is not yet in classCount
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1

    ## step 5: return the label with the most votes
    maxCount = 0
    for key, value in classCount.items():
        if value > maxCount:
            maxCount = value
            maxIndex = key

    return maxIndex
```
2. Next, we test it from a second script (or an interactive session):
```python
import kNN
from numpy import *

dataSet, labels = kNN.createDataSet()
k = 3

testX = array([1.2, 1.0])
outputLabel = kNN.kNNClassify(testX, dataSet, labels, k)
print("Your input is:", testX, "and classified to class:", outputLabel)

testX = array([0.1, 0.3])
outputLabel = kNN.kNNClassify(testX, dataSet, labels, k)
print("Your input is:", testX, "and classified to class:", outputLabel)
```

The output:

```
Your input is: [ 1.2 1.0] and classified to class: A
Your input is: [ 0.1 0.3] and classified to class: B
```
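As an aside, the five steps in kNNClassify can be written more compactly with NumPy broadcasting (which makes the `tile` call unnecessary) and `collections.Counter` for the vote. This is a behavior-equivalent rewrite for comparison, not the original author's code:

```python
import numpy as np
from collections import Counter

def knn_classify(new_input, data_set, labels, k):
    # Broadcasting subtracts new_input from every row of data_set at once,
    # so the Euclidean distances come out in one expression.
    dists = np.sqrt(((data_set - new_input) ** 2).sum(axis=1))
    # Indices of the k nearest samples, then a majority vote on their labels.
    nearest = np.argsort(dists)[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

group = np.array([[1.0, 0.9], [1.0, 1.0], [0.1, 0.2], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(knn_classify(np.array([1.2, 1.0]), group, labels, 3))  # A
print(knn_classify(np.array([0.1, 0.3]), group, labels, 3))  # B
```

Both versions classify the two test points identically; the compact form simply trades the explicit loop for library primitives.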