Supervised Learning 002: k-Nearest Neighbor
Source: Internet · Editor: 程序博客网 · 2024/06/06 09:45
Here we explain the k-nearest neighbor (kNN) classification algorithm.
Pseudocode:
    get the input data inX to be classified
    for every piece of data in our example data set:
        calculate the distance between inX and the current piece of data
    sort the distances in increasing order
    take the k items with the lowest distances to inX
    find the majority class among these k items
    return the majority class as our prediction for the class of inX
python code:

```python
import operator
from numpy import tile

# inX:     the input data to classify; an array whose number of columns
#          must match dataSet's (each column is one feature)
# dataSet: the sample data, a matrix (2-D array)
# labels:  the class label for each row of dataSet (one label per row)
# k:       how many of the nearest neighbors get to vote
def classify0(inX, dataSet, labels, k):
    # shape gives the matrix dimensions (rows, columns); [0] is the row count
    dataSetSize = dataSet.shape[0]
    # tile() repeats inX dataSetSize times so it has the same shape as
    # dataSet; diffMat is the element-wise difference of the two matrices
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    # squaring the matrix squares every element
    sqDiffMat = diffMat ** 2
    # sum(axis=1) adds up each row, giving one value per sample
    sqDistances = sqDiffMat.sum(axis=1)
    # take the square root of every element to get Euclidean distances
    distances = sqDistances ** 0.5
    # argsort() returns the indices that would sort the array ascending
    sortedDistIndicies = distances.argsort()
    # dict mapping label name -> number of votes among the k nearest
    classCount = {}
    # look at the k closest samples
    for i in range(k):
        # label of the i-th closest sample
        voteIlabel = labels[sortedDistIndicies[i]]
        # increment this label's vote count
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # sort the (label, count) pairs by count, descending;
    # the result is a list of tuples like [(label, count), ...]
    sortedClassCount = sorted(classCount.items(),
                              key=operator.itemgetter(1), reverse=True)
    # return the label of the first (most common) entry
    return sortedClassCount[0][0]
```

How to use it?
```python
from numpy import array

# create the example data set: the samples and their class labels
def createDataSet():
    group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels

if __name__ == '__main__':
    # create the example data set
    group, labels = createDataSet()
    # the test data to be classified
    testData = [[0, 0], [0.8, 0.8], [0.5, 0.5], [0.6, 0.5]]
    for inX in testData:
        # print the predicted class of each piece of test data
        # (classify0 is defined above; the original kept it in a kNN module
        # and called kNN.classify0)
        print(inX, classify0(inX, group, labels, 3))
The output is:

```
[0, 0] B
[0.8, 0.8] A
[0.5, 0.5] B
[0.6, 0.5] A
```
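As a side note, on Python 3 the same classifier can be written more compactly: NumPy broadcasting subtracts a length-n vector from every row of a matrix automatically, so the `tile()` call is unnecessary, and `collections.Counter` replaces the hand-rolled vote dict. This is a sketch, not the book's version; it produces the same predictions on the example data above (assuming NumPy is installed).

```python
import numpy as np
from collections import Counter

# Python 3 sketch of classify0: broadcasting for the distances,
# Counter for the majority vote; names mirror the version above.
def classify0(inX, dataSet, labels, k):
    # dataSet - inX broadcasts inX across every row, replacing tile()
    distances = np.sqrt(((dataSet - np.asarray(inX)) ** 2).sum(axis=1))
    # labels of the k samples with the smallest distances
    k_nearest = [labels[i] for i in distances.argsort()[:k]]
    # most_common(1) returns [(label, count)] for the majority class
    return Counter(k_nearest).most_common(1)[0][0]

group = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(classify0([0, 0], group, labels, 3))  # prints B
```

Note that `Counter.most_common` breaks ties by insertion order rather than by distance, which matches the behavior of the `sorted()`-based vote above for this example but is worth keeping in mind when k is even.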