A Summary of the kNN Algorithm

Source: Internet | Editor: 程序博客网 | Date: 2024/05/23 19:39

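kNN is a simple yet very effective classification algorithm. I spent three hours working through both its theory and its practice, and I summarize them here.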

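I: The general workflow of the kNN algorithm

(1) Collect data: use any method you can think of; this is the first step.

(2) Prepare data: the distance calculation needs numeric values, preferably in a structured data format.

(3) Analyze data: use any method.

(4) Train the algorithm: kNN has no training step.

(5) Test the algorithm: compute the error rate.

(6) Use the algorithm: classify your own data with it.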
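
Step (5) can be made concrete with a small sketch. It uses only the Python standard library (math.dist needs Python 3.8 or later); the helper names knn_predict and error_rate and the held-out test samples are made up for illustration, and the last test label is deliberately wrong so that the error rate is not zero.

```python
import math
from collections import Counter


def knn_predict(query, points, labels, k):
    """Return the majority label among the k points nearest to query."""
    # Sort the sample indices by Euclidean distance to the query point.
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def error_rate(train_x, train_y, test_x, test_y, k):
    """Fraction of held-out samples whose predicted label is wrong."""
    wrong = sum(
        knn_predict(x, train_x, train_y, k) != y
        for x, y in zip(test_x, test_y)
    )
    return wrong / len(test_x)


# Known samples (hypothetical toy data).
train_x = [(1.0, 0.9), (1.0, 1.0), (0.1, 0.2), (0.0, 0.1)]
train_y = ["A", "A", "B", "B"]

# Held-out samples; the label of the last one is deliberately wrong.
test_x = [(0.9, 1.1), (0.2, 0.0), (0.8, 0.8)]
test_y = ["A", "B", "B"]

rate = error_rate(train_x, train_y, test_x, test_y, k=3)
print(rate)  # one of three predictions disagrees with its label
```

An error rate of 0 means every held-out sample was classified correctly, and 1 means none were.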

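II: Putting kNN into practice

1: First create a script file named kNN.py. It consists mainly of two functions: one builds a small dataset, the other does the classification.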

#########################################
# kNN: k Nearest Neighbors
# Input:  newInput: vector to compare to the existing dataset (1xN)
#         dataSet:  M known sample vectors, one per row (MxN)
#         labels:   labels of the samples (length M)
#         k:        number of neighbors used for the vote
# Output: the most common class label among the k nearest neighbors
#########################################
import numpy as np


# create a dataset which contains 4 samples with 2 classes
def createDataSet():
    # create a matrix: each row is a sample
    group = np.array([[1.0, 0.9], [1.0, 1.0], [0.1, 0.2], [0.0, 0.1]])
    labels = ['A', 'A', 'B', 'B']  # four samples and two classes
    return group, labels


# classify using kNN
def kNNClassify(newInput, dataSet, labels, k):
    ## step 1: calculate the Euclidean distance to every sample
    # broadcasting subtracts newInput from each row of dataSet
    diff = dataSet - newInput
    squaredDiff = diff ** 2
    squaredDist = np.sum(squaredDiff, axis=1)  # sum over each row
    distance = squaredDist ** 0.5

    ## step 2: sort the distances
    # argsort() returns the indices that would sort an array in ascending order
    sortedDistIndices = np.argsort(distance)

    classCount = {}  # maps label -> number of votes
    for i in range(k):
        ## step 3: take the k smallest distances
        voteLabel = labels[sortedDistIndices[i]]
        ## step 4: count how often each label occurs
        # get() returns 0 when voteLabel is not yet in classCount
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1

    ## step 5: return the label with the most votes
    maxCount = 0
    maxIndex = None
    for key, value in classCount.items():
        if value > maxCount:
            maxCount = value
            maxIndex = key
    return maxIndex
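
Steps 2 and 3 rely on argsort(), which returns positions rather than sorted values. A rough standard-library equivalent for a one-dimensional list is sketched below; the function name argsort, the distances, and the labels are made up for illustration, and for tied values the order may differ from what NumPy's default sort returns.

```python
def argsort(values):
    """Indices that would sort values in ascending order (1-D input only)."""
    return sorted(range(len(values)), key=lambda i: values[i])


distances = [0.5, 0.1, 0.9, 0.3]  # hypothetical distances to four samples
labels = ["A", "B", "A", "B"]     # hypothetical labels of those samples

order = argsort(distances)                  # [1, 3, 0, 2]
k_nearest = [labels[i] for i in order[:3]]  # ['B', 'B', 'A']
print(order, k_nearest)
```

Indexing labels with the first k entries of order gives the labels of the k nearest samples, which is what the loop in the classifier counts.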

2. Next, we test it.

import kNN
import numpy as np

dataSet, labels = kNN.createDataSet()
k = 3

# a point close to the two 'A' samples
testX = np.array([1.2, 1.0])
outputLabel = kNN.kNNClassify(testX, dataSet, labels, k)
print("Your input is: %s and classified to class: %s" % (testX, outputLabel))

# a point close to the two 'B' samples
testX = np.array([0.1, 0.3])
outputLabel = kNN.kNNClassify(testX, dataSet, labels, k)
print("Your input is: %s and classified to class: %s" % (testX, outputLabel))

This prints the following (the exact formatting of the arrays depends on the NumPy version):

Your input is: [ 1.2 1.0] and classified to class: A
Your input is: [ 0.1 0.3] and classified to class: B
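
To see why the two points land in classes A and B, the distances can be worked out by hand. The sketch below does this with only the standard library, using the four samples from createDataSet(); the helper name nearest_labels is made up for illustration, and distances are rounded to three decimals for display.

```python
import math

points = [(1.0, 0.9), (1.0, 1.0), (0.1, 0.2), (0.0, 0.1)]
labels = ["A", "A", "B", "B"]


def nearest_labels(query, k):
    """Return (rounded distance, label) pairs for the k nearest samples."""
    pairs = sorted(
        (math.hypot(query[0] - px, query[1] - py), label)
        for (px, py), label in zip(points, labels)
    )
    return [(round(d, 3), label) for d, label in pairs[:k]]


first = nearest_labels((1.2, 1.0), 3)
second = nearest_labels((0.1, 0.3), 3)
print(first)   # [(0.2, 'A'), (0.224, 'A'), (1.36, 'B')]
print(second)  # [(0.1, 'B'), (0.224, 'B'), (1.082, 'A')]
```

With k = 3, the first point gets two votes for A and one for B, and the second gets two votes for B and one for A, which matches the output above.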

This covers the most basic theory of kNN and a simple practical example; next I will also study some more advanced kNN topics.
