Machine Learning in Action: The kNN Algorithm

Source: Internet  Editor: 程序博客网  Time: 2024/06/13 11:22
Program:
# author: xiaoyun
from numpy import *
import operator
def createDataSet():
    group = array([[1.0, 0.9], [1.0, 1.0], [0.1, 0.2], [0.0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels

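def kNNClassify(newInput, dataSet, labels, k):
    # Number of training samples (rows of dataSet)
    numSamples = dataSet.shape[0]
    # Repeat newInput once per sample, then subtract each training point
    diff = tile(newInput, (numSamples, 1)) - dataSet
    squaredDiff = diff ** 2
    # sum here is numpy's sum (from the star import); it adds along each row
    squaredDist = sum(squaredDiff, axis=1)
    distance = squaredDist ** 0.5
    # Indices of the samples, ordered from nearest to farthest
    sortedDistIndices = argsort(distance)

    # Tally the labels of the k nearest samples
    classCount = {}
    for i in range(k):
        voteLabel = labels[sortedDistIndices[i]]
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1

    # Find the label with the most votes by looping and comparing
    maxCount = 0
    maxIndex = None  # stays None only if k is 0
    for key, value in classCount.items():
        if value > maxCount:
            maxCount = value
            maxIndex = key

    return maxIndex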

# Main program, slightly modified
testX = array([1.2, 1.0])
k = 3
dataSet, labels = createDataSet()
outputLabel = kNNClassify(testX, dataSet, labels, k)
print("Your input is: %s, and classified to class: %s" % (testX, outputLabel))
testX = array([0.1, 0.3])
outputLabel = kNNClassify(testX, dataSet, labels, k)
print("Your input is: %s, and classified to class: %s" % (testX, outputLabel))


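1. tile copies the input numSamples times along the rows, producing a four-row, two-column array (one copy of the input per training sample), so that the values in group can be subtracted element by element.
2. argsort does not return the sorted Euclidean distances themselves; it returns, as an array, the indices that would put them in ascending order (more economical).
3. classCount[voteLabel] = classCount.get(voteLabel, 0) + 1 is a dictionary assignment: get returns the current count for the label, or 0 if the label has not been seen yet, and that count is then incremented by one.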
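The three notes above can be checked on the sample data with a small sketch. It is an illustration and not part of the original program: the names new_input, tiled, order and votes are made up for this example, and numpy is imported as np instead of with the star import. It also assumes k = 3, as in the main program.

```python
import numpy as np

# Sample data copied from createDataSet() in the program above.
group = np.array([[1.0, 0.9], [1.0, 1.0], [0.1, 0.2], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
new_input = np.array([1.2, 1.0])

# Note 1: tile repeats the input once per sample, giving a 4x2 array.
tiled = np.tile(new_input, (group.shape[0], 1))
diff = tiled - group
# NumPy broadcasting gives the same differences without the explicit copy.
same_as_broadcast = np.allclose(diff, new_input - group)

# Note 2: argsort returns the indices of the samples, nearest first,
# not the sorted distances themselves.
distance = np.sqrt((diff ** 2).sum(axis=1))
order = np.argsort(distance)

# Note 3: dict.get(label, 0) supplies 0 the first time a label is seen.
votes = {}
for i in order[:3]:  # k = 3 nearest neighbours
    label = labels[i]
    votes[label] = votes.get(label, 0) + 1

print(tiled.shape, order.tolist(), votes)
# prints: (4, 2) [1, 0, 2, 3] {'A': 2, 'B': 1}
```

Two of the three nearest samples carry label A, so the input [1.2, 1.0] is classified as A, which matches what kNNClassify returns for the first test point.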



