Handwritten Digit Recognition with the k-Nearest Neighbors Algorithm: A Python Implementation

Source: Internet  Posted under: 凤凰金融 大数据  Editor: 程序博客网  Time: 2024/04/25 18:57

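Dataset: (see the Python hands-on tutorial)

Training data: trainingDigits, more than 2,000 .txt files

Test data: testDigits, about 900 .txt files

Every file stores one 32×32 image as 32 lines of 32 digit characters ('0' or '1').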
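To make the file format concrete, here is a minimal sketch that parses such a text image from a string and flattens it row by row, so that the pixel in row i, column j lands at index i*d + j (the same layout img2vector below uses). The helper name text_to_vector and the tiny 4×4 sample are illustrative assumptions; the real files are 32×32.

```python
import numpy as np


def text_to_vector(text, d):
    """Flatten a d x d text image of '0'/'1' characters into a 1 x (d*d) float row vector.

    Hypothetical helper for illustration; it mirrors the row-major layout of img2vector
    and assumes every one of the first d lines has at least d characters.
    """
    lines = text.splitlines()[:d]
    # Take the first d characters of each of the first d lines, row by row.
    digits = [int(ch) for line in lines for ch in line[:d]]
    return np.array(digits, dtype=float).reshape(1, d * d)


sample = "0110\n1001\n1001\n0110\n"
vec = text_to_vector(sample, 4)
print(vec.shape)            # (1, 16)
print(vec[0, :4].tolist())  # [0.0, 1.0, 1.0, 0.0]
```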

test_handWritting.py:

import os

import numpy as np

import knnOperator  # kNN classifier from the post linked below


def img2vector(filename, d):  # d = 32
    """Flatten a d x d text image of '0'/'1' characters into a 1 x (d*d) row vector."""
    returnVector = np.zeros((1, d * d))
    with open(filename) as fr:
        for i in range(d):
            linstr = fr.readline()
            for j in range(d):
                returnVector[0, i * d + j] = int(linstr[j])
    return returnVector


def handwritingClassTest(filepath, d):
    # Build the training matrix: one flattened image per row.
    trainFilePath = os.path.join(filepath, 'trainingDigits')
    trainFileList = os.listdir(trainFilePath)
    nTrain = len(trainFileList)
    trainData = np.zeros((nTrain, d * d))
    trainlabels = []
    for i in range(nTrain):
        trainFilei = trainFileList[i]
        trainFileName = os.path.join(trainFilePath, trainFilei)
        trainData[i, :] = img2vector(trainFileName, d)
        # The class label is the part of the file name before the first '_'.
        trainlabels.append(trainFilei.split('_')[0])

    # Classify every test image and count the correct predictions.
    testFilePath = os.path.join(filepath, 'testDigits')
    testFileList = os.listdir(testFilePath)
    nTest = len(testFileList)
    k = 4
    count = 0
    for j in range(nTest):
        testFilej = testFileList[j]
        testFileName = os.path.join(testFilePath, testFilej)
        testSample = img2vector(testFileName, d)
        test_label = knnOperator.knnOperator(testSample, trainData, trainlabels, k)
        truth_label = testFilej.split('_')[0]
        if truth_label == test_label:
            count += 1
    rate = float(count) / float(nTest)
    print(rate)

For the knnOperator function, see: http://blog.csdn.net/u013593585/article/details/51284537
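The linked post is not reproduced here. For readers who want to run the script without it, below is a minimal sketch of a function with the same call signature used above, knnOperator(testSample, trainData, trainlabels, k). It assumes Euclidean distance and plain majority voting; it is not the linked author's code, and its tie-breaking may differ. Saved as knnOperator.py next to test_handWritting.py, it would satisfy the `import knnOperator` line.

```python
from collections import Counter

import numpy as np


def knnOperator(testSample, trainData, trainlabels, k):
    """Return the majority label among the k training rows closest to testSample.

    Sketch only: assumes Euclidean distance and majority voting.
    """
    # testSample has shape (1, n); broadcasting subtracts it from every training row.
    diff = trainData - testSample
    distances = np.sqrt(np.sum(diff ** 2, axis=1))
    # Indices of the k nearest training rows, closest first.
    nearest = np.argsort(distances)[:k]
    # Counter keeps first-seen order, so a tied vote goes to the label met first,
    # i.e. the one whose neighbor is closest.
    votes = Counter(trainlabels[i] for i in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = np.array([[0, 0], [0, 1], [5, 5], [6, 5]], dtype=float)
    labels = ['0', '0', '1', '1']
    print(knnOperator(np.array([[0.2, 0.3]]), train, labels, 3))  # 0
    print(knnOperator(np.array([[5.5, 5.0]]), train, labels, 3))  # 1
```

Returning the label object itself (here a string) matters because the calling code compares it with `truth_label`, which is also a string taken from the file name.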

Main script:

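from test_handWritting import handwritingClassTest

filepath = 'E:\\ZForWorks\\MLPython\\knn\\digits\\'  # folder containing trainingDigits and testDigits
d = 32  # images are 32 x 32
handwritingClassTest(filepath, d)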
Accuracy on the test set (k = 4): 98.3%
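Classifying the test files one at a time in a Python loop is simple but slow. A possible vectorized alternative is sketched below, assuming the training and test images are already loaded into NumPy arrays (for example with img2vector). It computes every test-to-training squared distance at once via ||a − b||² = ||a||² − 2a·b + ||b||². The function name knn_accuracy is hypothetical. Ties in the vote go to the smallest label because np.unique returns sorted values, so results can differ from a loop-based classifier when votes are tied. For about 900 test and 2,000 training images the distance matrix has roughly 1.8 million entries (about 14 MB as float64).

```python
import numpy as np


def knn_accuracy(train_data, train_labels, test_data, test_labels, k):
    """Classify all test rows at once and return the fraction predicted correctly.

    Hypothetical helper, not part of the original post.
    """
    train_labels = np.asarray(train_labels)
    test_labels = np.asarray(test_labels)
    # Squared Euclidean distances, shape (nTest, nTrain).
    sq_dists = (
        np.sum(test_data ** 2, axis=1)[:, None]
        - 2.0 * test_data @ train_data.T
        + np.sum(train_data ** 2, axis=1)[None, :]
    )
    # Indices of the k closest training rows for every test row.
    nearest = np.argsort(sq_dists, axis=1)[:, :k]
    correct = 0
    for neighbor_labels, truth in zip(train_labels[nearest], test_labels):
        values, counts = np.unique(neighbor_labels, return_counts=True)
        # Majority vote; np.argmax picks the first (smallest) label on a tie.
        if values[np.argmax(counts)] == truth:
            correct += 1
    return correct / len(test_labels)


train = np.array([[0, 0], [0, 1], [5, 5], [6, 5]], dtype=float)
test = np.array([[0.2, 0.3], [5.5, 5.0], [0.1, 0.0]])
print(knn_accuracy(train, ['0', '0', '1', '1'], test, ['0', '1', '1'], 3))
```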


