Kaggle Training Competition (1): Digit Recognizer
Source: Internet  Editor: 程序博客网  Time: 2024/05/16 05:40
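Problem
Recognize handwritten digits.
Approach
This is my first Kaggle problem: recognizing handwritten digits. Each digit is a 28*28 image, stored as a vector of 784 pixel values. I ran a plain KNN (k = 3) with Euclidean distance as the metric. Final score: 0.96.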
def knn(inX, num):
    # trainMat, labelList and outFile are globals set up by the main script
    dataSet = trainMat
    labels = labelList
    k = 3
    dataSetSize = dataSet.shape[0]
    # Euclidean distance from inX to every training sample
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sumDiffMat = sqDiffMat.sum(axis=1)
    distances = sumDiffMat ** 0.5
    sortedDistances = distances.argsort()
    # Count the labels of the k nearest neighbours
    classCount = {}
    for i in range(k):
        vote = labels[sortedDistances[i]]
        classCount[vote] = classCount.get(vote, 0) + 1
    # Pick the label with the most votes
    bestCount = 0
    ans = ''
    for label, count in classCount.items():
        if count > bestCount:
            ans = label
            bestCount = count
    print(str(num + 1) + ' = ' + ans)
    outFile.write(str(num + 1) + ',' + ans + '\n')
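To make the voting step easier to follow, here is a minimal, self-contained sketch of the same idea on toy data. It is not the code I submitted: the name knn_predict and the toy arrays are made up for illustration, it relies on NumPy broadcasting instead of tile(), and it compares squared distances (skipping the square root does not change which neighbours are nearest).

```python
from collections import Counter

import numpy as np


def knn_predict(query, train_points, train_labels, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Broadcasting subtracts query from every row, so tile() is not needed.
    diffs = train_points - query
    # Squared Euclidean distance; the ordering matches the true distance.
    sq_dists = (diffs ** 2).sum(axis=1)
    nearest = np.argsort(sq_dists)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): two clusters in 2-D.
toy_points = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                       [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
toy_labels = ['0', '0', '0', '1', '1', '1']

print(knn_predict(np.array([0.05, 0.1]), toy_points, toy_labels))
print(knn_predict(np.array([5.0, 5.1]), toy_points, toy_labels))
```

With the real data, train_points would be the 784-column training matrix and query one row of the test matrix.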
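I'll optimize this once I've learned more.
On the implementation side, I used Python threads (multiprocessing.dummy.Pool, which is a thread pool) to classify the test samples, which saved some time.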
from multiprocessing.dummy import Pool

outFile = open("out2.csv", 'w')
pool = Pool()
pool.starmap(knn, zip(testMat, range(n)))
pool.close()
pool.join()
outFile.close()
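One thing to watch with the pool above is that each thread writes its own line, so rows land in the file in whatever order the threads finish. A possible variant, sketched below under a few assumptions, lets the workers return labels and writes everything afterwards in input order. The classify function is a hypothetical stand-in for knn, write_submission is a made-up helper name, and the ImageId,Label header follows the competition's sample submission as I understand it.

```python
import csv
import io
from multiprocessing.dummy import Pool  # thread pool with the multiprocessing.Pool API


def classify(sample):
    """Hypothetical stand-in for knn(): returns a label string for one sample."""
    return str(sum(sample) % 10)


def write_submission(samples, out, workers=4):
    """Classify samples in a thread pool and write the rows in input order."""
    with Pool(workers) as pool:
        # map() returns results in the same order as the inputs,
        # no matter which thread finishes first.
        labels = pool.map(classify, samples)
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(['ImageId', 'Label'])  # assumed header, see note above
    for image_id, label in enumerate(labels, start=1):
        writer.writerow([image_id, label])


buffer = io.StringIO()
write_submission([[1, 2], [3, 4], [5, 9]], buffer)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the sketch runnable without any files; passing an opened file object instead would produce the CSV on disk.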
Code
import csv
from multiprocessing.dummy import Pool  # thread pool with the multiprocessing.Pool API

from numpy import tile, zeros


def knn_warp(args):
    # Wrapper for Pool.map, which passes one argument; unused because starmap unpacks tuples
    return knn(*args)


def knn(inX, num):
    # Classify test sample inX (row index num) and write the result to outFile
    dataSet = trainMat
    labels = labelList
    k = 3
    dataSetSize = dataSet.shape[0]
    # Euclidean distance from inX to every training sample
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sumDiffMat = sqDiffMat.sum(axis=1)
    distances = sumDiffMat ** 0.5
    sortedDistances = distances.argsort()
    # Count the labels of the k nearest neighbours
    classCount = {}
    for i in range(k):
        vote = labels[sortedDistances[i]]
        classCount[vote] = classCount.get(vote, 0) + 1
    # Pick the label with the most votes
    bestCount = 0
    ans = ''
    for label, count in classCount.items():
        if count > bestCount:
            ans = label
            bestCount = count
    print(str(num + 1) + ' = ' + ans)
    outFile.write(str(num + 1) + ',' + ans + '\n')


def readTrain(row, i):
    # Store training row i: its label (kept as a string) and its 784 pixels
    labelList[i] = row['label']
    for x in range(0, 784):
        trainMat[i, x] = int(row['pixel' + str(x)])
    print(str(i))


def readTest(row, i):
    # Store the 784 pixels of test row i
    for x in range(0, 784):
        testMat[i, x] = int(row['pixel' + str(x)])
    print(str(i))


if __name__ == '__main__':
    # labelList, trainMat, testMat and outFile are module-level globals used by the workers

    # Count the training rows (total lines minus the header) to size the arrays
    with open('train.csv') as f:
        m = len(f.readlines()) - 1
    labelList = list(range(m))
    trainMat = zeros((m, 784))
    with open('train.csv') as f:
        f_csv = csv.DictReader(f)
        pool = Pool()
        pool.starmap(readTrain, zip(f_csv, range(m)))
        pool.close()
        pool.join()

    # Same for the test set
    with open('test.csv') as f:
        n = len(f.readlines()) - 1
    testMat = zeros((n, 784))
    with open('test.csv') as f:
        f_csv = csv.DictReader(f)
        pool = Pool()
        pool.starmap(readTest, zip(f_csv, range(n)))
        pool.close()
        pool.join()

    # Classify every test sample; each worker writes its own line to out2.csv
    outFile = open("out2.csv", 'w')
    pool = Pool()
    pool.starmap(knn, zip(testMat, range(n)))
    pool.close()
    pool.join()
    outFile.close()
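Reading the CSV one cell at a time is the slowest part of the listing above. As a possible alternative, and only a sketch, numpy.loadtxt can parse the whole table in one call. The helper name load_train and the tiny in-memory CSV are made up for illustration, and the sketch assumes the label is the first column, as in the header shown. Labels come back as integers here, whereas the listing above keeps them as strings.

```python
import io

import numpy as np


def load_train(source):
    """Split a train.csv-style table into (labels, pixels)."""
    # skiprows=1 drops the header line; every remaining field is an integer.
    table = np.loadtxt(source, delimiter=',', skiprows=1,
                       dtype=np.int64, ndmin=2)
    labels = table[:, 0]   # first column: the digit label
    pixels = table[:, 1:]  # remaining columns: pixel values
    return labels, pixels


# A made-up 2-row, 4-pixel stand-in for train.csv
# (the real file has 784 pixel columns).
sample_csv = io.StringIO(
    "label,pixel0,pixel1,pixel2,pixel3\n"
    "7,0,255,12,0\n"
    "1,0,0,128,64\n"
)
labels, pixels = load_train(sample_csv)
print(labels, pixels.shape)
```

For the real data, load_train('train.csv') would take the place of the line counting and the readTrain pool.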