Statistical Learning Methods (《统计学习方法》) Study Notes (3): The K-Nearest-Neighbor Method
Source: Internet  Editor: 程序博客网  Time: 2024/05/18 03:56
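The K-nearest-neighbor method works on a training set whose instances are already labeled. To classify a new instance, it finds the K training instances closest to it, counts how many of them belong to each class, and predicts the class by majority vote or a similar rule. The simplest case is K=1, which is the familiar nearest-neighbor method (NN).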
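First, to find the K training samples closest to the new instance, we need to fix a distance metric. A general model is the L_p distance between two n-dimensional points x_i and x_j:

$$L_p(x_i, x_j) = \left( \sum_{l=1}^{n} \left| x_i^{(l)} - x_j^{(l)} \right|^p \right)^{1/p}, \quad p \ge 1$$

When p=2 it is called the Euclidean distance; when p=1 it is called the Manhattan distance; when p=∞ it is the maximum of the coordinate-wise distances:

$$L_\infty(x_i, x_j) = \max_{l} \left| x_i^{(l)} - x_j^{(l)} \right|$$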
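The three special cases of the L_p distance can be checked numerically. The following is only a minimal sketch: `lp_dist` is a hypothetical helper name and the two points are made up for illustration. It relies on MATLAB's built-in `norm`, which accepts `p = 1`, `p = 2`, and `p = Inf` for vectors.

```matlab
% Minkowski (L_p) distance between two row vectors x and y.
% lp_dist is a hypothetical helper name, not part of the original program.
lp_dist = @(x, y, p) norm(x - y, p);

x = [1 1];   % made-up sample points
y = [4 5];

d1   = lp_dist(x, y, 1);    % Manhattan distance: |1-4| + |1-5|
d2   = lp_dist(x, y, 2);    % Euclidean distance: sqrt(3^2 + 4^2)
dinf = lp_dist(x, y, Inf);  % maximum coordinate difference: max(3, 4)
```

For these two points the Manhattan distance is 7, the Euclidean distance is 5, and the p=∞ distance is 4, so the choice of p can change which training samples count as "nearest".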
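Next comes the choice of K. If K is too small, the model becomes more complex and tends to overfit; if K is too large, distant points are also included in the vote and distort the prediction. K is therefore usually set to a fairly small value, typically chosen by cross-validation.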
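One way to pick K by cross-validation is sketched below, under some assumptions: it uses leave-one-out cross-validation (one particular form of cross-validation), a made-up one-dimensional toy data set, the Euclidean distance, and a hypothetical list of candidate values in `candidates`. It is an illustration of the idea, not the procedure used by the program in these notes.

```matlab
% Toy data: two well-separated 1-D classes (made up for illustration).
X = [0.0; 0.2; 0.4; 0.6; 0.8; 3.0; 3.2; 3.4; 3.6; 3.8];
y = [1; 1; 1; 1; 1; 2; 2; 2; 2; 2];
n = size(X, 1);

candidates = [1 3 5 9];          % hypothetical candidate values of k
acc = zeros(size(candidates));   % leave-one-out accuracy for each candidate

for c = 1:numel(candidates)
    k = candidates(c);
    correct = 0;
    for i = 1:n
        % Hold out sample i and classify it with the remaining n-1 samples.
        keep = true(n, 1);
        keep(i) = false;
        Xtr = X(keep, :);
        ytr = y(keep);
        % Euclidean distance from the held-out sample to every other sample.
        d = sqrt(sum((Xtr - repmat(X(i, :), n - 1, 1)).^2, 2));
        [~, order] = sort(d);
        % Majority vote among the k nearest; mode breaks ties by the smaller label.
        pred = mode(ytr(order(1:k)));
        correct = correct + (pred == y(i));
    end
    acc(c) = correct / n;
end

[~, best] = max(acc);            % first candidate with the highest accuracy
best_k = candidates(best);
```

On this toy set the small values of k classify every held-out point correctly, while k=9 uses all remaining points, so the vote is always won by the other class (5 votes against 4) and the accuracy drops to zero. This is the "distant points are counted too" effect described above.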
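Finally, we choose the classification decision rule. Usually the class that occurs most often among the K nearest neighbors is taken as the final result.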
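The majority-vote rule can be sketched in a few lines. The neighbor labels below are made-up values, and breaking ties in favor of the smallest label is an assumption of this sketch, since the notes do not say how ties should be handled.

```matlab
% Labels of the k = 5 nearest neighbors of some query point (made-up values).
neighbor_labels = [2; 1; 2; 3; 2];

classes = unique(neighbor_labels);   % distinct class labels, sorted ascending
counts = zeros(numel(classes), 1);
for c = 1:numel(classes)
    counts(c) = sum(neighbor_labels == classes(c));   % votes for each class
end

% max returns the first maximum, so a tie goes to the smallest label
% (an assumption of this sketch).
[~, winner] = max(counts);
predicted = classes(winner);
```

Here class 2 receives three of the five votes, so `predicted` is 2.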
Below is a KNN implementation written by an expert, which you can study as a reference:
function rate = KNN(Train_data,Train_label,Test_data,Test_label,k,Distance_mark)
% K-Nearest-Neighbor classifier (K-NN classifier)
% Input:
%   Train_data, Test_data are the training data set and the test data set,
%   respectively. (Each row is a data point.)
%   Train_label, Test_label are column vectors. They are the labels of the
%   training data set and the test data set, respectively.
%   k is the number of nearest neighbors.
%   Distance_mark : ['Euclidean', 'L2' | 'L1' | 'Cos']
%   'Cos' represents Cosine distance.
% Output:
%   rate: accuracy of the K-NN classifier
%
% Examples:
%   % Classification problem with three classes
%   A = rand(50,300);
%   B = rand(50,300)+2;
%   C = rand(50,300)+3;
%   % label vector for the three classes
%   gnd = [ones(300,1);2*ones(300,1);3*ones(300,1)];
%   fea = [A B C]';
%   trainIdx = [1:150,301:450,601:750]';
%   testIdx = [151:300,451:600,751:900]';
%   fea_Train = fea(trainIdx,:);
%   gnd_Train = gnd(trainIdx);
%   fea_Test = fea(testIdx,:);
%   gnd_Test = gnd(testIdx);
%   rate = KNN(fea_Train,gnd_Train,fea_Test,gnd_Test,1)
%
% Reference:
%   If you used my matlab code, we appreciate it very much if you can cite our following papers:
%   Jie Gui, Tongliang Liu, Dacheng Tao, Zhenan Sun, Tieniu Tan, "Representative Vector Machines:
%   A unified framework for classical classifiers", IEEE Transactions on Cybernetics (Accepted).
%   Jie Gui et al., "Group sparse multiview patch alignment framework with view consistency for
%   image classification", IEEE Transactions on Image Processing, vol. 23, no. 7, pp. 3126-3137, 2014
%   Jie Gui et al., "How to estimate the regularization parameter for spectral regression
%   discriminant analysis and its kernel version?", IEEE Transactions on Circuits and
%   Systems for Video Technology, vol. 24, no. 2, pp. 211-223, 2014
%   Jie Gui, Zhenan Sun, Wei Jia, Rongxiang Hu, Yingke Lei and Shuiwang Ji, "Discriminant
%   Sparse Neighborhood Preserving Embedding for Face Recognition", Pattern Recognition,
%   vol. 45, no.8, pp. 2884–2893, 2012
%   Jie Gui, Wei Jia, Ling Zhu, Shuling Wang and Deshuang Huang,
%   "Locality Preserving Discriminant Projections for Face and Palmprint Recognition,"
%   Neurocomputing, vol. 73, no.13-15, pp. 2696-2707, 2010
%   Jie Gui et al., "Semi-supervised learning with local and global consistency",
%   International Journal of Computer Mathematics (Accepted)
%   Jie Gui, Shu-Lin Wang, and Ying-ke Lei, "Multi-step Dimensionality Reduction and
%   Semi-Supervised Graph-Based Tumor Classification Using Gene Expression Data,"
%   Artificial Intelligence in Medicine, vol. 50, no.3, pp. 181-191, 2010
%
% This code is written by Gui Jie in the evening 2009/03/11.
% If you find any bugs in the code, feel free to contact me.

if nargin < 5
    error('Not enough arguments!');
elseif nargin < 6
    Distance_mark = 'L2';
end

[n, dim] = size(Test_data);        % n: number of test samples
train_num = size(Train_data, 1);   % number of training samples

% Normalize each feature to have zero mean and unit variance.
% If you need the following four rows, you can uncomment them.
% M = mean(Train_data); % mean & std of the training data set
% S = std(Train_data);
% Train_data = (Train_data - ones(train_num, 1) * M)./(ones(train_num, 1) * S); % normalize training data set
% Test_data = (Test_data-ones(n,1)*M)./(ones(n,1)*S); % normalize test data set

U = unique(Train_label);   % class labels
nclasses = length(U);      % number of classes
Result = zeros(n, 1);
Count = zeros(nclasses, 1);
dist = zeros(train_num, 1);

for i = 1:n
    % compute distances between the test sample and all training data and sort them
    test = Test_data(i,:);
    for j = 1:train_num
        train = Train_data(j,:);
        V = test - train;
        switch Distance_mark
            case {'Euclidean', 'L2'}
                dist(j,1) = norm(V,2);   % Euclidean (L2) distance
            case 'L1'
                dist(j,1) = norm(V,1);   % L1 distance
            case 'Cos'
                dist(j,1) = acos(test*train'/(norm(test,2)*norm(train,2)));   % cosine distance (angle)
            otherwise
                dist(j,1) = norm(V,2);   % default distance
        end
    end
    [Dummy, Inds] = sort(dist);

    % compute the class labels of the k nearest samples
    Count(:) = 0;
    for j = 1:k
        ind = find(Train_label(Inds(j)) == U);   % find the label of the j-th nearest neighbor
        Count(ind) = Count(ind) + 1;
    end
    % Count: the number of samples of each class among the k nearest neighbors

    % determine the class of the data sample
    [dummy, ind] = max(Count);
    Result(i) = U(ind);
end

correctnumbers = length(find(Result == Test_label));
rate = correctnumbers/n;
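The program above is the simplest KNN implementation, but not the most efficient one. A kd-tree-based KNN implementation has not been written yet and will be added later.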
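Short of a kd-tree, one simple speed-up is to compute all Euclidean distances with matrix operations instead of a double loop. The sketch below is an assumption-laden illustration rather than part of the original program: it reuses the names `Train_data` and `Test_data` with made-up values and relies on the identity ||a-b||^2 = ||a||^2 + ||b||^2 - 2ab'.

```matlab
% Made-up data: each row is one sample, as in the KNN function.
Train_data = [0 0; 1 0; 0 2];
Test_data  = [0 1; 3 0];

tr2 = sum(Train_data.^2, 2)';   % 1 x train_num squared norms
te2 = sum(Test_data.^2, 2);     % n x 1 squared norms

% D2(i,j) = squared Euclidean distance between test sample i and training sample j.
D2 = bsxfun(@plus, te2, tr2) - 2 * (Test_data * Train_data');
D = sqrt(max(D2, 0));           % clamp tiny negative values caused by rounding

% Row i of Inds lists the training samples ordered from nearest to farthest.
[~, Inds] = sort(D, 2);
```

This still compares every test sample with every training sample, so it only removes loop overhead; it does not reduce the number of distance computations the way a kd-tree search would.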