Clustering: the K-means & K-medoids Algorithms

Source: Internet   Published under: Freescale MCU   Editor: 程序博客网   Time: 2024/05/19 13:18

For a description of K-means and K-medoids, see pluskid's blog http://blog.pluskid.org/?tag=clustering or http://blog.csdn.net/abcjennifer/article/details/8197072

First, here is MATLAB code for K-means:

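function [labels,Cnt] = kmeans(k,D,threshold)
%KMEANS  Basic K-means clustering.
%   [labels,Cnt] = kmeans(k,D,threshold) clusters the rows of D into k groups.
%   labels(l) is the cluster index of row l; Cnt holds the k centres.
%   Note: a file named kmeans.m shadows the Statistics Toolbox function kmeans.
    if nargin < 3
        threshold = 1e-10;          % default convergence threshold
    end
    [N,dim] = size(D);              % N samples, each of dimension dim
    R_I = randperm(N,k);
    Cnt = D(R_I,:);                 % k random data points as initial centres
    labels = zeros(N,1);
    while true
        % assignment step: attach each point to its nearest centre
        dist = zeros(k,1);
        for l = 1:N
            for i = 1:k
                dist(i) = norm(D(l,:) - Cnt(i,:));
            end
            [~,t] = min(dist);
            labels(l) = t;
        end
        % update step: each new centre is the mean of its members
        newCnt = zeros(k,dim);
        cont = zeros(k,1);
        for l = 1:N
            newCnt(labels(l),:) = newCnt(labels(l),:) + D(l,:);
            cont(labels(l)) = cont(labels(l)) + 1;
        end
        for i = 1:k
            if cont(i) > 0
                newCnt(i,:) = newCnt(i,:) / cont(i);
            else
                newCnt(i,:) = Cnt(i,:);     % empty cluster: keep the old centre
            end
        end
        % stop once the centres no longer move
        if norm(Cnt - newCnt) < threshold
            break;
        end
        Cnt = newCnt;
    end
end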
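Each pass of the while loop in kmeans is one round of the standard Lloyd iteration. In the notation below (ours, not the original post's), x_l is row l of D, c_i is row i of Cnt, and z_l is labels(l); J is the within-cluster sum of squared distances. The first inner loop is the assignment update and the averaging loop is the centre update:

```latex
J(z, c) = \sum_{l=1}^{N} \lVert x_l - c_{z_l} \rVert_2^2,
\qquad
z_l \leftarrow \arg\min_{i} \lVert x_l - c_i \rVert_2,
\qquad
c_i \leftarrow \frac{1}{\lvert \{ l : z_l = i \} \rvert} \sum_{l \,:\, z_l = i} x_l .
```

Each update minimizes J over one set of variables while the other is held fixed, so J never increases, which is why the loop settles. The result is typically a local minimum that depends on the random initial centres chosen by randperm.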
The experimental data are generated from three Gaussian distributions:
% generate samples from three 2-D Gaussian distributions
% (mvnrnd requires the Statistics and Machine Learning Toolbox)
mu = [0,-15]; sigma = [45,0;0,45];
r1 = mvnrnd(mu,sigma,300);
mu = [5,15];  sigma = [15,0;0,15];
r2 = mvnrnd(mu,sigma,300);
mu = [-5,7];  sigma = [15,0;0,15];
r3 = mvnrnd(mu,sigma,300);
figure;
plot(r1(:,1),r1(:,2),'r*', r2(:,1),r2(:,2),'b*', r3(:,1),r3(:,2),'g*');
title('generated data');
D = [r1;r2;r3];     % 900 x 2 data matrix, one sample per row
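If mvnrnd is not available, the same three clusters can be drawn with core MATLAB alone. The sketch below is our own substitute, not part of the original post. It relies on the fact that if z has independent standard-normal entries and R = chol(sigma) (so R'*R = sigma), then mu + z*R has mean mu and covariance sigma. The names mus, sigmas and samples are hypothetical.

```matlab
% Sketch: draw the three Gaussian clusters with randn + chol only,
% as a stand-in for mvnrnd from the Statistics and Machine Learning Toolbox.
n = 300;                                    % samples per cluster
mus    = {[0,-15], [5,15], [-5,7]};         % cluster means
sigmas = {[45,0;0,45], [15,0;0,15], [15,0;0,15]};   % cluster covariances
samples = cell(1,3);
for j = 1:3
    R = chol(sigmas{j});                    % upper triangular, R'*R = sigma
    % each row: mean + (1x2 standard normal row) * R
    samples{j} = repmat(mus{j}, n, 1) + randn(n, 2) * R;
end
r1 = samples{1}; r2 = samples{2}; r3 = samples{3};
D = [r1; r2; r3];                           % 900 x 2 data matrix
```

Because the covariances in this example are diagonal, chol simply returns the element-wise square root of sigma; the chol form also works for correlated (non-diagonal) covariances.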

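K-medoids requires every centre (medoid) to be one of the existing data points, which makes it more robust to outliers than K-means. The update step therefore computes, for each point, the sum of its distances to all other points in the same cluster, and picks the point with the smallest sum as that cluster's new centre: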

function [labels,Cnt] = kmedoids(k,D,threshold)
%KMEDOIDS  Basic K-medoids clustering.
%   Same interface as kmeans above, but every centre is an actual row of D.
    if nargin < 3
        threshold = 1e-10;          % default convergence threshold
    end
    N = size(D,1);
    R_I = randperm(N,k);
    Cnt = D(R_I,:);                 % k random data points as initial medoids
    labels = zeros(N,1);
    while true
        % assignment step: attach each point to its nearest medoid
        dist = zeros(k,1);
        for l = 1:N
            for i = 1:k
                dist(i) = norm(D(l,:) - Cnt(i,:));
            end
            [~,t] = min(dist);
            labels(l) = t;
        end
        % update step: the new medoid of cluster s is the member whose
        % total distance to all members of s is smallest
        Cnt_ = Cnt;
        for s = 1:k
            idx = find(labels == s);
            if isempty(idx)
                continue;                   % empty cluster: keep the old medoid
            end
            total = zeros(numel(idx),1);
            for a = 1:numel(idx)
                for b = 1:numel(idx)
                    total(a) = total(a) + norm(D(idx(a),:) - D(idx(b),:));
                end
            end
            [~,t] = min(total);
            Cnt_(s,:) = D(idx(t),:);
        end
        % stop once the medoids no longer change
        if norm(Cnt - Cnt_) < threshold
            break;
        end
        Cnt = Cnt_;
    end
end
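To see why restricting centres to data points helps, the following toy sketch (our own example; the points P are made up) applies both update rules to a single cluster of four nearby points plus one outlier. The medoid is chosen the same way as in the update loop of kmedoids: the member with the smallest total distance to the others.

```matlab
% Sketch: mean (K-means update) vs. medoid (K-medoids update)
% for one small cluster that contains a single outlier.
P = [0 0; 1 0; 0 1; 1 1; 100 100];   % four close points plus an outlier
m = size(P, 1);
total = zeros(m, 1);
for a = 1:m
    for b = 1:m
        total(a) = total(a) + norm(P(a,:) - P(b,:));   % sum of distances
    end
end
[~, t] = min(total);
medoid = P(t, :);          % member with the smallest total distance: [1 1]
centroid = mean(P, 1);     % what the K-means update returns: [20.4 20.4]
disp(medoid)
disp(centroid)
```

The centroid lands at (20.4, 20.4), far from all four regular points, while the medoid stays at (1, 1) inside the group. The price is cost: the medoid update compares every pair of cluster members, so it is quadratic in the cluster size, whereas the mean is linear.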


