Multi-class training and classification with LDA (Linear Discriminant Analysis)

Source: Internet · Published by: 东方网络复牌 · Editor: 程序博客网 · Time: 2024/06/06 15:00

This post uses LDA as a classifier in experiments run in MATLAB.

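The projection matrix W is built according to classical LDA theory by the LDA function below, which also returns the mean of each class after projection to (k-1) dimensions.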
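
For reference, the classical theory mentioned above is usually stated as follows. The notation is introduced here for illustration and is not taken from the original code: C_i is the set of samples in class i, n_i its size, \mu_i its mean, and \mu the mean of all samples. Samples are written as column vectors, whereas the code stores them as rows.

```latex
\[
S_W = \sum_{i=1}^{k} \sum_{x \in C_i} (x - \mu_i)(x - \mu_i)^{\top},
\qquad
S_B = \sum_{i=1}^{k} n_i \, (\mu_i - \mu)(\mu_i - \mu)^{\top}
\]
\[
W^{*} = \arg\max_{W} \operatorname{tr}\!\left( (W^{\top} S_W W)^{-1} \, W^{\top} S_B W \right)
\quad\Longrightarrow\quad
S_B w_j = \lambda_j S_W w_j, \qquad j = 1, \dots, k-1
\]
```

The columns w_j of W are the generalized eigenvectors that belong to the k-1 largest eigenvalues \lambda_j. S_B has rank at most k-1, which is why the projection has at most k-1 useful dimensions.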

The code of LDA.m is as follows:

function [W,centers]=LDA(Input,Target)
% Input:    n*d matrix, each row is a sample
% Target:   n*1 matrix, the class label of each sample
% W:        d*(k-1) matrix that projects samples to (k-1) dimensions
% centers:  k*(k-1) matrix, the mean of each class after projection

% Initialization
dim=size(Input,2);
ClassLabel=unique(Target);
k=length(ClassLabel);
nGroup=NaN(k,1);            % number of samples in each class
GroupMean=NaN(k,dim);       % mean of each class
centers=zeros(k,k-1);       % class means after projection
SB=zeros(dim,dim);          % between-class scatter matrix
SW=zeros(dim,dim);          % within-class scatter matrix

% Compute the within-class and between-class scatter matrices
for i=1:k
    group=(Target==ClassLabel(i));
    nGroup(i)=sum(group);
    GroupMean(i,:)=mean(Input(group,:),1);
    Xc=bsxfun(@minus,Input(group,:),GroupMean(i,:));   % centered samples of class i
    SW=SW+Xc'*Xc;
end
m=mean(Input,1);            % overall mean, so each class is weighted by its size
for i=1:k
    tmp=GroupMean(i,:)-m;
    SB=SB+nGroup(i)*(tmp'*tmp);
end

% % Alternative: W consists of the eigenvectors of inv(SW)*SB that belong
% % to the k-1 largest eigenvalues
% [evec,evals]=eig(SW\SB);
% [~,order]=sort(real(diag(evals)),'descend');
% W=real(evec(:,order(1:k-1)));

% The projection can also be obtained through SVD.
% The SVD of K=(Hb,Hw)' can be replaced by the SVD of Ht; P is then
% obtained from K, U and sigmak.
% [P,sigmak,U]=svd(K,'econ'); => [U,sigmak,V]=svd(Ht,0);
% Note: the recipe above is stated for Ht and Hb; the code below passes SW
% and SB instead, so W may differ from the eigenvector solution.
[U,sigmak,~]=svd(SW,0);
t=rank(SW);
R=sigmak(1:t,1:t);
P=SB'*U(:,1:t)/R;
[~,~,V2]=svd(P(1:k,1:t));
Y=(U(:,1:t)/R)*V2;
W=Y(:,1:k-1);

% Compute the class means after projection
for i=1:k
    group=(Target==ClassLabel(i));
    centers(i,:)=mean(Input(group,:)*W,1);
end
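
The eigenvector route that is commented out in LDA.m can be tried on its own. The following is a minimal sketch, not the code used in the experiment: the toy data and all variable names are assumptions made for illustration, and it assumes SW is non-singular so that the generalized eigenproblem SB*w = lambda*SW*w is well posed.

```matlab
% Sketch: LDA projection from the generalized eigenproblem SB*w = lambda*SW*w.
% Toy data and variable names are illustrative assumptions.
rng(0);                                           % reproducible toy data
X = [randn(30,4);
     randn(30,4) + repmat([3 0 0 0],30,1);
     randn(30,4) + repmat([0 3 0 0],30,1)];       % 3 classes, 4 features
y = [ones(30,1); 2*ones(30,1); 3*ones(30,1)];

labels = unique(y);
k = numel(labels);
d = size(X,2);
mu = mean(X,1);                                   % overall mean
SW = zeros(d);
SB = zeros(d);
for i = 1:k
    Xi = X(y==labels(i),:);
    mi = mean(Xi,1);
    Xc = bsxfun(@minus, Xi, mi);                  % centered samples of class i
    SW = SW + Xc'*Xc;                             % within-class scatter
    SB = SB + size(Xi,1)*((mi-mu)'*(mi-mu));      % between-class scatter
end

[evec, evalMat] = eig(SB, SW);                    % generalized eigenvectors
[lambda, order] = sort(real(diag(evalMat)), 'descend');
W = real(evec(:, order(1:k-1)));                  % d x (k-1) projection matrix
Z = X*W;                                          % projected samples, n x (k-1)
```

With raw image pixels SW may well be singular, in which case this sketch would need some regularization or a dimensionality reduction step first.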

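Because LDA is used here as a two-class classifier, it has to be extended to the multi-class problem. Two approaches are common. The one-vs-all approach trains k classifiers (I am not sure how their outputs should be combined). The one-vs-one approach trains a classifier for every pair of classes, which gives k(k-1)/2 two-class classifiers. This post takes the latter approach and trains on the samples to obtain the model, model. In the code, model is a struct array.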
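
The count k(k-1)/2 is easy to check by listing the pairs. This is a small illustration with a hypothetical k=4; the names pairs and numPairs are not part of the original code.

```matlab
% Enumerate the class pairs used by one-vs-one for a hypothetical k = 4.
k = 4;
pairs = nchoosek(1:k, 2);     % each row is one pair (a,b) with a < b
numPairs = size(pairs, 1);    % equals k*(k-1)/2
disp(pairs);
```

The rows come out in the same order as a nested loop over i=1:k-1 and j=i+1:k. For the one-vs-all variant, the usual way to combine the k classifiers is to pick the class whose classifier reports the largest score; that variant is not implemented in this post.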

The training function, LDATraining.m:

function [model,k,ClassLabel]=LDATraining(input,target)
% input:        n*d matrix, one sample per row
% target:       n*1 matrix, the class label of each sample
% model:        struct array, one entry per class pair (fields a, b, W, means)
% k:            the total number of classes
% ClassLabel:   the label of each class
model=struct;
ClassLabel=unique(target);
k=length(ClassLabel);
t=1;
for i=1:k-1
    for j=i+1:k
        model(t).a=i;               % index of the first class of the pair
        model(t).b=j;               % index of the second class of the pair
        g1=(target==ClassLabel(i));
        g2=(target==ClassLabel(j));
        tmp1=input(g1,:);
        tmp2=input(g2,:);
        in=[tmp1;tmp2];
        out=ones(size(in,1),1);     % class j is relabeled 1
        out(1:size(tmp1,1))=0;      % class i is relabeled 0
        [w,m]=LDA(in,out);
        model(t).W=w;
        model(t).means=m;           % row 1: class i, row 2: class j
        t=t+1;
    end
end

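At prediction time, the model produced by training is used to make k(k-1)/2 predictions, and the class that receives the most votes is taken as the result. For each two-class classifier, the test sample is projected with W and the projection is compared with the projected means of the two classes; the sample is assigned to the nearer one (I do not know whether there is a better way).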
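
The voting step can be illustrated separately from the classifiers. The sketch below uses made-up pairwise outcomes for two samples and k=3 classes; winners, pairs, s and pos are names chosen for this illustration only.

```matlab
% Majority voting over one-vs-one outcomes (made-up outcomes, k = 3 classes).
k = 3;
pairs = [1 2; 1 3; 2 3];          % the k*(k-1)/2 class pairs
winners = [1 1 2;                 % sample 1: winner of each pairwise classifier
           2 3 3];                % sample 2: winner of each pairwise classifier
n = size(winners, 1);
s = zeros(n, k);                  % s(i,c): votes of sample i for class c
for j = 1:size(pairs, 1)
    idx = sub2ind([n k], (1:n)', winners(:, j));
    s(idx) = s(idx) + 1;          % one vote per sample per classifier
end
[~, pos] = max(s, [], 2);         % most-voted class; the first one wins ties
```

Here sample 1 collects the votes [2 1 0] and sample 2 collects [0 1 2], so pos is [1; 3]. Because max returns the first maximum, ties go to the class with the lower index.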

The prediction function, LDATesting.m:

function target=LDATesting(input,k,model,ClassLabel)
% input:        n*d matrix, one sample per row
% k:            the total number of classes
% model:        struct array returned by LDATraining
% ClassLabel:   the label of each class
% target:       n*1 matrix, the predicted class label of each sample
n=size(input,1);
s=zeros(n,k);               % s(i,c): votes of sample i for class c
for j=1:k*(k-1)/2
    a=model(j).a;
    b=model(j).b;
    w=model(j).W;
    m=model(j).means;
    proj=input*w;           % project all samples at once
    da=sqrt(sum(bsxfun(@minus,proj,m(1,:)).^2,2));  % distance to the mean of class a
    db=sqrt(sum(bsxfun(@minus,proj,m(2,:)).^2,2));  % distance to the mean of class b
    nearA=da<db;
    s(nearA,a)=s(nearA,a)+1;
    s(~nearA,b)=s(~nearA,b)+1;
end
[~,pos]=max(s,[],2);        % class with the most votes (the first one on ties)
target=ClassLabel(pos);
target=target(:);

Example code:

function target=test(in,out,t)
[model,k,ClassLabel]=LDATraining(in,out);
target=LDATesting(t,k,model,ClassLabel);

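In the experiment I tested this on the USPS dataset and the result was poor: the accuracy was only about 39%, whereas the KNN algorithm reaches roughly 90% on the same dataset. Embarrassing!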

0 0