Matrix Factorization with Alternating Least Squares (ALS)


Matrix Factorization

The setting: we want to factor a sparse matrix into the product of two low-rank matrices. Besides reducing dimensionality, the two low-rank factors each carry a meaning of their own.
Take recommendation as an example: the user-item interaction matrix R (clicks or ratings) is sparse. We factor it into two low-rank matrices, a user-feature matrix U and an item-feature matrix V, whose features are latent semantic factors.
Stated more formally:
the rating matrix R is approximately low-rank, i.e. the m*n matrix R can be approximated by the product of
U (m*k) and V (n*k),
where U is the user-preference feature matrix and V is the item feature matrix:

$$R \approx U V^T, \qquad k \ll m, n$$
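
As a quick illustration of the shapes involved (a toy sketch with random factors, not data from this article):

```python
import numpy as np

m, n, k = 5, 4, 2            # 5 users, 4 items, 2 latent features
U = np.random.rand(m, k)     # user-feature matrix, m x k
V = np.random.rand(n, k)     # item-feature matrix, n x k
R_hat = U @ V.T              # reconstructed rating matrix, m x n
print(R_hat.shape)           # (5, 4)
```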

How do we solve for U and V? For this kind of problem one usually starts by defining an objective function; once the objective is in place, it can be approximately minimized by brute-force search, heuristic search, setting derivatives to zero, gradient descent, least squares, and so on.
The loss function for matrix factorization is:

$$C = \sum_{(i,j)} \Big[ (R_{ij} - u_i v_j^T)^2 + \lambda \big( \|u_i\|^2 + \|v_j\|^2 \big) \Big]$$

The goal is to find the two matrices that minimize C.
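
A direct translation of this formula into code might look like the following (a minimal sketch; the function name and the convention that zero entries are unobserved are my own assumptions, and the full implementation later in the article uses λ/2 per component instead of λ):

```python
import numpy as np

def factorization_loss(R, U, V, lam):
    """C = sum over observed (i, j) of (R_ij - u_i v_j^T)^2 + lam * (|u_i|^2 + |v_j|^2)."""
    C = 0.0
    for i, j in zip(*np.nonzero(R)):          # treat zero entries as unobserved
        err = R[i, j] - U[i] @ V[j]           # R_ij - u_i v_j^T
        C += err ** 2 + lam * (U[i] @ U[i] + V[j] @ V[j])
    return C
```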

Solution Methods

Recall the matrix-factorization loss function:

$$C = \sum_{(i,j)} \Big[ (R_{ij} - u_i v_j^T)^2 + \lambda \big( \|u_i\|^2 + \|v_j\|^2 \big) \Big]$$

Two common ways to minimize this loss are alternating least squares (ALS) and stochastic gradient descent.
Alternating least squares fixes one of the two unknowns, derives a closed-form update for the other by setting the partial derivative to zero, and then keeps alternating which variable is optimized, converging to a (local) optimum. The core ideas are:

1. There are two unknowns; fix one of them and minimize over the other.
2. With A fixed, take the partial derivative with respect to B and set it to zero to find the minimizer; then swap the roles and continue optimizing.

The ALS procedure

Given the loss function above, solve for the matrices U and V:

1. Initialize U and fix it.
2. Set the derivative with respect to V to zero to get a closed-form formula, and compute V.
3. Fix V.
4. Set the derivative with respect to U to zero to get a closed-form formula, and compute U.
5. Repeat steps 1-4 until C satisfies the stopping criterion or the step budget is exhausted.
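
A compact, vectorized sketch of this loop (my own illustration; like the full implementation later in the article, it treats every entry of R, including the zeros, as observed):

```python
import numpy as np

def als_sketch(R, k=2, lam=0.02, steps=20):
    """Alternate between the closed-form updates for U and V."""
    m, n = R.shape
    U, V = np.random.rand(m, k), np.random.rand(n, k)
    I = np.eye(k)
    for _ in range(steps):
        # fix V, solve (V^T V + lam*I) u_i^T = V^T R_i for every user row i
        U = np.linalg.solve(V.T @ V + lam * I, V.T @ R.T).T
        # fix U, solve (U^T U + lam*I) v_j^T = U^T R_.j for every item row j
        V = np.linalg.solve(U.T @ U + lam * I, U.T @ R).T
    return U, V
```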

Key points

  1. The loss function in least squares is the squared loss, which is also the most common loss function in regression learning. Regression means fitting the data with some hypothesis h, which may be linear or nonlinear. When a regression problem uses the squared loss, it can be solved by least squares (see the short example after this list).

  2. The principle of alternating least squares: fix one variable, set the partial derivative with respect to the other to zero to obtain a closed-form solution, and then apply the same trick with the roles swapped. This pattern of alternating optimization is worth remembering.
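
For point 1, a minimal example (my own illustration, not from this article) of solving a squared-loss regression with the least-squares normal equation:

```python
import numpy as np

# Fit y ≈ X w by minimizing the squared loss ||y - X w||^2.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # bias column + one feature
y = np.array([1.0, 3.0, 5.0, 7.0])
w = np.linalg.solve(X.T @ X, X.T @ y)   # normal equation: (X^T X) w = X^T y
print(w)                                # ~[1. 2.], i.e. h(x) = 1 + 2x
```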

Applications of matrix factorization

Once U and V are available, the product U V^T fills in the missing entries of the sparse R. U itself can also be used to compute user-user similarity and hence to cluster users; V can be used in the same way for items, achieving the goal of grouping like users with like users and like items with like items.
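
For example, a user-user similarity matrix can be computed from the rows of U (an illustrative sketch; cosine similarity is my own choice of metric):

```python
import numpy as np

def user_similarity(U):
    """Cosine similarity between the rows of the user-feature matrix U."""
    Un = U / np.linalg.norm(U, axis=1, keepdims=True)
    return Un @ Un.T   # (m, m); entry (a, b) is the similarity between users a and b
```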

Derivation

The first term of the loss is the squared error and the second is the regularization term, which involves $u_i$ and $v_j$.
Alternating least squares fixes one of the two unknowns and solves for the other; with one of them fixed there is effectively a single variable left, so setting the partial derivative to zero yields the minimizer.

Taking the partial derivative with respect to $v_j$:

$$\frac{\partial C}{\partial v_j} = \sum_i \Big[ -2\, (R_{ij} - u_i v_j^T)\, u_i^T \Big] + 2\lambda v_j^T = 0 \qquad (1)$$

This step can be confusing at first; the subtlety is differentiating expressions that contain a transpose. Working through a small example helps: when differentiating a term like $u_i v_j^T$ with respect to $v_j$, the transpose has to be accounted for, which is why the factor $u_i^T$ appears in the derivative.

Rearranging equation (1):

$$\sum_i R_{ij}\, u_i^T = \Big[ \sum_i u_i^T u_i + \lambda E \Big] v_j^T$$

which in matrix form gives:

$$U^T R_{\cdot j} = (U^T U + \lambda E)\, v_j^T$$

$$v_j^T = (U^T U + \lambda E)^{-1}\, U^T R_{\cdot j}$$

Computing each $v_j$ with this formula, column by column, yields the matrix V.

The matrix U is obtained the same way:

$$u_i^T = (V^T V + \lambda E)^{-1}\, V^T R_{i\cdot}^T$$

Note: $R_{\cdot j}$ (an $m \times 1$ column vector) is the j-th column of R, $R_{i\cdot}$ (a $1 \times n$ row vector) is the i-th row of R, and $E$ is the $k \times k$ identity matrix.

The intermediate step in more detail:

$$\sum_{i=1}^{m} R_{ij}\, u_i^T = \sum_{i=1}^{m} u_i^T R_{ij} = (u_1^T, u_2^T, \ldots, u_m^T) \begin{pmatrix} R_{1j} \\ R_{2j} \\ \vdots \\ R_{mj} \end{pmatrix} = U^T R_{\cdot j}$$
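
In NumPy the closed-form update for a single column reads as follows (a sketch; I use np.linalg.solve rather than forming the inverse explicitly):

```python
import numpy as np

def update_vj(U, R, j, lam):
    """Solve (U^T U + lam*E) v_j^T = U^T R_.j for the j-th item vector."""
    k = U.shape[1]
    return np.linalg.solve(U.T @ U + lam * np.eye(k), U.T @ R[:, j])
```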

Solving with stochastic gradient descent

Again we take partial derivatives of the loss function.
Since stochastic gradient descent updates individual entries $u_{ik}$ and $v_{kj}$, we need an update rule for each element. From the loss function above, define for a single observed entry:

$$e_{ij}^2 = \Big( R_{ij} - \sum_{k=1}^{K} u_{ik} v_{kj} \Big)^2 + \frac{\beta}{2} \Big( \sum_{k=1}^{K} u_{ik}^2 + \sum_{k=1}^{K} v_{kj}^2 \Big)$$

To obtain the update rule for $u_{ik}$, take the partial derivative:
$$\frac{\partial e_{ij}^2}{\partial u_{ik}} = 2 \Big( R_{ij} - \sum_{k=1}^{K} u_{ik} v_{kj} \Big)(-v_{kj}) + \beta u_{ik} = -2\, e_{ij} v_{kj} + \beta u_{ik} = \delta_u$$

$$\frac{\partial e_{ij}^2}{\partial v_{kj}} = 2 \Big( R_{ij} - \sum_{k=1}^{K} u_{ik} v_{kj} \Big)(-u_{ik}) + \beta v_{kj} = -2\, e_{ij} u_{ik} + \beta v_{kj} = \delta_v$$

The update then moves in the direction of the negative gradient, which is what guarantees the loss decreases (for a sufficiently small step size). This gives:

$$u_{ik} \leftarrow u_{ik} - \alpha\, \delta_u, \qquad v_{kj} \leftarrow v_{kj} - \alpha\, \delta_v$$

Python implementation (ALS and stochastic gradient descent)

```python
import numpy as np


def loss(R, P, Q, K, lambd=0.02):
    """Squared error over the observed (non-zero) entries of R, plus L2 regularization."""
    QT = Q.T
    total = 0.0
    for i in range(R.shape[0]):
        for j in range(R.shape[1]):
            if R[i, j] > 0:  # zero means "not observed": skip
                total += (R[i, j] - np.dot(P[i, :], QT[:, j])) ** 2  # (observed - predicted)^2
                for k in range(K):
                    total += (lambd / 2) * (P[i, k] ** 2 + QT[k, j] ** 2)
    return total


def gd(R, P, Q, K, alpha=0.001, lambd=0.02, steps=100):
    """Stochastic gradient descent: update u_ik and v_kj entry by entry."""
    QT = Q.T  # view of Q, so Q is updated in place
    for ite in range(steps):
        for i in range(P.shape[0]):
            for j in range(QT.shape[1]):
                if R[i, j] > 0:
                    error = R[i, j] - np.dot(P[i, :], QT[:, j])
                    for k in range(K):
                        P[i, k] += alpha * (2 * error * QT[k, j] - lambd * P[i, k])
                        QT[k, j] += alpha * (2 * error * P[i, k] - lambd * QT[k, j])
        print("gradient descent,iterate:", ite, "loss:", loss(R, P, Q, K, lambd))
    return P, Q


def als(R, P, Q, K, lambd=0.02, steps=100):
    """Alternating least squares: closed-form update of each row of P with Q fixed,
    then of each row of Q with P fixed."""
    E = np.eye(K)  # identity matrix for the lambda*E regularization term
    for step in range(steps):
        # fix Q (item features), solve (Q^T Q + lambda*E) p_i^T = Q^T R_i for each user i
        for i in range(P.shape[0]):
            M = np.linalg.inv(Q.T.dot(Q) + lambd * E)
            P[i, :] = M.dot(Q.T).dot(R[i, :])
        # fix P (user features), solve (P^T P + lambda*E) q_j^T = P^T R_.j for each item j
        for j in range(Q.shape[0]):
            M = np.linalg.inv(P.T.dot(P) + lambd * E)
            Q[j, :] = M.dot(P.T).dot(R[:, j])
        lossvalue = loss(R, P, Q, K, lambd)
        print("als,iteration:", step, "loss:", lossvalue)
        if lossvalue < 0.01:
            break
    return P, Q


if __name__ == "__main__":
    R = np.array([
        [5, 3, 0, 1],
        [4, 0, 0, 1],
        [1, 1, 0, 5],
        [1, 0, 0, 4],
        [0, 1, 5, 4],
    ], dtype=float)  # 5x4 rating matrix, 0 = not rated
    N, M, K = R.shape[0], R.shape[1], 2
    print(N, M, K)  # 5 4 2
    U = np.random.rand(N, K)
    V = np.random.rand(M, K)
    # gradient descent
    gd(R, U, V, K, alpha=0.001, lambd=0.02)
    # als (starts from the factors produced by gradient descent)
    P, Q = als(R, U, V, K, lambd=0.02)
    # print(np.dot(P, Q.T))
```

Run results

```
5 4 2
gradient descent,iterate: 0 loss: 92.3719163472
gradient descent,iterate: 1 loss: 92.0064419565
gradient descent,iterate: 2 loss: 91.638861411
...
gradient descent,iterate: 98 loss: 58.1137501841
gradient descent,iterate: 99 loss: 57.9588368481
als,iteration: 0 loss: 21.5978330148
als,iteration: 1 loss: 17.3428069177
als,iteration: 2 loss: 16.3502902377
als,iteration: 3 loss: 16.2712840045
...
als,iteration: 98 loss: 16.4775650272
als,iteration: 99 loss: 16.4796049658

Process finished with exit code 0
```

Summary

Starting from sparse matrix factorization, this post built a squared-loss objective, introduced two ways to minimize it, alternating least squares and stochastic gradient descent, worked through the derivative-based derivations, and used Python to observe how the two methods behave in practice.
The takeaway: to understand an algorithm, it helps to grasp the application scenario, the mathematics, and a hands-on implementation; the three reinforce one another and lead to a somewhat deeper understanding.