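1. Singular Value Decomposition (SVD)

1.1 Basic concepts

(1) Definition: singular value decomposition (SVD) is a method for extracting the important information from data.
(2) Advantages: it simplifies data, removes noise, and improves the results of algorithms.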

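1.2 Applications of SVD

(1) Latent semantic indexing (LSI) / latent semantic analysis (LSA):
In LSI, a matrix is built from documents and words. Applying SVD to this matrix yields a set of singular values. These represent concepts or topics in the documents and can be used for more efficient document search.
(2) Recommender systems
SVD is first used to build a topic space from the data, and similarities are then computed in that topic space.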

1.3 The SVD decomposition

  • Definition of the singular value decomposition:



  • Relationship between SVD and eigenvalue decomposition:


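  • SVD is a matrix factorization technique. It decomposes the original data matrix $A_{m \times n}$ into three matrices $U$, $\Sigma$ and $V^T$ such that $A = U \Sigma V^T$, with dimensions $m \times m$, $m \times n$ and $n \times n$ respectively. All entries of $\Sigma$ are 0 except those on its diagonal. The diagonal entries are called singular values and are sorted from largest to smallest. They are the singular values of the original data matrix $A$, that is, the square roots of the eigenvalues of $AA^T$ (the nonzero eigenvalues of $A^T A$ are the same). After some number $r$ of singular values, the remaining ones are so small that they can be ignored and set to 0. This means the data set has only $r$ important features, and the rest are noise or redundant, as shown in the figure below: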
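The factorization described above can be checked numerically. The following is a minimal sketch using `numpy.linalg.svd`; the 4x3 matrix `A` is a made-up example and not data from this post. It rebuilds the $m \times n$ matrix $\Sigma$ from the 1-D array NumPy returns and compares the singular values with the square roots of the eigenvalues of $A^T A$.

```python
import numpy as np

# Hypothetical 4x3 data matrix, chosen only for illustration.
A = np.array([[4.0, 0.0, 1.0],
              [0.0, 3.0, 0.0],
              [1.0, 0.0, 2.0],
              [0.0, 1.0, 0.0]])

# full_matrices=True returns U as m x m and Vt as n x n.
U, sigma, Vt = np.linalg.svd(A, full_matrices=True)

# NumPy returns the singular values as a 1-D array in descending order,
# so the m x n matrix Sigma has to be rebuilt before multiplying.
Sigma = np.zeros(A.shape)
Sigma[:len(sigma), :len(sigma)] = np.diag(sigma)

reconstructed = U @ Sigma @ Vt

# Eigenvalues of the symmetric matrix A^T A, sorted in descending order.
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]
eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative round-off

print("shapes:", U.shape, Sigma.shape, Vt.shape)
print("singular values:      ", sigma)
print("sqrt of eigenvalues:  ", np.sqrt(eigvals))
```

Note that `numpy.linalg.svd` returns $V^T$ directly, not $V$.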


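There are many heuristics for choosing how many singular values to keep. A typical one is to keep 90% of the energy in the matrix. To compute the total energy, sum the squares of all the singular values. Then accumulate the squared singular values from largest to smallest until the running sum reaches 90% of the total. Another approach, when the matrix has tens of thousands of singular values, is simply to keep the first 2,000 or 3,000. This is easy to do, but it does not guarantee that the first 3,000 singular values contain 90% of the energy.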
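The 90% energy heuristic can be sketched in a few lines. The helper name `num_singular_values_for_energy` and the sample singular values are assumptions made for illustration only.

```python
import numpy as np

def num_singular_values_for_energy(sigma, threshold=0.9):
    """Smallest k such that the first k squared singular values
    reach `threshold` of the total energy (hypothetical helper)."""
    energy = np.asarray(sigma, dtype=float) ** 2
    cumulative = np.cumsum(energy) / energy.sum()
    # First index whose cumulative share is >= threshold, converted to a count.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(energy))

# Made-up singular values, already sorted from largest to smallest.
sample_sigma = [10.0, 5.0, 1.0]
k = num_singular_values_for_energy(sample_sigma)
print("keep", k, "singular values")
```

Here the squared values are 100, 25 and 1, so the first value alone covers about 79% of the energy and the first two cover about 99%.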


2. Recommendation engines based on collaborative filtering



  • Euclidean distance

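  • Pearson correlation coefficient
    The second way to measure distance is the Pearson correlation, which measures the similarity between two vectors. One advantage of this method over Euclidean distance is that it is insensitive to the magnitude of users' ratings. Its value ranges from -1 to +1, and it is normalized to the range 0 to 1.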

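  • Cosine similarity
    Another commonly used measure is cosine similarity, which computes the cosine of the angle between two vectors. If the angle is 90 degrees, the similarity is 0; if the two vectors point in the same direction, the similarity is 1.0. Like the Pearson correlation, cosine similarity ranges from -1 to +1, so it is also normalized to the range 0 to 1. For two vectors $A$ and $B$, the cosine of the angle between them is defined as:

    $$\cos\theta = \frac{A \cdot B}{\|A\| \, \|B\|}$$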
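To make the difference between the three measures concrete, here is a small sketch on plain 1-D NumPy arrays. The function names and the rating vectors are made up for illustration; the 0-to-1 rescaling follows the normalization described above.

```python
import numpy as np

def euclid_sim(a, b):
    # 1 / (1 + distance): identical vectors give 1, distant ones approach 0.
    return 1.0 / (1.0 + np.linalg.norm(a - b))

def pearson_sim(a, b):
    # Pearson correlation rescaled from [-1, 1] to [0, 1].
    return 0.5 + 0.5 * np.corrcoef(a, b)[0, 1]

def cosine_sim(a, b):
    # Cosine of the angle rescaled from [-1, 1] to [0, 1].
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return 0.5 + 0.5 * cos

a = np.array([4.0, 4.0, 5.0, 2.0, 1.0])  # hypothetical ratings
b = a / 2.0                               # same preferences, half the scale
c = a - 1.0                               # same preferences, shifted down by 1

for name, other in (("b", b), ("c", c)):
    print(name,
          "euclid=%.4f" % euclid_sim(a, other),
          "pearson=%.4f" % pearson_sim(a, other),
          "cosine=%.4f" % cosine_sim(a, other))
```

Rescaling the ratings leaves both the Pearson and the cosine score at their maximum, while the Euclidean score drops. Shifting the ratings leaves only the Pearson score unchanged.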





3. A restaurant dish recommendation engine



    # -*- coding: utf-8 -*-
    from __future__ import print_function  # print() then works on Python 2 and 3
    from numpy import *
    from numpy import linalg as la

    def loadExData():
        return [[0, 0, 0, 2, 2],
                [0, 0, 0, 3, 3],
                [0, 0, 0, 1, 1],
                [1, 1, 1, 0, 0],
                [2, 2, 2, 0, 0],
                [5, 5, 5, 0, 0],
                [1, 1, 1, 0, 0]]

    def ecludSim(inA, inB):
        # Similarity derived from the Euclidean distance
        return 1.0 / (1.0 + la.norm(inA - inB))

    def pearsSim(inA, inB):
        # Pearson correlation, rescaled from [-1, 1] to [0, 1]
        if len(inA) < 3:
            return 1.0
        return 0.5 + 0.5 * corrcoef(inA, inB, rowvar=0)[0][1]

    def cosSim(inA, inB):
        # Cosine similarity, rescaled from [-1, 1] to [0, 1]
        num = float((inA.T * inB)[0, 0])
        denom = la.norm(inA) * la.norm(inB)
        return 0.5 + 0.5 * (num / denom)

    # Estimate the rating a user would give an item.
    # Arguments: data matrix, user index, similarity measure, item index
    def standEst(dataMat, user, simMeas, item):
        n = shape(dataMat)[1]  # number of items (columns are items, rows are users)
        simTotal = 0.0
        ratSimTotal = 0.0
        for j in range(n):  # loop over every item
            userRating = dataMat[user, j]  # this user's rating of item j
            if userRating == 0:
                continue  # skip items the user has not rated
            # rows (users) that have rated both `item` and item j
            overLap = nonzero(logical_and(dataMat[:, item].A > 0, dataMat[:, j].A > 0))[0]
            print('overLap :', overLap)
            if len(overLap) == 0:  # no user has rated both items
                print('here...')
                similarity = 0
            else:  # compute the similarity over the users who rated both
                similarity = simMeas(dataMat[overLap, item], dataMat[overLap, j])
            print('the %d and %d similarity is: %f' % (item, j, similarity))
            simTotal += similarity  # accumulate the similarities
            ratSimTotal += similarity * userRating  # accumulate similarity * rating
        if simTotal == 0:
            return 0
        else:
            # dividing by the total similarity normalizes the estimate to the 0 to 5 range
            return ratSimTotal / simTotal

    # Recommendation engine (cosine similarity by default)
    def recommend(dataMat, user, N=3, simMeas=cosSim, estMethod=standEst):
        unratedItems = nonzero(dataMat[user, :].A == 0)[1]  # indices of items the user has not rated
        if len(unratedItems) == 0:  # the user has rated everything
            return 'you rated everything'
        itemScores = []
        print('unratedItems:', unratedItems)
        for item in unratedItems:  # loop over the unrated items
            estimatedScore = estMethod(dataMat, user, simMeas, item)  # predicted rating
            itemScores.append((item, estimatedScore))  # (item index, estimated score)
        return sorted(itemScores, key=lambda jj: jj[1], reverse=True)[:N]  # top N, highest first

    # Main
    myData = mat(loadExData())
    sim_dis_1 = ecludSim(myData[:, 0], myData[:, 4])  # Euclidean-distance similarity
    print('sim_dis_1:', sim_dis_1)
    sim_dis_2 = cosSim(myData[:, 0], myData[:, 4])  # cosine similarity
    print('sim_dis_2:', sim_dis_2)
    sim_dis_3 = pearsSim(myData[:, 0], myData[:, 4])  # Pearson correlation
    print('sim_dis_3:', sim_dis_3)

    # Recommendation engine
    myData[0, 1] = myData[0, 0] = myData[1, 0] = myData[2, 0] = 4
    myData[3, 3] = 2
    re_1 = recommend(myData, 2)
    re_2 = recommend(myData, 2, simMeas=ecludSim)
    re_3 = recommend(myData, 2, simMeas=pearsSim)
    print('re_1:', re_1)
    print('re_2:', re_2)
    print('re_3:', re_3)


Output:

    sim_dis_1: 0.129731907557
    sim_dis_2: 0.5
    sim_dis_3: 0.205965381738
    unratedItems: [1 2]
    overLap : [0 3 4 5 6]
    the 1 and 0 similarity is: 1.000000
    overLap : [0 3]
    the 1 and 3 similarity is: 0.928746
    overLap : [0]
    the 1 and 4 similarity is: 1.000000
    overLap : [3 4 5 6]
    the 2 and 0 similarity is: 1.000000
    overLap : [3]
    the 2 and 3 similarity is: 1.000000
    overLap : []
    here...
    the 2 and 4 similarity is: 0.000000
    unratedItems: [1 2]
    overLap : [0 3 4 5 6]
    the 1 and 0 similarity is: 1.000000
    overLap : [0 3]
    the 1 and 3 similarity is: 0.309017
    overLap : [0]
    the 1 and 4 similarity is: 0.333333
    overLap : [3 4 5 6]
    the 2 and 0 similarity is: 1.000000
    overLap : [3]
    the 2 and 3 similarity is: 0.500000
    overLap : []
    here...
    the 2 and 4 similarity is: 0.000000
    unratedItems: [1 2]
    overLap : [0 3 4 5 6]
    the 1 and 0 similarity is: 1.000000
    overLap : [0 3]
    the 1 and 3 similarity is: 1.000000
    overLap : [0]
    the 1 and 4 similarity is: 1.000000
    overLap : [3 4 5 6]
    the 2 and 0 similarity is: 1.000000
    overLap : [3]
    the 2 and 3 similarity is: 1.000000
    overLap : []
    here...
    the 2 and 4 similarity is: 0.000000
    re_1: [(2, 2.5), (1, 2.0243290220056256)]
    re_2: [(2, 3.0), (1, 2.8266504712098603)]
    re_3: [(2, 2.5), (1, 2.0)]


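4. Using SVD to improve recommendations

Matrices built from real data sets are very sparse. We can therefore first use SVD to map the original matrix into a lower-dimensional space and then compute item similarities in that space, which greatly reduces the amount of computation.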
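As a minimal sketch of that mapping, the snippet below projects the items of a small ratings matrix into a 2-dimensional space and compares items there. It uses plain `ndarray` objects, and the choice `k = 2` is an assumption that suits this toy matrix only.

```python
import numpy as np

# Small user-by-item ratings matrix (rows: users, columns: items).
ratings = np.array([[0, 0, 0, 2, 2],
                    [0, 0, 0, 3, 3],
                    [0, 0, 0, 1, 1],
                    [1, 1, 1, 0, 0],
                    [2, 2, 2, 0, 0],
                    [5, 5, 5, 0, 0],
                    [1, 1, 1, 0, 0]], dtype=float)

k = 2  # number of singular values kept (assumption for this toy example)
U, sigma, Vt = np.linalg.svd(ratings)

# items_k = A^T U_k Sigma_k^{-1}: one row per item, k columns.
items_k = ratings.T @ U[:, :k] @ np.linalg.inv(np.diag(sigma[:k]))

def cos_sim(a, b):
    # Cosine similarity rescaled to the range 0 to 1.
    return 0.5 + 0.5 * np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

sim_0_1 = cos_sim(items_k[0], items_k[1])  # items 0 and 1 have identical ratings
sim_0_4 = cos_sim(items_k[0], items_k[4])  # items 0 and 4 share no raters
print(items_k.shape, sim_0_1, sim_0_4)
```

Each item is now described by `k` numbers instead of one number per user, so every similarity computation works on much shorter vectors.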


    # -*- coding: utf-8 -*-
    from __future__ import print_function  # print() then works on Python 2 and 3
    from numpy import *
    from numpy import linalg as la

    def loadExData2():
        return [[0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 5],
                [0, 0, 0, 3, 0, 4, 0, 0, 0, 0, 3],
                [0, 0, 0, 0, 4, 0, 0, 1, 0, 4, 0],
                [3, 3, 4, 0, 0, 0, 0, 2, 2, 0, 0],
                [5, 4, 5, 0, 0, 0, 0, 5, 5, 0, 0],
                [0, 0, 0, 0, 5, 0, 1, 0, 0, 5, 0],
                [4, 3, 4, 0, 0, 0, 0, 5, 5, 0, 1],
                [0, 0, 0, 4, 0, 4, 0, 0, 0, 0, 4],
                [0, 0, 0, 2, 0, 2, 5, 0, 0, 1, 2],
                [0, 0, 0, 0, 5, 0, 0, 0, 0, 4, 0],
                [1, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0]]

    def cosSim(inA, inB):
        # Cosine similarity, rescaled from [-1, 1] to [0, 1]
        num = float((inA.T * inB)[0, 0])
        denom = la.norm(inA) * la.norm(inB)
        return 0.5 + 0.5 * (num / denom)

    def ecludSim(inA, inB):
        # Similarity derived from the Euclidean distance
        return 1.0 / (1.0 + la.norm(inA - inB))

    def pearsSim(inA, inB):
        # Pearson correlation, rescaled from [-1, 1] to [0, 1]
        if len(inA) < 3:
            return 1.0
        return 0.5 + 0.5 * corrcoef(inA, inB, rowvar=0)[0][1]

    # SVD-based rating estimate
    def svdEst(dataMat, user, simMeas, item):
        n = shape(dataMat)[1]
        simTotal = 0.0
        ratSimTotal = 0.0
        U, Sigma, VT = la.svd(dataMat)  # SVD of the data set
        # Sigma is returned as a 1-D array, so build the diagonal matrix explicitly
        Sig4 = mat(eye(4) * Sigma[:4])
        xformedItems = dataMat.T * U[:, :4] * Sig4.I  # map the items into the low-dimensional space using U
        # print('xformedItems:', '\n', xformedItems)
        print(shape(xformedItems))
        for j in range(n):  # loop over all items for the given user
            userRating = dataMat[user, j]
            if userRating == 0 or j == item:
                continue
            # the similarity is computed in the low-dimensional space
            similarity = simMeas(xformedItems[item, :].T, xformedItems[j, :].T)
            print('the %d and %d similarity is: %f' % (item, j, similarity))
            simTotal += similarity  # sum of similarities
            ratSimTotal += similarity * userRating  # sum of similarity * rating
        if simTotal == 0:
            return 0
        else:
            return ratSimTotal / simTotal

    # Recommendation engine (cosine similarity by default)
    def recommend(dataMat, user, N=3, simMeas=cosSim, estMethod=svdEst):
        unratedItems = nonzero(dataMat[user, :].A == 0)[1]  # indices of items the user has not rated
        print('unratedItems:', unratedItems)
        if len(unratedItems) == 0:  # the user has rated everything
            return 'you rated everything'
        itemScores = []
        for item in unratedItems:  # loop over the unrated items
            estimatedScore = estMethod(dataMat, user, simMeas, item)  # predicted rating
            itemScores.append((item, estimatedScore))  # (item index, estimated score)
        return sorted(itemScores, key=lambda jj: jj[1], reverse=True)[:N]  # top N, highest first

    # Main
    U, Sigma, VT = la.svd(loadExData2())
    print('Sigma:', Sigma)
    # How many singular values are needed to reach 90% of the total energy?
    Sig2 = Sigma ** 2
    sum_1 = sum(Sig2)
    print('sum_1:', sum_1)
    sum_2 = sum(Sig2) * 0.9
    print('sum_2', sum_2)
    print(sum(Sig2[:2]))
    print(sum(Sig2[:3]))
    myMat = mat(loadExData2())
    re = recommend(myMat, 1, estMethod=svdEst)
    print('re:', re)


Output:

    Sigma: [ 15.77075346  11.40670395  11.03044558 ...,   0.43800353   0.22082113   0.07367823]
    sum_1: 542.0
    sum_2 487.8
    378.829559511
    500.500289128
    unratedItems: [0 1 2 4 6 7 8 9]
    (11L, 4L)
    the 0 and 3 similarity is: 0.490950
    the 0 and 5 similarity is: 0.484274
    the 0 and 10 similarity is: 0.512755
    (11L, 4L)
    the 1 and 3 similarity is: 0.491294
    the 1 and 5 similarity is: 0.481516
    the 1 and 10 similarity is: 0.509709
    (11L, 4L)
    the 2 and 3 similarity is: 0.491573
    the 2 and 5 similarity is: 0.482346
    the 2 and 10 similarity is: 0.510584
    (11L, 4L)
    the 4 and 3 similarity is: 0.450495
    the 4 and 5 similarity is: 0.506795
    the 4 and 10 similarity is: 0.512896
    (11L, 4L)
    the 6 and 3 similarity is: 0.743699
    the 6 and 5 similarity is: 0.468366
    the 6 and 10 similarity is: 0.439465
    (11L, 4L)
    the 7 and 3 similarity is: 0.482175
    the 7 and 5 similarity is: 0.494716
    the 7 and 10 similarity is: 0.524970
    (11L, 4L)
    the 8 and 3 similarity is: 0.491307
    the 8 and 5 similarity is: 0.491228
    the 8 and 10 similarity is: 0.520290
    (11L, 4L)
    the 9 and 3 similarity is: 0.522379
    the 9 and 5 similarity is: 0.496130
    the 9 and 10 similarity is: 0.493617
    re: [(4, 3.3447149384692283), (7, 3.3294020724526963), (9, 3.328100876390069)]

5. The cold-start problem


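The cold-start problem arises when a new item or user has too few ratings for collaborative filtering to work. One solution is to treat recommendation as a search problem. Internally the approaches differ, but the difference is transparent to the user. To treat recommendation as search, we can use the attributes of the items to be recommended. In the restaurant dish example, we can label dishes with various tags, such as vegetarian, American BBQ, or expensive. We can also use these attributes as the data for similarity calculations; this is called content-based recommendation. Content-based recommendation may not work as well as the collaborative filtering described earlier, but having it is a good start.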
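A tag-based similarity of this kind might look like the sketch below. The dishes, the tags, and the choice of Jaccard similarity (shared tags divided by all tags) are assumptions for illustration; the post does not prescribe a particular measure.

```python
def jaccard(tags_a, tags_b):
    """Share of tags two dishes have in common (0 when both are untagged)."""
    union = tags_a | tags_b
    if not union:
        return 0.0
    return len(tags_a & tags_b) / float(len(union))

def most_similar(name, catalog, n=2):
    """Rank the other dishes in `catalog` by tag similarity to `name`."""
    scores = [(other, jaccard(catalog[name], tags))
              for other, tags in catalog.items() if other != name]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]

# Hypothetical catalog: dish name -> set of tags.
dishes = {
    "veggie burger": {"vegetarian", "american"},
    "bbq ribs": {"american", "bbq", "expensive"},
    "tofu stir-fry": {"vegetarian", "asian"},
}

print(most_similar("veggie burger", dishes))
```

Because the scores depend only on the tags, a dish can be recommended before anyone has rated it.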

6. Image compression with SVD



    # -*- coding: utf-8 -*-
    from __future__ import print_function  # print() then works on Python 2 and 3
    from numpy import *
    from numpy import linalg as la

    # Print a 32x32 matrix as 0s and 1s
    def printMat(inMat, thresh=0.8):
        for i in range(32):
            for k in range(32):
                if float(inMat[i, k]) > thresh:  # the threshold separates dark from light pixels
                    print(1, end=' ')
                else:
                    print(0, end=' ')
            print('')

    # Compress the image with SVD
    def imgCompress(numSV=3, thresh=0.8):
        myl = []
        for line in open('0_5.txt').readlines():
            newRow = []
            for i in range(32):
                newRow.append(int(line[i]))
            myl.append(newRow)
        myMat = mat(myl)
        print(' shape myMat:', shape(myMat))
        print("****original matrix******")
        printMat(myMat, thresh)
        U, Sigma, VT = la.svd(myMat)  # SVD
        SigRecon = mat(zeros((numSV, numSV)))
        for k in range(numSV):  # put the singular values on the diagonal
            SigRecon[k, k] = Sigma[k]
        reconMat = U[:, :numSV] * SigRecon * VT[:numSV, :]  # reconstructed matrix
        print("****reconstructed matrix using %d singular values******" % numSV)
        print('reconMat', shape(reconMat))
        printMat(reconMat, thresh)  # print the result

    # Main
    imgCompress(2)


Output:

     shape myMat: (32L, 32L)
    ****original matrix******
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0
    0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0
    0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0
    0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0
    0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0
    0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0
    0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
    ****reconstructed matrix using 2 singular values******
    reconMat (32L, 32L)
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
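The saving from keeping only 2 singular values can be worked out with simple arithmetic. The sketch below counts the numbers that must be stored for the truncated factors; the helper name `svd_storage` is hypothetical.

```python
def svd_storage(m, n, k):
    """Numbers stored for a rank-k SVD of an m x n matrix:
    U[:, :k] (m*k), k singular values, and VT[:k, :] (k*n)."""
    return m * k + k + k * n

original = 32 * 32                   # every pixel of the 32x32 image
compressed = svd_storage(32, 32, 2)  # two singular values, as in imgCompress(2)
ratio = original / float(compressed)

print(original, compressed, round(ratio, 2))
```

For the 32x32 image this means storing two 32x2 factors plus two singular values instead of all 1,024 pixels.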


7. Notes

(1) overLap = nonzero(logical_and(dataMat[:,item].A>0, dataMat[:,j].A>0))[0]

logical_and combines two boolean arrays element by element, in the same way as logical_or in this session:

    >>> import numpy as np
    >>> np.logical_or(True, False)
    True
    >>> np.logical_or([True, False], [False, False])
    array([ True, False], dtype=bool)
    >>> x = np.arange(5)
    >>> x
    array([0, 1, 2, 3, 4])
    >>> np.logical_or(x < 1, x > 3)
    array([ True, False, False, False,  True], dtype=bool)
    >>> x<1
    array([ True, False, False, False, False], dtype=bool)
    >>> x>3
    array([False, False, False, False,  True], dtype=bool)
    >>> np.logical_or(x < 1, x > 3)
    array([ True, False, False, False,  True], dtype=bool)
    >>> 

(2) unratedItems = nonzero(dataMat[user,:].A==0)[1]

    In [16]: myData[2,:]
    Out[16]: matrix([[4, 0, 0, 1, 1]])
    In [17]: nonzero(myData[2,:].A==0)
    Out[17]: (array([0, 0], dtype=int64), array([1, 2], dtype=int64))
    In [18]: nonzero(myData[2,:].A==0)[1]
    Out[18]: array([1, 2], dtype=int64)

(3) overLap = nonzero(logical_and(dataMat[:,item].A>0, dataMat[:,j].A>0))[0]

    In [24]: myData
    Out[24]: 
    matrix([[4, 4, 0, 2, 2],
            [4, 0, 0, 3, 3],
            [4, 0, 0, 1, 1],
            ..., 
            [2, 2, 2, 0, 0],
            [5, 5, 5, 0, 0],
            [1, 1, 1, 0, 0]])
    In [25]: logical_and(myData[:,1].A>0, myData[:,0].A>0)
    Out[25]: 
    array([[ True],
           [False],
           [False],
           [ True],
           [ True],
           [ True],
           [ True]], dtype=bool)
    In [26]: nonzero(logical_and(myData[:,1].A>0, myData[:,0].A>0))
    Out[26]: (array([0, 3, 4, 5, 6], dtype=int64), array([0, 0, 0, 0, 0], dtype=int64))
    In [27]: nonzero(logical_and(myData[:,1].A>0, myData[:,0].A>0))[0]
    Out[27]: array([0, 3, 4, 5, 6], dtype=int64)

(4) similarity = simMeas(dataMat[overLap,item], dataMat[overLap,j])

    In [33]: overlap=[0,3,4,5,6]
    In [35]: myData[overlap,0]  # elements of the first column at the row indices in overlap
    Out[35]: 
    matrix([[4],
            [1],
            [2],
            [5],
            [1]])

(5) Sig4 = mat(eye(4)*Sigma[:4])

    In [11]: Sigma[:4]
    Out[11]: array([ 15.77075346,  11.40670395,  11.03044558,   4.84639758])
    In [12]: type(Sigma)
    Out[12]: numpy.ndarray
    In [13]: mat(eye(4)*Sigma[:4])
    Out[13]: 
    matrix([[ 15.77075346,   0.        ,   0.        ,   0.        ],
            [  0.        ,  11.40670395,   0.        ,   0.        ],
            [  0.        ,   0.        ,  11.03044558,   0.        ],
            [  0.        ,   0.        ,   0.        ,   4.84639758]])