Programming Collective Intelligence, Chapter 2: Making Recommendations <Finding Similar Users>

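Principle: this is one of the simpler ways to compute a similarity score (the Euclidean distance score). It takes the items that people have rated in common as axes, plots each person on the chart according to their ratings, and looks at how close the people are to one another.
sum = the sum of the squared differences over the items both people have rated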
total = 1 / (1 + sum)
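The two formulas above can be sketched in Python roughly as follows. This is a minimal illustration rather than the book's exact listing: the function name sim_distance and the toy_prefs data are assumptions made for the example, and the dictionary layout (person -> item -> rating) follows the critics dictionary used in this post.

```python
def sim_distance(prefs, p1, p2):
    """Distance-based similarity: 1 / (1 + sum of squared rating differences)."""
    # Items that both people have rated
    shared = [item for item in prefs[p1] if item in prefs[p2]]

    # Nothing in common: report no similarity
    if not shared:
        return 0

    # sum = sum of the squared differences over the shared items
    sum_of_squares = sum((prefs[p1][item] - prefs[p2][item]) ** 2
                         for item in shared)

    # total = 1 / (1 + sum); identical ratings give 1.0
    return 1 / (1 + sum_of_squares)


# Hypothetical toy ratings, for illustration only
toy_prefs = {
    'A': {'x': 4.0, 'y': 2.0},
    'B': {'x': 3.0, 'y': 2.0, 'z': 5.0},
    'C': {'w': 1.0},
}

print(sim_distance(toy_prefs, 'A', 'B'))  # 1 / (1 + 1.0) = 0.5
print(sim_distance(toy_prefs, 'A', 'C'))  # 0, no shared items
```

Adding 1 to the denominator avoids division by zero when two people agree exactly, and it keeps the score in the range (0, 1], with larger values meaning more similar tastes.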



from math import sqrt

critics = {
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
                  'Superman Returns': 3.5, 'You, Me and Dupree': 2.5, 'The Night Listener': 3.0},
    'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5, 'Just My Luck': 1.5,
                     'Superman Returns': 5.0, 'The Night Listener': 3.0, 'You, Me and Dupree': 3.5},
    'Michael Phillips': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0, 'The Night Listener': 4.5,
                         'Superman Returns': 4.0, 'You, Me and Dupree': 2.5},
    'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0, 'Just My Luck': 2.0,
                     'Superman Returns': 3.0, 'The Night Listener': 3.0, 'You, Me and Dupree': 2.0},
    'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0, 'The Night Listener': 3.0,
                      'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
    'Toby': {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0, 'Superman Returns': 4.0},
}


# Return the Pearson correlation coefficient for p1 and p2
def sim_pearson(prefs, p1, p2):
    # Collect the items that both people have rated
    si = {}
    for item in prefs[p1]:
        if item in prefs[p2]:
            si[item] = 1

    # Number of shared items
    n = len(si)

    # If the two have nothing in common, return 0
    if n == 0:
        return 0

    # Sum of each person's ratings
    sum1 = sum([prefs[p1][it] for it in si])
    sum2 = sum([prefs[p2][it] for it in si])

    # Sum of the squares of each person's ratings
    sum1Sq = sum([pow(prefs[p1][it], 2) for it in si])
    sum2Sq = sum([pow(prefs[p2][it], 2) for it in si])

    # Sum of the products
    psum = sum([prefs[p1][it] * prefs[p2][it] for it in si])

    # Compute the Pearson score
    num = psum - (sum1 * sum2 / n)
    den = sqrt((sum1Sq - pow(sum1, 2) / n) * (sum2Sq - pow(sum2, 2) / n))
    if den == 0:
        return 0

    r = num / den
    return r


# Return the best matches for person from the prefs dictionary.
# The number of results and the similarity function are both optional parameters.
def topMatches(prefs, person, n=5, similarity=sim_pearson):
    scores = [(similarity(prefs, person, other), other)
              for other in prefs if other != person]

    # Sort the list so the highest scores come first
    # (sort ascending, then reverse, which puts the largest values at the front)
    scores.sort()
    scores.reverse()
    return scores[0:n]

Definition: sum(sequence[, start])
sum(sequence[, start]) -> value
Returns the sum of a sequence of numbers (not strings) plus the value of the parameter 'start' (which defaults to 0). When the sequence is empty, returns start.

sum1Sq = sum([pow(prefs[p1][it], 2) for it in si]) 
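[pow(prefs[p1][it], 2) for it in si] produces a sequence: for each item in si, it looks up that item's rating in prefs[p1] and squares it, and the results form a new sequence. The sum function then adds up that sequence.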
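A small standalone sketch of the same pattern, including the optional start argument described above. The names ratings and shared are hypothetical stand-ins for prefs[p1] and si, and the numbers are made up for illustration.

```python
# Hypothetical ratings for one person (stand-in for prefs[p1])
ratings = {'a': 1.0, 'b': 2.0, 'c': 3.0}

# Items rated by both people (stand-in for si); only the keys matter
shared = {'a': 1, 'c': 1}

# The list comprehension builds a new sequence of squared ratings
squares = [pow(ratings[it], 2) for it in shared]
print(squares)  # [1.0, 9.0]

# sum adds up the sequence; start defaults to 0
total = sum(squares)
print(total)  # 10.0

# With an explicit start value, the result is start plus the sum
with_start = sum(squares, 5)
print(with_start)  # 15.0

# For an empty sequence, sum simply returns start
empty = sum([], 5)
print(empty)  # 5
```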
