Programming Collective Intelligence: MovieLens

Source: Internet  Editor: 程序博客网  Time: 2024/05/16 11:50

def loadmovielens(path='C:/data/ml-latest-small'):
    movies={}
    for line in open(path+'/movies.csv'):
        (id,title)=line.split('|')[0:2]
        movies[id]=title
    prefs={}
    for line in open(path+'/ratings.csv'):
        (user,movied,rating,ts)=line.split('\t')
        prefs.setdefault(user,{})
        prefs[user][movies[movied]]=float(rating)
    return prefs

Problems encountered:

1.(id,title)=line.split('|')[0:2]

       ValueError: need more than 1 value to unpack

Printing the lines of movies.csv from the script gives:

    movieId,title,genres
    1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
    2,Jumanji (1995),Adventure|Children|Fantasy
    3,Grumpier Old Men (1995),Comedy|Romance
    4,Waiting to Exhale (1995),Comedy|Drama|Romance
    ...  ...

The printed lines show that the fields are separated by commas, not by '|', so I changed (id,title)=line.split('|')[0:2] to (id,title)=line.split(',')[0:2].

Likewise, (user,movied,rating,ts)=line.split('\t') becomes (user,movied,rating,ts)=line.split(',').
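The first error can be reproduced on the header line alone. A minimal sketch, using two lines copied from the movies.csv output above (the variable names are only illustrative):

```python
# Two lines copied from the movies.csv sample above.
header = "movieId,title,genres\n"
row = "1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy\n"

# The header contains no '|', so split('|') yields a single element;
# unpacking that into (id, title) is what raises the ValueError.
pipe_parts = header.split('|')[0:2]
print(len(pipe_parts))

# Splitting on ',' puts the id and the title in the first two fields.
movie_id, title = row.split(',')[0:2]
print(movie_id, title)
```

Note that split('|') would not fail on the data rows, since the genres column contains '|', but it would silently produce the wrong id and title.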
2. Error: prefs[user][movies[movied]]=float(rating)
                 ValueError: could not convert string to float: 'rating'

Yet a quick check in the shell shows that the conversion itself works.

>>> float('123')
123.0

Then it occurred to me that my data might be the problem:

    userId  movieId  rating  timestamp
    1       16       4       1217897793
    1       24       1.5     1217895807
    1       32       4       1217896246
    1       47       4       1217896556

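The header row with the column names is still in the file, so the string 'rating' is passed to float() and cannot be converted. I deleted the first line and ran it again: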
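Instead of deleting the first line from the file by hand, the header can also be skipped in code. A minimal sketch, assuming the file is comma-separated as found above; io.StringIO stands in for ratings.csv only to keep the example self-contained:

```python
import io

# Stand-in for ratings.csv: the header plus the first rows shown above.
sample = io.StringIO(
    "userId,movieId,rating,timestamp\n"
    "1,16,4,1217897793\n"
    "1,24,1.5,1217895807\n"
)

# Consume the header line so the string 'rating' never reaches float().
next(sample)

ratings = []
for line in sample:
    user, movieid, rating, ts = line.strip().split(',')
    ratings.append((user, movieid, float(rating)))

print(ratings)
```

With a real file, the same next() call works on the object returned by open().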

>>> prefs=loadmovielens()
>>> prefs['87']

 

The result looks correct, although titles that contain a comma are cut off (for example '"Matrix' and '"Good Year'), because split(',') does not honor quoted CSV fields, and 'Am茅lie' is a mis-decoded 'Amélie':
{'Rain Man (1988)': 4.0, 'Million Dollar Baby (2004)': 4.0, 'Forrest Gump (1994)': 4.5, 'Total Recall (1990)': 3.5, 'Goodfellas (1990)': 4.5, 'Pollock (2000)': 3.5, 'Big Fish (2003)': 3.5, 'Jaws (1975)': 4.0, "One Flew Over the Cuckoo's Nest (1975)": 4.0, 'Bullets Over Broadway (1994)': 4.0, '"Good Year': 2.5, 'DiG! (2004)': 3.5, 'Gone with the Wind (1939)': 3.5, 'Copycat (1995)': 2.5, 'Pirates of Silicon Valley (1999)': 3.5, 'Empire of the Sun (1987)': 3.5, '"Royal Tenenbaums': 3.0, 'Tideland (2005)': 0.5, 'Garden State (2004)': 3.5, 'Grosse Pointe Blank (1997)': 3.0, '"Amelie (Fabuleux destin d\'Am茅lie Poulain': 3.0, 'Stand by Me (1986)': 3.5, 'Titanic (1997)': 5.0, '"Matrix': 4.5, 'Junebug (2005)': 4.5, 'Sin City (2005)': 3.5... ...
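Keys such as '"Matrix' in the output suggest that titles containing a comma are quoted in movies.csv, which a plain split(',') cannot handle. One possible variant, sketched here with Python's csv module, reads the header row as field names and parses quoted fields. The function name load_movielens_csv and the utf-8 encoding are assumptions, not part of the original code; the column names are taken from the headers shown above.

```python
import csv
import os


def load_movielens_csv(path='C:/data/ml-latest-small'):
    """Variant of loadmovielens based on csv.DictReader (hypothetical name)."""
    movies = {}
    # encoding='utf-8' is an assumption about the data files;
    # newline='' is the setting the csv module documentation recommends.
    with open(os.path.join(path, 'movies.csv'), newline='', encoding='utf-8') as f:
        # DictReader takes the first row as column names, so the header
        # is never treated as data, and quoted titles stay in one piece.
        for row in csv.DictReader(f):
            movies[row['movieId']] = row['title']

    prefs = {}
    with open(os.path.join(path, 'ratings.csv'), newline='', encoding='utf-8') as f:
        for row in csv.DictReader(f):
            prefs.setdefault(row['userId'], {})
            prefs[row['userId']][movies[row['movieId']]] = float(row['rating'])
    return prefs
```

It would be called the same way as before, for example prefs = load_movielens_csv() followed by prefs['87'].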

 


 

 
