"Python for Data Analysis": Summary of Key Syntax Points (2)

Source: Internet  Published by: 襄阳网络广播电视台  Editor: 程序博客网  Date: 2024/05/18 14:26

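A small MovieLens program
(1) Open the .dat files with pd.read_table and specify '::' as the separator. Because the separator is longer than one character, pandas has to use its Python parsing engine (pass engine='python' to avoid a warning).
(2) Default behavior of pd.merge
data = pd.merge(pd.merge(ratings, users), movies)  # pandas infers how to merge from the column names
No join key is specified, so pandas joins on the column names the two tables share. No join type is specified either, so the default inner join is used.
(3) sort_by_diff[::-1][:10]  # reverse the sorted result to get the 10 movies that men prefer most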
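Points (1) and (2) can be checked on a small in-memory sample. This is only a minimal sketch: the rows below are made-up stand-ins for ratings.dat and users.dat, and the names ratings_txt, users_txt, implicit, explicit and same are introduced here for illustration.

```python
import io
import pandas as pd

# Made-up sample rows in the same '::'-separated layout as the .dat files
ratings_txt = "1::10::5::978300760\n2::10::3::978302109\n3::20::4::978301968\n"
users_txt = "1::F::1::10::48067\n2::M::56::16::70072\n"

rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
unames = ['user_id', 'gender', 'age', 'occupation', 'zip']

# A separator longer than one character requires the Python parsing engine
ratings = pd.read_table(io.StringIO(ratings_txt), sep='::', header=None,
                        names=rnames, engine='python')
users = pd.read_table(io.StringIO(users_txt), sep='::', header=None,
                      names=unames, engine='python')

# No key and no join type given: joins on the shared column, inner join
implicit = pd.merge(ratings, users)

# The same merge with the defaults spelled out
explicit = pd.merge(ratings, users, on='user_id', how='inner')

same = implicit.equals(explicit)
print(same)
print(implicit)
```

The rating from user 3 has no matching row in users, so the inner join drops it. Writing on= and how= explicitly is often easier to read than relying on the defaults.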
The whole program is fairly simple:

import pandas as pd

# Read the three tables; the multi-character separator '::' needs engine='python'
unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
users = pd.read_table('users.dat', sep='::', header=None, names=unames, engine='python')
rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_table('ratings.dat', sep='::', header=None, names=rnames, engine='python')
mnames = ['movie_id', 'title', 'genres']
movies = pd.read_table('movies.dat', sep='::', header=None, names=mnames, engine='python')

# pandas infers how to merge from the shared column names
data = pd.merge(pd.merge(ratings, users), movies)

# Mean rating of each movie, split by gender
mean_ratings = data.pivot_table('rating', index='title', columns='gender', aggfunc='mean')

# Titles with at least 250 ratings
rate_by_title = data.groupby('title').size()
active_title = rate_by_title.index[rate_by_title >= 250]

# sort_index(by=...) has been removed from pandas; use sort_values(by=...)
top_female_ratings = mean_ratings.sort_values(by='F', ascending=False)

mean_ratings['diff'] = mean_ratings['M'] - mean_ratings['F']
sort_by_diff = mean_ratings.sort_values(by='diff')  # largest disagreement, preferred by women
sort_by_diff[::-1][:10]  # reverse the result: movies preferred by men

# Standard deviation of ratings per title; .ix has been removed from pandas, use .loc
rating_std_by_title = data.groupby('title')['rating'].std()
rating_std_by_title = rating_std_by_title.loc[active_title]
print(rating_std_by_title.sort_values(ascending=False)[:10])
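The pivot, diff, and reverse steps can be tried without the MovieLens files. This is a minimal sketch on made-up ratings: the titles 'A', 'B', 'C' and all values are invented for illustration, and women_prefer and men_prefer are names introduced here.

```python
import pandas as pd

# Made-up ratings for three hypothetical titles
data = pd.DataFrame({
    'title':  ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'gender': ['M', 'F', 'M', 'M', 'F', 'F', 'M', 'F'],
    'rating': [5, 3, 4, 2, 5, 4, 3, 3],
})

# One row per title, one column per gender, each cell a mean rating
mean_ratings = data.pivot_table(values='rating', index='title',
                                columns='gender', aggfunc='mean')

# Positive diff: men rate the title higher; negative diff: women do
mean_ratings['diff'] = mean_ratings['M'] - mean_ratings['F']

# Ascending sort puts the titles preferred by women first
sort_by_diff = mean_ratings.sort_values(by='diff')
women_prefer = sort_by_diff.index[0]

# Reversing the row order puts the titles preferred by men first
men_prefer = sort_by_diff[::-1].index[0]

print(sort_by_diff)
print(women_prefer, men_prefer)
```

Slicing with [::-1] only reverses the row order of the sorted frame. Calling sort_values(by='diff', ascending=False) would give the same ranking when there are no ties.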