python-recsys 2 Quickstart
Source: Internet. Editor: 程序博客网. Date: 2024/06/10 16:33
Original article: http://ocelma.net/software/python-recsys/build/html/quickstart.html
Once you have installed pyrecsys, you can:
2.0 Set VERBOSE mode to see more information:
>>> import recsys.algorithm
>>> recsys.algorithm.VERBOSE = True
2.1 Load a dataset (first download the Movielens 1M Ratings Data Set, which contains the ratings.dat file, from https://grouplens.org/datasets/movielens/):
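Before the call itself, it may help to see what the sep and format arguments describe. The sketch below is not the library's loader. It assumes each line of ratings.dat has the layout UserID::MovieID::Rating::Timestamp, uses a few illustrative sample lines instead of the real file, and returns (value, row, col) tuples, an ordering chosen here for illustration. The helper name parse_line is hypothetical.

```python
# Illustrative sample lines in the assumed layout UserID::MovieID::Rating::Timestamp.
# They stand in for the real ratings.dat file, which is not read here.
SAMPLE_LINES = [
    "1::1193::5::978300760",
    "1::661::3::978302109",
    "2::1193::5::978298413",
]

# Same mapping as in the quickstart call: field 0 is the column (user),
# field 1 is the row (movie), field 2 is the value (rating), ids become ints.
FORMAT = {"col": 0, "row": 1, "value": 2, "ids": int}


def parse_line(line, sep="::", fmt=FORMAT):
    """Hypothetical helper: turn one delimited line into (value, row_id, col_id)."""
    fields = line.strip().split(sep)
    to_id = fmt["ids"]
    value = float(fields[fmt["value"]])
    row_id = to_id(fields[fmt["row"]])
    col_id = to_id(fields[fmt["col"]])
    return (value, row_id, col_id)


tuples = [parse_line(line) for line in SAMPLE_LINES]
print(tuples)
```

With this mapping, movies end up as rows and users as columns, which matters later when choosing is_row. The actual load_data() call follows.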
>>> from recsys.algorithm.factorize import SVD
>>> svd = SVD()
>>> svd.load_data(filename='./data/movielens/ratings.dat', sep='::', format={'col':0, 'row':1, 'value':2, 'ids': int})
Loading ./data/movielens/ratings.dat
..........|
2.2 Compute the Singular Value Decomposition (SVD), M = U Σ V^T:
>>> k = 100
>>> svd.compute(k=k, min_values=10, pre_normalize=None, mean_center=True, post_normalize=True)
Creating matrix (1000209 tuples)
Matrix density is: 4.4684%
Updating matrix: squish to at least 10 values
Computing svd k=100, min_values=10, pre_normalize=None, mean_center=True, post_normalize=True
You can also save the resulting SVD model (as a zip file):
>>> k = 100
>>> svd.compute(k=k, min_values=10, pre_normalize=None, mean_center=True, post_normalize=True, savefile='/tmp/movielens')
Creating matrix (1000209 tuples)
Matrix density is: 4.4684%
Updating matrix: squish to at least 10 values
Computing svd k=100, min_values=10, pre_normalize=None, mean_center=True, post_normalize=True
Saving svd model to /tmp/movielens
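The log above reports a matrix density of 4.4684%. As a rough check, density can be read as the share of matrix cells that hold a rating. The sketch below assumes the ratings file covers 3706 distinct movies and 6040 distinct users. Those two counts are not stated in this quickstart and are assumptions taken from commonly cited MovieLens 1M statistics. The function name matrix_density is hypothetical.

```python
def matrix_density(num_values, num_rows, num_cols):
    """Percentage of cells in a num_rows x num_cols matrix that hold a value."""
    return 100.0 * num_values / (num_rows * num_cols)


NUM_RATINGS = 1000209  # tuple count taken from the log output above
NUM_MOVIES = 3706      # assumption: distinct movies (rows) in ratings.dat
NUM_USERS = 6040       # assumption: distinct users (columns) in ratings.dat

density = matrix_density(NUM_RATINGS, NUM_MOVIES, NUM_USERS)
print("Matrix density is: %.4f%%" % density)
```

Under these assumed counts the result comes out at roughly 4.47%, in line with the logged figure, which suggests the density is computed before min_values=10 removes sparse rows and columns.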
Note
For more information about the svd.compute() parameters, see the Algorithms section.
Once the SVD model has been saved (as a zip file), you can reuse it at any time without running svd.compute() again.
2.3 Compute the similarity between two movies:
>>> ITEMID1 = 1 # Toy Story (1995)
>>> ITEMID2 = 2355 # A bug's life (1998)
>>> svd.similarity(ITEMID1, ITEMID2)
0.67706936677315799
2.4 Get movies similar to Toy Story:
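One way to read a "similar items" query is as a ranking of every item by its similarity to the target. The sketch below assumes cosine similarity over item vectors, which is an assumption made here and not a statement about how the library computes it. The vectors and the names cosine_similarity and most_similar are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(item_id, vectors, n=10):
    """Rank all items (including item_id itself) by similarity to item_id."""
    target = vectors[item_id]
    scored = [(other_id, cosine_similarity(target, vec))
              for other_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Hypothetical two-dimensional item vectors, made up for this example.
ITEM_VECTORS = {
    1: [1.0, 0.0],
    2: [0.8, 0.6],
    3: [0.0, 1.0],
}

ranking = most_similar(1, ITEM_VECTORS, n=3)
print(ranking)
```

The target item ranks first with a similarity of about 1.0, which matches the shape of the library output, where Toy Story heads its own list. With the library, the equivalent query is: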
>>> svd.similar(ITEMID1)
[(1, 0.99999999999999978), # Toy Story
 (3114, 0.87060391051018071), # Toy Story 2
 (2355, 0.67706936677315799), # A bug's life
 (588, 0.5807351496754426), # Aladdin
 (595, 0.46031829709743477), # Beauty and the Beast
 (1907, 0.44589398718134365), # Mulan
 (364, 0.42908159895574161), # The Lion King
 (2081, 0.42566581277820803), # The Little Mermaid
 (3396, 0.42474056361935913), # The Muppet Movie
 (2761, 0.40439361857585354)] # The Iron Giant
2.5 Predict the rating for a given user and movie:
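The prediction call takes a minimum and a maximum rating, which suggests the raw score is bounded to the rating scale. The sketch below illustrates that idea with a dot product of two factor vectors followed by clamping. It is an assumption-laden simplification (it ignores mean centering and normalization), not the library's implementation, and predict_rating and the vectors are hypothetical.

```python
def predict_rating(item_factors, user_factors, min_rating=None, max_rating=None):
    """Dot product of two factor vectors, optionally clamped to [min_rating, max_rating]."""
    raw = sum(i * u for i, u in zip(item_factors, user_factors))
    if min_rating is not None:
        raw = max(raw, min_rating)
    if max_rating is not None:
        raw = min(raw, max_rating)
    return raw


# Hypothetical factor vectors, made up for this example.
item = [2.0, 1.5]
user = [2.0, 1.0]

raw_score = predict_rating(item, user)               # unclamped score
clamped_score = predict_rating(item, user, 0.0, 5.0)  # bounded to the 0-5 scale
print(raw_score, clamped_score)
```

Here the raw score of 5.5 is cut to 5.0 once the bounds are supplied. With the library, the prediction and the stored rating can be compared as follows: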
>>> MIN_RATING = 0.0
>>> MAX_RATING = 5.0
>>> ITEMID = 1
>>> USERID = 1
>>> svd.predict(ITEMID, USERID, MIN_RATING, MAX_RATING)
5.0 # Predicted value
>>> svd.get_matrix().value(ITEMID, USERID)
5.0 # Real value
2.6 Recommend movies to a user:
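Because movies are rows and users are columns in this setup, recommending movies to a user means ranking one column, while finding users for a movie means ranking one row. The sketch below shows that distinction on a tiny table of predicted scores. The table, the ids, and the function top_n are hypothetical and only mirror the role of the is_row flag.

```python
# Hypothetical predicted scores: rows are items, columns are users.
ITEM_IDS = [10, 20, 30]
USER_IDS = [1, 2]
SCORES = [
    [4.5, 2.0],  # item 10
    [3.0, 5.0],  # item 20
    [4.8, 1.0],  # item 30
]


def top_n(target_id, is_row=True, n=10):
    """Rank the entries of one row (is_row=True) or one column (is_row=False)."""
    if is_row:
        # target_id is an item: rank users by their score for that item.
        row = SCORES[ITEM_IDS.index(target_id)]
        pairs = list(zip(USER_IDS, row))
    else:
        # target_id is a user: rank items by that user's score.
        col = USER_IDS.index(target_id)
        pairs = [(item_id, SCORES[r][col]) for r, item_id in enumerate(ITEM_IDS)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:n]


items_for_user_1 = top_n(1, is_row=False)
users_for_item_20 = top_n(20)
print(items_for_user_1)
print(users_for_item_20)
```

With the library, the user-oriented query is: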
>>> svd.recommend(USERID, is_row=False) # cols are users and rows are items, so we set is_row=False
[(2905, 5.2133848204673416), # Shaggy D.A., The
 (318, 5.2052108435956033), # Shawshank Redemption, The
 (2019, 5.1037438278755474), # Seven Samurai (The Magnificent Seven)
 (1178, 5.0962756861447023), # Paths of Glory (1957)
 (904, 5.0771405690055724), # Rear Window (1954)
 (1250, 5.0744156653222436), # Bridge on the River Kwai, The
 (858, 5.0650911066862907), # Godfather, The
 (922, 5.0605327279819408), # Sunset Blvd.
 (1198, 5.0554543765500419), # Raiders of the Lost Ark
 (1148, 5.0548789542105332)] # Wrong Trousers, The
2.7 Which users should see Toy Story?
>>> svd.recommend(ITEMID)
[(283, 5.716264440514446),
 (3604, 5.6471765418323141),
 (5056, 5.6218800339214496),
 (446, 5.5707524860615738),
 (3902, 5.5494529168484652),
 (4634, 5.51643364021289),
 (3324, 5.5138903299082802),
 (4801, 5.4947999354188548),
 (1131, 5.4941438045650068),
 (2339, 5.4916048051511659)]
2.8 For larger datasets (more than 10M tuples), it is better to use SVDLIBC directly. (divisi2 also uses SVDLIBC, but it is too slow at creating the matrix and computing the SVD.)
>>> from recsys.utils.svdlibc import SVDLIBC
>>> svdlibc = SVDLIBC('./data/movielens/ratings.dat')
>>> svdlibc.to_sparse_matrix(sep='::', format={'col':0, 'row':1, 'value':2, 'ids': int})
>>> svdlibc.compute(k=100)
>>> svd = svdlibc.export()
>>> svd.similar(ITEMID1) # results may differ from section 2.4, as min_values=10 is not set here
[(1, 0.99999999999999978),
 (3114, 0.84099896392054219),
 (588, 0.79191433686817747),
 (2355, 0.7772760704844065),
 (1265, 0.74946256379033827),
 (364, 0.73730970556786068),
 (2321, 0.73652131961235268),
 (595, 0.71665833726881523),
 (3253, 0.7075696829413568),
 (1923, 0.69687698887991523)]