K-fold cross-validation and Top-N recommendation with SVD matrix factorization


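If you have not read the previous post on using the Surprise project, please read it first. This post uses Surprise, an open-source project hosted on GitHub.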

 

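The previous post covered installation and the attributes of some of the methods. This post walks through a concrete code demo, using SVD as the example.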

 

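First, here is how to load a local dataset with Surprise and run k-fold cross-validation. A look at the API shows that this takes very little code, which says a lot about how capable Surprise is. The code: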

 

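# -*- coding: utf-8 -*-
"""
Created on Mon Aug  7 13:09:08 2017

@author: Jipon
"""
# Written against the Surprise release available in August 2017. Later
# releases replace evaluate(), print_perf() and data.split() with the
# tools in surprise.model_selection.
from surprise import SVD
from surprise import Dataset
from surprise import evaluate, print_perf
from surprise import dataset

# Load a local dataset and run 3-fold cross-validation.
# Each line has the form "user item rating", separated by a single space.
reader = dataset.Reader(line_format='user item rating', sep=' ')
data = Dataset.load_from_file('C:\\Users\\Jipon\\Desktop\\surprise\\train.txt', reader)

# Use 3 folds; without this line the default is 5 folds.
data.split(n_folds=3)

# We'll use the famous SVD algorithm.
algo = SVD()

# Evaluate performances of our algorithm on the dataset.
perf = evaluate(algo, data, measures=['RMSE', 'MAE'])
print_perf(perf)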

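The code above loads a local dataset and evaluates SVD on it (links to the dataset and code are at the end of this post). The evaluation metrics are RMSE and MAE. The metrics in the official documentation do not include precision and recall; if you need those two, you can define them yourself. See the official site for details.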
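To make the evaluation step more concrete, here is a minimal plain-Python sketch of what k-fold cross-validation with RMSE and MAE computes. It does not use Surprise and is not how Surprise implements it: the helper names (`k_fold_splits`, `rmse`, `mae`), the toy ratings, and the "always predict the training mean" stand-in for SVD are all assumptions made for illustration.

```python
import math
import random


def k_fold_splits(ratings, n_folds=3, seed=0):
    """Yield (train, test) pairs; each rating lands in exactly one test fold."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    for fold in range(n_folds):
        test = shuffled[fold::n_folds]
        train = [r for i, r in enumerate(shuffled) if i % n_folds != fold]
        yield train, test


def rmse(pairs):
    """Root mean squared error over (true, estimated) pairs."""
    return math.sqrt(sum((t - e) ** 2 for t, e in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (true, estimated) pairs."""
    return sum(abs(t - e) for t, e in pairs) / len(pairs)


# Hypothetical toy data in the same "user item rating" layout as train.txt.
lines = ["u1 i1 5", "u1 i2 3", "u2 i1 4", "u2 i3 2", "u3 i2 1", "u3 i3 4"]
ratings = [(u, i, float(r)) for u, i, r in (line.split(" ") for line in lines)]

fold_scores = []
for train, test in k_fold_splits(ratings, n_folds=3):
    # Stand-in predictor (not SVD): always guess the training fold's mean rating.
    train_mean = sum(r for _, _, r in train) / len(train)
    pairs = [(r, train_mean) for _, _, r in test]
    fold_scores.append((rmse(pairs), mae(pairs)))

for n, (fold_rmse, fold_mae) in enumerate(fold_scores, start=1):
    print(f"Fold {n}: RMSE={fold_rmse:.4f} MAE={fold_mae:.4f}")
```

RMSE penalizes large errors more heavily than MAE, so on any fold RMSE is at least as large as MAE. Reporting both gives a rough sense of whether the error comes from a few large misses or many small ones.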


When building a recommender system, we often use the Top-N approach to make recommendations. The code:


# -*- coding: utf-8 -*-
"""
Created on Tue Aug  8 13:27:08 2017

@author: Jipon
"""
from collections import defaultdict

from surprise import SVD
from surprise import Dataset
from surprise import dataset


def get_top_n(predictions, n=10):
    '''Return the top-N recommendation for each user from a set of predictions.

    Args:
        predictions(list of Prediction objects): The list of predictions, as
            returned by the test method of an algorithm.
        n(int): The number of recommendation to output for each user. Default
            is 10.

    Returns:
    A dict where keys are user (raw) ids and values are lists of tuples:
        [(raw item id, rating estimation), ...] of size n.
    '''

    # First map the predictions to each user; missing keys default to an empty list.
    top_n = defaultdict(list)
    # uid is the user id, iid is the item id, true_r is the true rating,
    # and est is the rating estimated by the factorization.
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))

    # Then sort the predictions for each user and retrieve the n highest ones.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]

    return top_n


# Load the dataset.
reader = dataset.Reader(line_format='user item rating', sep=' ')
data = Dataset.load_from_file('C:\\Users\\Jipon\\Desktop\\surprise\\train.txt', reader)
trainset = data.build_full_trainset()
algo = SVD()
# Later Surprise releases rename train() to fit().
algo.train(trainset)

# Recommend the Top-N items that are not in the training set.
# Then predict ratings for all pairs (u, i) that are NOT in the training set.
testset = trainset.build_anti_testset()
predictions = algo.test(testset)

top_n = get_top_n(predictions, n=2)

# Print the recommended items for each user
for uid, user_ratings in top_n.items():
    print(uid, [iid for (iid, _) in user_ratings])

Experimental results: [output screenshot not preserved]
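As a rough picture of what the `trainset.build_anti_testset()` call in the Top-N script hands to `algo.test()`, the sketch below enumerates every (user, item) pair that has no rating yet. It is plain Python, not Surprise's implementation: the function name `anti_testset_pairs` and the toy ratings are hypothetical, and filling the unknown rating with the global mean is an assumption made here for illustration.

```python
def anti_testset_pairs(ratings, fill):
    """Return (user, item, fill) for every user-item pair absent from `ratings`."""
    rated = {(u, i) for u, i, _ in ratings}
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    return [(u, i, fill) for u in users for i in items if (u, i) not in rated]


# Hypothetical toy ratings: (user, item, rating).
ratings = [
    ("u1", "i1", 5.0),
    ("u1", "i2", 3.0),
    ("u2", "i1", 4.0),
    ("u3", "i3", 2.0),
]
# Placeholder "true" rating for pairs whose real rating is unknown.
global_mean = sum(r for _, _, r in ratings) / len(ratings)

anti_testset = anti_testset_pairs(ratings, fill=global_mean)
for uid, iid, placeholder in anti_testset:
    print(uid, iid, placeholder)
```

Because these pairs have no real rating, the placeholder only fills the slot where a true rating would go. The Top-N ranking depends on the estimated rating alone.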


You can then use the recommended Top-N items to compute precision and recall. Quite simple, isn't it?
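One possible way to compute precision and recall from Top-N lists is sketched below. It assumes you hold out, for each user, a set of items they actually liked; that `relevant` dictionary, the function name `precision_recall_at_n`, and the toy data are hypothetical and do not come from the original code. Scoring a user with no recommendations or no relevant items as 0.0 is also just a convention chosen here.

```python
def precision_recall_at_n(top_n, relevant):
    """Per-user (precision, recall) of Top-N lists against held-out relevant items.

    top_n:    dict mapping user id -> list of (item id, estimated rating)
    relevant: dict mapping user id -> set of item ids the user actually liked
    """
    scores = {}
    for uid, user_ratings in top_n.items():
        recommended = {iid for iid, _ in user_ratings}
        liked = relevant.get(uid, set())
        hits = len(recommended & liked)
        # Assumed convention: undefined ratios are reported as 0.0.
        precision = hits / len(recommended) if recommended else 0.0
        recall = hits / len(liked) if liked else 0.0
        scores[uid] = (precision, recall)
    return scores


# Hypothetical Top-2 lists in the shape returned by get_top_n().
top_n = {
    "u1": [("i3", 4.6), ("i7", 4.1)],
    "u2": [("i1", 3.9), ("i4", 3.5)],
}
# Hypothetical held-out items that each user rated highly.
relevant = {"u1": {"i3", "i9"}, "u2": {"i5"}}

scores = precision_recall_at_n(top_n, relevant)
for uid, (precision, recall) in scores.items():
    print(uid, precision, recall)
```

Precision is the share of recommended items that the user liked, and recall is the share of liked items that were recommended. Both need ratings that were kept out of training, so the full-trainset setup in the Top-N script would need a held-out split before these numbers mean anything.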


Code and dataset for this post:

https://github.com/Jipon/SVDTest

