Recommendations for a dating dataset with Mahout

Download the dating dataset from http://libimseti.cz

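The ratings.dat file in the archive is 257 MB. It is comma-separated, and each line holds a user ID, a profile ID, and a rating. (Profile IDs and user IDs are not anonymized with the same scheme, so the two kinds of ID cannot be matched against each other.)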
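
To make the file layout concrete, here is a minimal sketch of parsing one line of the form userID,profileID,rating. The class name RatingLineParser and the sample line are made up for illustration; Mahout's FileDataModel reads the file by itself, so this is only meant to show what each field is.

```java
/** Hypothetical helper (not part of Mahout): parses "userID,profileID,rating" lines. */
final class RatingLineParser {

    /** One parsed record from ratings.dat. */
    static final class Rating {
        final long userId;
        final long profileId;
        final float value;

        Rating(long userId, long profileId, float value) {
            this.userId = userId;
            this.profileId = profileId;
            this.value = value;
        }
    }

    /** Splits a line on commas and converts the three fields. */
    static Rating parse(String line) {
        String[] fields = line.trim().split(",");
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 comma-separated fields: " + line);
        }
        return new Rating(
                Long.parseLong(fields[0].trim()),
                Long.parseLong(fields[1].trim()),
                Float.parseFloat(fields[2].trim()));
    }

    public static void main(String[] args) {
        Rating r = parse("1,133,8"); // made-up sample line
        System.out.println(r.userId + " rated profile " + r.profileId + " with " + r.value);
    }
}
```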

The dataset has already been preprocessed: users who gave fewer than 20 ratings were removed, as were users who gave every profile the same rating.
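
The published file is already filtered, so nothing needs to be done here. Purely as an illustration of the two rules, a filter along these lines would reproduce them on raw data. The class UserFilter and its in-memory map layout are assumptions of this sketch, not the dataset authors' actual tooling.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the two preprocessing rules described for the dataset. */
final class UserFilter {

    /** Users with fewer ratings than this are dropped. */
    static final int MIN_RATINGS = 20;

    /**
     * @param ratingsByUser userId -> (profileId -> rating)
     * @return IDs of users that pass both rules
     */
    static Set<Long> usersToKeep(Map<Long, Map<Long, Integer>> ratingsByUser) {
        Set<Long> keep = new HashSet<>();
        for (Map.Entry<Long, Map<Long, Integer>> entry : ratingsByUser.entrySet()) {
            Map<Long, Integer> ratings = entry.getValue();
            // Rule 1: drop users who gave fewer than 20 ratings.
            if (ratings.size() < MIN_RATINGS) {
                continue;
            }
            // Rule 2: drop users who gave every profile the same rating.
            boolean allSame = new HashSet<>(ratings.values()).size() == 1;
            if (allSame) {
                continue;
            }
            keep.add(entry.getKey());
        }
        return keep;
    }
}
```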

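According to Mahout in Action, the best configuration for this data is a user-based recommender that uses Euclidean distance similarity with a neighborhood size of 2.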
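
As a rough picture of what that configuration does, the sketch below turns the Euclidean distance d between two users' ratings into a similarity of 1 / (1 + d) and then picks the n most similar users. This is the simplified textbook form. Mahout's EuclideanDistanceSimilarity normalizes differently and only compares profiles both users have rated, so treat the class EuclideanSketch as an assumption-laden illustration rather than the library's code.

```java
import java.util.Arrays;

/** Hypothetical illustration of distance-based similarity and a nearest-N neighborhood. */
final class EuclideanSketch {

    /**
     * Similarity in (0, 1]: identical rating vectors give 1.0.
     * Both arrays must hold ratings for the same profiles in the same order.
     */
    static double similarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Rating vectors must have equal length");
        }
        double sumOfSquares = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sumOfSquares += diff * diff;
        }
        return 1.0 / (1.0 + Math.sqrt(sumOfSquares));
    }

    /** Returns the indices of the n users in others most similar to target, best first. */
    static int[] nearest(double[] target, double[][] others, int n) {
        Integer[] order = new Integer[others.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort by descending similarity to the target user.
        Arrays.sort(order, (x, y) -> Double.compare(
                similarity(target, others[y]), similarity(target, others[x])));
        int count = Math.min(n, order.length);
        int[] result = new int[count];
        for (int i = 0; i < count; i++) {
            result[i] = order[i];
        }
        return result;
    }
}
```

With a neighborhood of 2, only the two users returned by nearest would contribute to the estimated rating.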

The evaluation program written from that configuration:

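    import java.io.File;
    import java.io.IOException;

    import org.apache.mahout.cf.taste.common.TasteException;
    import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
    import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
    import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
    import org.apache.mahout.cf.taste.recommender.Recommender;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    public static void evaluateDateData() throws IOException, TasteException
    {
        // Load the comma-separated ratings file (user ID, profile ID, rating).
        DataModel model = new FileDataModel(new File("F:\\mahout\\libimseti\\libimseti-complete\\libimseti\\ratings.dat"));
        // Score by the average absolute difference between estimated and actual ratings.
        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
        RecommenderBuilder builder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                // User-based recommender: Euclidean distance similarity, 2 nearest neighbors.
                UserSimilarity similarity = new EuclideanDistanceSimilarity(model);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
                return new GenericUserBasedRecommender(model, neighborhood, similarity);
            }
        };
        // Evaluate on 5% of the users, training on 95% of each user's ratings.
        double score = evaluator.evaluate(builder, null, model, 0.95, 0.05);
        System.out.println(score);
    }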

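The run printed 0.8415841584158418, which is the average absolute difference between the estimated and the actual ratings (lower is better).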
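
For reference, the metric itself is simple to state in code. The sketch below computes the mean of |estimated - actual| over a set of held-out ratings. The class name and the sample numbers are made up for illustration; the Mahout evaluator additionally handles the train/test split and ratings that cannot be estimated.

```java
/** Hypothetical illustration of the average-absolute-difference metric. */
final class AverageAbsoluteDifference {

    /** Mean of |estimated[i] - actual[i]| over all held-out ratings. */
    static double of(double[] estimated, double[] actual) {
        if (estimated.length != actual.length || estimated.length == 0) {
            throw new IllegalArgumentException("Need two non-empty arrays of equal length");
        }
        double total = 0.0;
        for (int i = 0; i < estimated.length; i++) {
            total += Math.abs(estimated[i] - actual[i]);
        }
        return total / estimated.length;
    }

    public static void main(String[] args) {
        // Made-up numbers: errors of 1, 0 and 2 average to 1.0.
        System.out.println(of(new double[] {8, 5, 3}, new double[] {9, 5, 1}));
    }
}
```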

For this run the JVM option was set to -Xmx1024m.
