基于Mahout的图书推荐系统
来源:互联网 发布:windows tracert 多个 编辑:程序博客网 时间:2024/05/16 13:45
一、 用Maven搭建Mahout的开发环境
(8)在eclipse里运行结果如下
二、用案例的数据集,基于Mahout,任选一种算法,对任意一个女性用户进行协同过滤推荐,并解释推荐结果是否合理,解释过程可以写成一文档说明。
1、选择基于用户的协同过滤算法:UserCF
2、算法模型:DataModel+UserSimilarity+UserNeighborhood+UserBasedRecommender
(1)导入分析数据(rating.csv为评分数据,user.csv为用户信息)
三、增加过滤条件,排除男性,只保留对女性用户的推荐评分
选用的算法模型为:FileDataModel+EuclideanDistanceSimilarity+GenericItemBasedRecommender
package com.panguoyuan.mahout.itemcf;import java.io.File;import java.io.IOException;import java.util.List;import org.apache.mahout.cf.taste.common.TasteException;import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;import org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity;import org.apache.mahout.cf.taste.model.DataModel;import org.apache.mahout.cf.taste.recommender.RecommendedItem;import org.apache.mahout.cf.taste.recommender.Recommender;import org.apache.mahout.cf.taste.similarity.UserSimilarity;public class UserCF { final static int NEIGHBORHOOD_NUM = 2; final static int RECOMMENDER_NUM = 3; public static void main(String[] args) throws IOException, TasteException { String file = "inputdata/item.csv"; DataModel model = new FileDataModel(new File(file)); UserSimilarity user = new EuclideanDistanceSimilarity(model); NearestNUserNeighborhood neighbor = new NearestNUserNeighborhood(NEIGHBORHOOD_NUM, user, model); Recommender r = new GenericUserBasedRecommender(model, neighbor, user); LongPrimitiveIterator iter = model.getUserIDs(); while (iter.hasNext()) { long uid = iter.nextLong(); List<RecommendedItem> list = r.recommend(uid, RECOMMENDER_NUM); System.out.printf("uid:%s", uid); for (RecommendedItem ritem : list) { System.out.printf("(%s,%f)", ritem.getItemID(), ritem.getValue()); } System.out.println(); } }}
二、用案例的数据集,基于Mahout,任选一种算法,对任意一个女性用户进行协同过滤推荐,并解释推荐结果是否合理,解释过程可以写成一文档说明。
1、选择基于用户的协同过滤算法:UserCF
2、算法模型:DataModel+UserSimilarity+UserNeighborhood+UserBasedRecommender
package com.panguoyuan.mahout.itemcf;import java.io.File;import java.util.List;import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;import org.apache.mahout.cf.taste.model.DataModel;import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;import org.apache.mahout.cf.taste.recommender.RecommendedItem;import org.apache.mahout.cf.taste.recommender.UserBasedRecommender;import org.apache.mahout.cf.taste.similarity.UserSimilarity;public class BasedUserBookRecommender2 { public static void main(String[] args) throws Exception { long userId = 188; //构建数据模型 DataModel model = new FileDataModel(new File("inputdata/rating.csv")); //创建相似度 UserSimilarity itemSimilarity = new PearsonCorrelationSimilarity(model); //UserSimilarity itemSimilarity = new EuclideanDistanceSimilarity(model); //GenericUserSimilarity genericItemSimilarity = new GenericUserSimilarity(itemSimilarity, model); //构建近邻算法 UserNeighborhood neighborhood = new NearestNUserNeighborhood(3, itemSimilarity, model); //构建推荐模型 UserBasedRecommender userBasedRecommender = new GenericUserBasedRecommender(model, neighborhood, itemSimilarity); //计算并返回图书推荐结果 List<RecommendedItem> recommendations = userBasedRecommender.recommend(188, 5); //打印推荐结果 showItems(userId, recommendations, true); } public static void showItems(long uid, List<RecommendedItem> recommendations, boolean skip) { if (skip || recommendations.size() > 0) { System.out.printf("userId:%s,", uid); for (RecommendedItem r : recommendations) { System.out.printf("(%s,%f)", r.getItemID(), r.getValue()); } System.out.println(); } }}
4、输出结果
userId:188,(885,9.500000)(396,7.000000)(688,6.000000)5、用R语言对推荐结果进行人工分析
(1)导入分析数据(rating.csv为评分数据,user.csv为用户信息)
ratings=read.csv("F:\workspace1\mahout\inputdata\rating.csv",FALSE)users=read.csv("F:\workspace1\mahout\inputdata\user.csv",FALSE)
(2)修改列名
ratings=data.frame('userid'=ratings$V1,'bookid'=ratings$V2,'grade'=ratings$V3)users=data.frame('userid'=users$V1,'sex'=users$V2,'age'=users$V3)(3)查看用户188都看了哪些书
> ratings[c(ratings$userid==188),]userid bookid grade3760 188 798 63761 188 653 33762 188 426 63763 188 742 73764 188 549 23765 188 520 83766 188 312 23767 188 213 103768 188 954 53769 188 121 103770 188 204 93771 188 684 33772 188 493 43773 188 452 13774 188 622 33775 188 298 8
(4)图书885推荐分数最高,下面查看该图书有哪些人评过分
ratings[c(ratings$bookid==885),]userid bookid grade182 9 885 81225 60 885 103691 184 885 9
(5)查看这用户9,用户60,用户184,用户188的信息
> users[c(9,60,184,188),] userid sex age9 9 M 5060 60 F 49184 184 M 27188 188 F 24
(6)查看这用户9,用户60,用户184与用户188都共同看了哪些图书
> rating188=ratings[which(ratings$userid==188),]> rating9=ratings[which(ratings$userid==9),]> rating60=ratings[which(ratings$userid==60),]> rating184=ratings[which(ratings$userid==184),]> intersect(rating188$bookid,rating9$bookid)integer(0)> intersect(rating188$bookid,rating60$bookid)[1] 312 298> intersect(rating188$bookid,rating184$bookid)[1] 121 684从上面可以看出用户188与用户60共同看了312和298这两本书,与用户184共同看了121和684这两本书,他们都有共同的偏好,所以给用户188推荐图书885是合理的。
三、增加过滤条件,排除男性,只保留对女性用户的推荐评分
选用的算法模型为:FileDataModel+EuclideanDistanceSimilarity+GenericItemBasedRecommender
package com.panguoyuan.mahout.itemcf;import java.io.BufferedReader;import java.io.File;import java.io.FileReader;import java.io.IOException;import java.util.HashSet;import java.util.List;import java.util.Set;import org.apache.mahout.cf.taste.common.TasteException;import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;import org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity;import org.apache.mahout.cf.taste.impl.similarity.GenericItemSimilarity;import org.apache.mahout.cf.taste.model.DataModel;import org.apache.mahout.cf.taste.recommender.IDRescorer;import org.apache.mahout.cf.taste.recommender.ItemBasedRecommender;import org.apache.mahout.cf.taste.recommender.RecommendedItem;import org.apache.mahout.cf.taste.similarity.ItemSimilarity;public class BookFilterGenderRecommender3 { public static void main(String[] args) throws Exception { DataModel model = new FileDataModel(new File("inputdata/rating.csv")); ItemSimilarity otherSimilarity = new EuclideanDistanceSimilarity(model); GenericItemSimilarity similarity = new GenericItemSimilarity(otherSimilarity, model); ItemBasedRecommender recommender = new GenericItemBasedRecommender(model, similarity); filterRecommender(188, recommender, model); } public static void showItems(long uid,List<RecommendedItem> recommendations, boolean skip) { if (skip || recommendations.size() > 0) { System.out.printf("userId:%s,", uid); for (RecommendedItem r : recommendations) { System.out.printf("Item:(%s,%f)", r.getItemID(), r.getValue()); System.out.println(); } } } /** * 对用户性别进行过滤 */ public static void filterRecommender(long uid, ItemBasedRecommender recommender, DataModel dataModel) throws TasteException, IOException { Set<Long> userids = getMale("datafile/book/user.csv"); //计算男性用户打分过的图书 Set<Long> bookids = new HashSet<Long>(); for (long uids : userids) { LongPrimitiveIterator iter = dataModel.getItemIDsFromUser(uids).iterator(); while (iter.hasNext()) { long bookid = iter.next(); bookids.add(bookid); } } IDRescorer rescorer = new FilterRescorer(bookids); List<RecommendedItem> list = recommender.recommend(uid, 10, rescorer); showItems(uid, list, false); } /** * 返回所有男性id */ public static Set<Long> getMale(String file) throws IOException { BufferedReader br = new BufferedReader(new FileReader(new File(file))); Set<Long> userids = new HashSet<Long>(); String s = null; while ((s = br.readLine()) != null) { String[] cols = s.split(","); if (cols[1].equals("M")) { userids.add(Long.parseLong(cols[0])); } } br.close(); return userids; }}/*** 对结果重计算*/class FilterRescorer implements IDRescorer { final private Set<Long> userids; public FilterRescorer(Set<Long> userids) { this.userids = userids; } @Override public double rescore(long id, double originalScore) { return isFiltered(id) ? Double.NaN : originalScore; } @Override public boolean isFiltered(long id) { return userids.contains(id); }}
3、打印推荐结果
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".SLF4J: Defaulting to no-operation (NOP) logger implementationSLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.userId:188,Item:(365,8.800000)Item:(725,8.583333)Item:(427,8.000000)Item:(403,7.987013)Item:(734,7.676371)Item:(256,7.533333)Item:(300,7.428571)Item:(743,7.333333)Item:(356,6.875000)Item:(579,6.777778)4、人工对数据进行分析
(1)查看图书365都有哪些用户评过分
> ratings[c(ratings$bookid==365),] userid bookid grade1046 51 365 92206 111 365 92632 134 365 4> users[c(51,111,134),] userid sex age51 51 F 18111 111 F 40134 134 F 74(2)利用intersect函数把用户188与25,45,65这三个用户共同评分过的图书汇集出来
说明:intersect(A,B)是一个数据框都在A和B这些行
>rating188=ratings[which(ratings$userid==188),]>rating51=ratings[which(ratings$userid==51),]>rating111=ratings[which(ratings$userid==111),]>rating134=ratings[which(ratings$userid==134),]> intersect(rating188$bookid,rating51$bookid)integer(0)> intersect(rating188$bookid,rating134$bookid)[1] 204> intersect(rating188$bookid,rating111$bookid)[1] 742(3)从上面可以看出用户188与用户134共同看了204图书,与111共同看了742图书
> rating188 userid bookid grade3760 188 798 63761 188 653 33762 188 426 63763 188 742 73764 188 549 23765 188 520 83766 188 312 23767 188 213 103768 188 954 53769 188 121 103770 188 204 93771 188 684 33772 188 493 43773 188 452 13774 188 622 33775 188 298 8综上所述把图书365推荐给用户188是合理的。
1 0
- 基于Mahout的图书推荐系统
- Mahout构建图书推荐系统
- Mahout构建图书推荐系统
- Mahout构建图书推荐系统
- 基于Mahout的电影推荐系统
- 基于Mahout的电影推荐系统
- 基于Mahout的电影推荐系统
- 基于Mahout的电影推荐系统
- 基于Mahout的电影推荐系统
- 基于Mahout的电影推荐系统
- 基于Mahout的电影推荐系统
- 基于Mahout的电影推荐系统
- 基于Mahout的电影推荐系统
- 基于Mahout的电影推荐系统
- 基于Mahout的电影推荐系统
- 基于Mahout的电影推荐系统
- 基于Mahout的电影推荐系统
- 基于Mahout的电影推荐系统
- ZOJ 3366 - Light Bulb(三分)
- Hibernate事务与并发问题处理(乐观锁与悲观锁)
- 工程人员注意事项
- iOS的网路请求
- 关于code first 创建并更改已经创建好的数据库的方法【本人真实操作】
- 基于Mahout的图书推荐系统
- tableView 分组间距
- 开发基本守则
- 基于Docker的PHP开发环境
- C++ 类的 复制构造函数 和 赋值构造函数
- WPF无边框窗体-可移动可调整大小
- Android Studio中如何跳到某一个方法
- linux下操作cpu寄存器
- codeforces 496C Removing Columns (模拟,思维)