Mahout item-based collaborative filtering: asMatrix
Source: Internet. Editor: 程序博客网. Time: 2024/06/08 11:12
    /**
     * Job asMatrix
     * Output: vectors of the form itemA, <itemO, similarity>
     * What it does:
     * 1. For each item, find the topN most similar items
     * 2. Compute the lower triangle of the matrix (derived from the upper triangle, which has already been computed)
     */
    if (shouldRunNextPhase(parsedArgs, currentPhase)) {
      Job asMatrix = prepareJob(
          pairwiseSimilarityPath,                // input path
          getOutputPath(),                       // output path
          UnsymmetrifyMapper.class,              // mapper
          IntWritable.class,                     // mapper output key
          VectorWritable.class,                  // mapper output value
          MergeToTopKSimilaritiesReducer.class,  // reducer
          IntWritable.class,                     // reducer output key
          VectorWritable.class);                 // reducer output value
      asMatrix.setCombinerClass(MergeToTopKSimilaritiesReducer.class);
      asMatrix.getConfiguration().setInt(MAX_SIMILARITIES_PER_ROW, maxSimilaritiesPerRow);
      boolean succeeded = asMatrix.waitForCompletion(true);
      if (!succeeded) {
        return -1;
      }
    }
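The second goal in the comment above, deriving the lower triangle from the upper one, can be illustrated without Hadoop. The following is only a minimal in-memory sketch: the class `SymmetrifySketch`, the method `symmetrify`, and the use of nested `Map`s in place of Mahout vectors are all assumptions made for illustration, not part of Mahout.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class, not part of Mahout.
final class SymmetrifySketch {

    /**
     * Expands upper-triangular rows (itemX -> {itemY -> similarity})
     * into a full symmetric matrix by mirroring every cell.
     */
    static Map<Integer, Map<Integer, Double>> symmetrify(
            Map<Integer, Map<Integer, Double>> upper) {
        Map<Integer, Map<Integer, Double>> full = new TreeMap<>();
        for (Map.Entry<Integer, Map<Integer, Double>> row : upper.entrySet()) {
            int x = row.getKey();
            for (Map.Entry<Integer, Double> cell : row.getValue().entrySet()) {
                int y = cell.getKey();
                double sim = cell.getValue();
                // Keep the original cell: itemX, <itemY, similarity>
                full.computeIfAbsent(x, k -> new TreeMap<>()).put(y, sim);
                // Add the mirrored cell: itemY, <itemX, similarity>
                full.computeIfAbsent(y, k -> new TreeMap<>()).put(x, sim);
            }
        }
        return full;
    }

    public static void main(String[] args) {
        Map<Integer, Map<Integer, Double>> upper = new TreeMap<>();
        upper.put(1, new TreeMap<>(Map.of(2, 0.9, 3, 0.4)));
        upper.put(2, new TreeMap<>(Map.of(3, 0.7)));
        System.out.println(symmetrify(upper));
    }
}
```

In the real job the mirroring is spread across the mapper (which emits the mirrored cells) and the reducer (which groups them by item), so no single machine needs to hold the whole matrix.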
(1)UnsymmetrifyMapper
    public static class UnsymmetrifyMapper
        extends Mapper<IntWritable, VectorWritable, IntWritable, VectorWritable> {

      private int maxSimilaritiesPerRow; // number of similar items kept per item

      @Override
      protected void setup(Context ctx) throws IOException, InterruptedException {
        maxSimilaritiesPerRow = ctx.getConfiguration().getInt(MAX_SIMILARITIES_PER_ROW, 0);
        Preconditions.checkArgument(maxSimilaritiesPerRow > 0,
            "Maximum number of similarities per row must be greater then 0!");
      }

      @Override
      protected void map(IntWritable row, VectorWritable similaritiesWritable, Context ctx)
          throws IOException, InterruptedException {
        // Similarity input format: itemX, <itemY, similarity>
        Vector similarities = similaritiesWritable.get();
        // Holds the transposed entry
        Vector transposedPartial = similarities.like();
        // Each item keeps at most maxSimilaritiesPerRow most similar items
        TopElementsQueue topKQueue = new TopElementsQueue(maxSimilaritiesPerRow);
        for (Element nonZeroElement : similarities.nonZeroes()) {
          // Compute the topK
          MutableElement top = topKQueue.top();
          double candidateValue = nonZeroElement.get();
          if (candidateValue > top.get()) {
            top.setIndex(nonZeroElement.index());
            top.set(candidateValue);
            topKQueue.updateTop();
          }
          // Build the transposed vector <itemX, similarity>
          transposedPartial.setQuick(row.get(), candidateValue);
          // Write it to the output as itemY, <itemX, similarity>
          ctx.write(new IntWritable(nonZeroElement.index()), new VectorWritable(transposedPartial));
          // Reset the entry so the vector can be reused in the next iteration
          transposedPartial.setQuick(row.get(), 0.0);
        }
        // Write the TopN items most similar to the current item as itemX, <itemY, similarity>
        Vector topKSimilarities =
            new RandomAccessSparseVector(similarities.size(), maxSimilaritiesPerRow);
        for (Vector.Element topKSimilarity : topKQueue.getTopElements()) {
          topKSimilarities.setQuick(topKSimilarity.index(), topKSimilarity.get());
        }
        // Write the topK most similar items of itemX to the output
        ctx.write(row, new VectorWritable(topKSimilarities));
      }
    }
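The top-K selection inside `map` keeps only a bounded number of candidates while scanning the row once. As a rough analogy, the same idea can be written with the standard library's `PriorityQueue` used as a min-heap of size k. This is a sketch under assumptions: `TopKSketch` and `topK` are hypothetical names, and the heap here grows up to k entries, whereas the mapper above replaces the top element of Mahout's `TopElementsQueue` in place.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical illustration class, not part of Mahout.
final class TopKSketch {

    /** Returns the k entries with the largest similarity, highest first. */
    static List<Map.Entry<Integer, Double>> topK(Map<Integer, Double> similarities, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be greater than 0");
        }
        Comparator<Map.Entry<Integer, Double>> byValue = Map.Entry.comparingByValue();
        // Min-heap: the head is always the weakest candidate kept so far.
        PriorityQueue<Map.Entry<Integer, Double>> heap = new PriorityQueue<>(k, byValue);
        for (Map.Entry<Integer, Double> candidate : similarities.entrySet()) {
            if (heap.size() < k) {
                heap.add(candidate);
            } else if (candidate.getValue() > heap.peek().getValue()) {
                // The candidate beats the weakest kept entry, so swap them.
                heap.poll();
                heap.add(candidate);
            }
        }
        List<Map.Entry<Integer, Double>> result = new ArrayList<>(heap);
        result.sort(byValue.reversed());
        return result;
    }

    public static void main(String[] args) {
        System.out.println(topK(Map.of(1, 0.2, 2, 0.9, 3, 0.5, 4, 0.7), 2));
    }
}
```

Because at most k entries are held at any time, memory stays bounded by k regardless of how many non-zero similarities the row contains.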
(2)MergeToTopKSimilaritiesReducer

    public static class MergeToTopKSimilaritiesReducer
        extends Reducer<IntWritable, VectorWritable, IntWritable, VectorWritable> {

      private int maxSimilaritiesPerRow; // number of similar items kept per item

      @Override
      protected void setup(Context ctx) throws IOException, InterruptedException {
        maxSimilaritiesPerRow = ctx.getConfiguration().getInt(MAX_SIMILARITIES_PER_ROW, 0);
        Preconditions.checkArgument(maxSimilaritiesPerRow > 0,
            "Maximum number of similarities per row must be greater then 0!");
      }

      @Override
      protected void reduce(IntWritable row, Iterable<VectorWritable> partials, Context ctx)
          throws IOException, InterruptedException {
        // Merge, per item, the two kinds of vectors produced by the mapper:
        // itemO, <itemA, similarity> and itemA, <itemO, similarity>
        Vector allSimilarities = Vectors.merge(partials);
        // Compute the TopN again
        Vector topKSimilarities = Vectors.topKElements(maxSimilaritiesPerRow, allSimilarities);
        // Finally, output vectors of the form itemA, <itemO, similarity>
        ctx.write(row, new VectorWritable(topKSimilarities));
      }
    }
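The reduce step can be sketched in plain Java as "merge all partial vectors for one item, then truncate to the k largest". The names `MergeTopKSketch`, `merge`, and `reduce` are hypothetical, `Map`s stand in for Mahout vectors, and the sketch assumes that merging means taking the union of the entries, with a later partial overwriting an earlier one on the same index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class, not part of Mahout.
final class MergeTopKSketch {

    /** Unions all partial vectors; a later partial overwrites an earlier one (assumption). */
    static Map<Integer, Double> merge(List<Map<Integer, Double>> partials) {
        Map<Integer, Double> merged = new TreeMap<>();
        for (Map<Integer, Double> partial : partials) {
            merged.putAll(partial);
        }
        return merged;
    }

    /** Merges the partials for one item and keeps the k highest similarities, highest first. */
    static Map<Integer, Double> reduce(List<Map<Integer, Double>> partials, int k) {
        List<Map.Entry<Integer, Double>> entries = new ArrayList<>(merge(partials).entrySet());
        Comparator<Map.Entry<Integer, Double>> byValue = Map.Entry.comparingByValue();
        entries.sort(byValue.reversed());
        Map<Integer, Double> top = new LinkedHashMap<>();
        for (Map.Entry<Integer, Double> e : entries.subList(0, Math.min(k, entries.size()))) {
            top.put(e.getKey(), e.getValue());
        }
        return top;
    }

    public static void main(String[] args) {
        // Partials for item 3: two transposed cells plus its own top-K row.
        List<Map<Integer, Double>> partials =
            List.of(Map.of(1, 0.4), Map.of(2, 0.7), Map.of(4, 0.1, 5, 0.8));
        System.out.println(reduce(partials, 2));
    }
}
```

Since the output of `reduce` has the same shape as each of its inputs, the same function can also be applied to partial groups first, which is consistent with the job registering the reducer class as its combiner.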