Mahout Distributed Program Development: Item-Based Collaborative Filtering (ItemCF)
Preface
Mahout is a member of the Hadoop family, and by that lineage it inherits the traits of Hadoop programs: support for HDFS access and MapReduce distributed computation. As Mahout evolved, version 0.7 brought a major change: the single-machine, in-memory implementations of some algorithms were removed, leaving only MapReduce-based parallel computation on Hadoop.
Contents
- Introduction to the Mahout development environment
- Mahout's Hadoop-based distributed computing environment
- Implementing collaborative filtering (ItemCF) with Mahout
- Uploading the template project to GitHub
1. Introduction to the Mahout Development Environment
In the article "Building a Mahout Project with Maven", we already set up a Maven-based Mahout development environment; here we continue with distributed Mahout program development.
This article uses Mahout 0.8.
Development environment:
Win7 64bit
Java 1.6.0_45
Maven 3
Eclipse Juno Service Release 2
Mahout 0.8
Hadoop 1.1.2
Find pom.xml and change the Mahout version to 0.8:
<mahout.version>0.8</mahout.version>
Then download the dependencies:
~ mvn clean install
Because the class org.conan.mymahout.cluster06.Kmeans.java is written against mahout-0.6, the build will fail; we can comment out this file for now.
2. Mahout's Hadoop-Based Distributed Environment
As the figure above shows, we can develop on either Win7 or Linux, debugging locally during development; the standard tools are Maven and Eclipse.
When Mahout runs, it automatically ships the MapReduce algorithm package to the Hadoop cluster. This development-and-run model is close to a real production environment.
3. Implementing Collaborative Filtering (ItemCF) with Mahout
Implementation steps:
- Prepare the data file: item.csv
- Java program: HdfsDAO.java
- Java program: ItemCFHadoop.java
- Run the program
- Interpret the recommendation results
1). Prepare the data file: item.csv
Upload the test data to HDFS. For the single-machine, in-memory experiment, see the article: Building a Mahout Project with Maven.
~ hadoop fs -mkdir /user/hdfs/userCF
~ hadoop fs -copyFromLocal /home/conan/datafiles/item.csv /user/hdfs/userCF
~ hadoop fs -cat /user/hdfs/userCF/item.csv
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
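Each line of item.csv has the form userID,itemID,preference. As a sanity check, the format can be parsed with a small self-contained snippet; the class and method names below are illustrative, not part of Mahout's API:

```java
import java.util.*;

// Illustrative parser for the userID,itemID,preference format of item.csv.
public class ItemCsvDemo {
    static Map<Long, Map<Long, Double>> parse(List<String> lines) {
        Map<Long, Map<Long, Double>> prefs = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.split(",");
            long user = Long.parseLong(f[0]);
            long item = Long.parseLong(f[1]);
            double pref = Double.parseDouble(f[2]);
            // group preferences by user: user -> (item -> rating)
            prefs.computeIfAbsent(user, k -> new TreeMap<>()).put(item, pref);
        }
        return prefs;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("1,101,5.0", "1,102,3.0", "2,101,2.0");
        Map<Long, Map<Long, Double>> prefs = parse(sample);
        System.out.println(prefs.get(1L).get(101L)); // 5.0
        System.out.println(prefs.size());            // 2
    }
}
```

In the dataset above there are 5 users, 7 items, and 21 preference records.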
2). Java program: HdfsDAO.java
HdfsDAO.java is a utility class for HDFS operations that implements the various HDFS commands through the Hadoop API; see the article: Hadoop Programming: Calling HDFS.
Here we will use the following methods of the HdfsDAO class:
HdfsDAO hdfs = new HdfsDAO(HDFS, conf);
hdfs.rmr(inPath);
hdfs.mkdirs(inPath);
hdfs.copyFile(localFile, inPath);
hdfs.ls(inPath);
hdfs.cat(inFile);
3). Java program: ItemCFHadoop.java
For the distributed algorithm implemented with Mahout, see the explanation in Mahout in Action.
The program:
package org.conan.mymahout.recommendation;

import org.apache.hadoop.mapred.JobConf;
import org.apache.mahout.cf.taste.hadoop.item.RecommenderJob;
import org.conan.mymahout.hdfs.HdfsDAO;

public class ItemCFHadoop {

    private static final String HDFS = "hdfs://192.168.1.210:9000";

    public static void main(String[] args) throws Exception {
        String localFile = "datafile/item.csv";
        String inPath = HDFS + "/user/hdfs/userCF";
        String inFile = inPath + "/item.csv";
        String outPath = HDFS + "/user/hdfs/userCF/result/";
        String outFile = outPath + "/part-r-00000";
        String tmpPath = HDFS + "/tmp/" + System.currentTimeMillis();

        JobConf conf = config();
        HdfsDAO hdfs = new HdfsDAO(HDFS, conf);
        hdfs.rmr(inPath);
        hdfs.mkdirs(inPath);
        hdfs.copyFile(localFile, inPath);
        hdfs.ls(inPath);
        hdfs.cat(inFile);

        StringBuilder sb = new StringBuilder();
        sb.append("--input ").append(inPath);
        sb.append(" --output ").append(outPath);
        sb.append(" --booleanData true");
        sb.append(" --similarityClassname org.apache.mahout.math.hadoop.similarity.cooccurrence.measures.EuclideanDistanceSimilarity");
        sb.append(" --tempDir ").append(tmpPath);
        args = sb.toString().split(" ");

        RecommenderJob job = new RecommenderJob();
        job.setConf(conf);
        job.run(args);

        hdfs.cat(outFile);
    }

    public static JobConf config() {
        JobConf conf = new JobConf(ItemCFHadoop.class);
        conf.setJobName("ItemCFHadoop");
        conf.addResource("classpath:/hadoop/core-site.xml");
        conf.addResource("classpath:/hadoop/hdfs-site.xml");
        conf.addResource("classpath:/hadoop/mapred-site.xml");
        return conf;
    }
}
RecommenderJob.java is, in effect, a wrapper around the entire distributed parallel pipeline shown in the figure above. Without this wrapper, we would have to implement the diagram's 8 MapReduce steps ourselves.
For an in-depth analysis of the algorithm above, see the article: Implementing the MapReduce Collaborative Filtering Algorithm in R.
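The core idea that RecommenderJob distributes across MapReduce passes can be sketched on a single machine: build an item co-occurrence matrix from the preference data, then multiply it by a user's preference vector to score unseen items. This is a minimal illustration of the technique, not Mahout's actual implementation; all names are ours:

```java
import java.util.*;

// Single-machine sketch of the co-occurrence pipeline behind RecommenderJob.
public class CooccurrenceSketch {
    // C[i][j] = number of users who rated both item i and item j
    static Map<Integer, Map<Integer, Integer>> cooccurrence(
            Map<Integer, Map<Integer, Double>> prefs) {
        Map<Integer, Map<Integer, Integer>> cooc = new HashMap<>();
        for (Map<Integer, Double> items : prefs.values())
            for (int i : items.keySet())
                for (int j : items.keySet())
                    if (i != j)
                        cooc.computeIfAbsent(i, k -> new HashMap<>())
                            .merge(j, 1, Integer::sum);
        return cooc;
    }

    // score(j) = sum over rated items i of C[i][j] * pref(i), for unseen j only
    static Map<Integer, Double> score(
            Map<Integer, Map<Integer, Integer>> cooc,
            Map<Integer, Double> userPrefs) {
        Map<Integer, Double> scores = new TreeMap<>();
        for (Map.Entry<Integer, Double> e : userPrefs.entrySet())
            for (Map.Entry<Integer, Integer> c :
                    cooc.getOrDefault(e.getKey(), Collections.emptyMap()).entrySet())
                if (!userPrefs.containsKey(c.getKey()))
                    scores.merge(c.getKey(), c.getValue() * e.getValue(), Double::sum);
        return scores;
    }

    public static void main(String[] args) {
        // a tiny slice of item.csv: userID -> (itemID -> preference)
        Map<Integer, Map<Integer, Double>> prefs = new HashMap<>();
        prefs.put(1, Map.of(101, 5.0, 102, 3.0, 103, 2.5));
        prefs.put(2, Map.of(101, 2.0, 102, 2.5, 103, 5.0, 104, 2.0));
        prefs.put(3, Map.of(101, 2.5, 104, 4.0));
        // co-occurrence-weighted scores for the items user 3 has not rated
        System.out.println(score(cooccurrence(prefs), prefs.get(3)));
    }
}
```

The MapReduce version spreads exactly these two stages (matrix construction and matrix-vector multiplication) over the cluster, with the similarity computation replacing raw co-occurrence counts.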
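The --similarityClassname option above selects EuclideanDistanceSimilarity, which is distance-based. A common way to turn a Euclidean distance d into a similarity in (0, 1] is 1 / (1 + d); the sketch below uses that normalization to convey the idea, though Mahout's exact formula differs in detail:

```java
// Hedged sketch of a Euclidean-distance-based item similarity: treat each
// item as a vector of ratings from the users who rated both items, and map
// the distance d into (0, 1] via 1 / (1 + d). Illustrative only; not
// Mahout's exact EuclideanDistanceSimilarity formula.
public class EuclideanSimDemo {
    static double similarity(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return 1.0 / (1.0 + Math.sqrt(sum));
    }

    public static void main(String[] args) {
        // ratings for item 101 and item 102 from users 1, 2, 5 (who rated both)
        double[] item101 = {5.0, 2.0, 4.0};
        double[] item102 = {3.0, 2.5, 3.0};
        System.out.printf("sim(101,102) = %.4f%n", similarity(item101, item102));
    }
}
```

Identical rating vectors give distance 0 and similarity 1; the farther apart two items' rating vectors are, the closer the similarity falls toward 0.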
4). Run the program
Console output:
Delete: hdfs://192.168.1.210:9000/user/hdfs/userCF
Create: hdfs://192.168.1.210:9000/user/hdfs/userCF
copy from: datafile/item.csv to hdfs://192.168.1.210:9000/user/hdfs/userCF
ls: hdfs://192.168.1.210:9000/user/hdfs/userCF
==========================================================
name: hdfs://192.168.1.210:9000/user/hdfs/userCF/item.csv, folder: false, size: 229
==========================================================
cat: hdfs://192.168.1.210:9000/user/hdfs/userCF/item.csv
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
2013-10-14 10:26:35 org.apache.hadoop.util.NativeCodeLoader WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2013-10-14 10:26:35 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus INFO: Total input paths to process : 1
2013-10-14 10:26:35 org.apache.hadoop.io.compress.snappy.LoadSnappy WARNING: Snappy native library not loaded
2013-10-14 10:26:36 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: Running job: job_local_0001
2013-10-14 10:26:36 org.apache.hadoop.mapred.Task initialize INFO: Using ResourceCalculatorPlugin : null
2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer INFO: io.sort.mb = 100
2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer INFO: data buffer = 79691776/99614720
2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer INFO: record buffer = 262144/327680
2013-10-14 10:26:36 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush INFO: Starting flush of map output
2013-10-14 10:26:36 org.apache.hadoop.io.compress.CodecPool getCompressor INFO: Got brand-new compressor
2013-10-14 10:26:36
org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill INFO: Finished spill 0
...
2013-10-14 10:26:36 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask INFO: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/preparePreferenceMatrix/itemIDIndex
2013-10-14 10:26:37 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: map 100% reduce 100%
2013-10-14 10:26:37 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: Job complete: job_local_0001
...
2013-10-14 10:26:37 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: Running job: job_local_0002
...
2013-10-14 10:26:37 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask INFO: Saved output of task 'attempt_local_0002_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/preparePreferenceMatrix/userVectors
2013-10-14 10:26:38 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: map 100% reduce 100%
2013-10-14 10:26:38 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: Job complete: job_local_0002
...
2013-10-14 10:26:38 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: Running job: job_local_0003
...
2013-10-14 10:26:38 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask INFO: Saved output of task 'attempt_local_0003_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/preparePreferenceMatrix/ratingMatrix
2013-10-14 10:26:39 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: map 100% reduce 100%
2013-10-14 10:26:39 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: Job complete: job_local_0003
...
2013-10-14 10:26:39 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: Running job: job_local_0004
...
2013-10-14 10:26:39 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask INFO: Saved output of task 'attempt_local_0004_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/weights
2013-10-14 10:26:40 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: map 100% reduce 100%
2013-10-14 10:26:40 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: Job complete: job_local_0004
...
2013-10-14 10:26:40 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: Running job: job_local_0005
...
2013-10-14 10:26:40 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask INFO: Saved output of task 'attempt_local_0005_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/pairwiseSimilarity
2013-10-14 10:26:41 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: map 100% reduce 100%
2013-10-14 10:26:41 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: Job complete: job_local_0005
...
2013-10-14 10:26:41 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: Running job: job_local_0006
...
2013-10-14 10:26:41 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask INFO: Saved output of task 'attempt_local_0006_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/similarityMatrix
2013-10-14 10:26:42 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: map 100% reduce 100%
2013-10-14 10:26:42 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: Job complete: job_local_0006
...
2013-10-14 10:26:42 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: Running job: job_local_0007
...
2013-10-14 10:26:42 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask INFO: Saved output of task 'attempt_local_0007_r_000000_0' to hdfs://192.168.1.210:9000/tmp/1381717594500/partialMultiply
2013-10-14 10:26:43 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: map 100% reduce 100%
2013-10-14 10:26:43 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: Job complete: job_local_0007
...
Written=5722013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: FileSystemCounters2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: FILE_BYTES_READ=345179132013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: HDFS_BYTES_READ=87512013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: FILE_BYTES_WRITTEN=361826302013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: HDFS_BYTES_WRITTEN=79342013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: File Input Format Counters 2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Bytes Read=02013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Map-Reduce Framework2013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Map output materialized bytes=2412013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Map input records=122013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Reduce shuffle bytes=02013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Spilled Records=562013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Map output bytes=4532013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Total committed heap usage (bytes)=25584599042013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: SPLIT_RAW_BYTES=6652013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Combine input records=02013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Reduce input records=282013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Reduce input groups=72013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Combine output records=02013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Reduce output records=72013-10-14 10:26:43 org.apache.hadoop.mapred.Counters log信息: Map output records=282013-10-14 10:26:43 org.apache.hadoop.mapreduce.lib.input.FileInputFormat listStatus信息: Total input paths to process : 12013-10-14 10:26:43 org.apache.hadoop.mapred.JobClient monitorAndPrintJob信息: Running job: 
job_local_00082013-10-14 10:26:43 org.apache.hadoop.mapred.Task initialize信息: Using ResourceCalculatorPlugin : null2013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: io.sort.mb = 1002013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: data buffer = 79691776/996147202013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer 信息: record buffer = 262144/3276802013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush信息: Starting flush of map output2013-10-14 10:26:43 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill信息: Finished spill 02013-10-14 10:26:43 org.apache.hadoop.mapred.Task done信息: Task:attempt_local_0008_m_000000_0 is done. And is in the process of commiting2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task sendDone信息: Task 'attempt_local_0008_m_000000_0' done.2013-10-14 10:26:43 org.apache.hadoop.mapred.Task initialize信息: Using ResourceCalculatorPlugin : null2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: 2013-10-14 10:26:43 org.apache.hadoop.mapred.Merger$MergeQueue merge信息: Merging 1 sorted segments2013-10-14 10:26:43 org.apache.hadoop.mapred.Merger$MergeQueue merge信息: Down to the last merge-pass, with 1 segments left of total size: 206 bytes2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task done信息: Task:attempt_local_0008_r_000000_0 is done. 
And is in the process of commiting2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: 2013-10-14 10:26:43 org.apache.hadoop.mapred.Task commit信息: Task attempt_local_0008_r_000000_0 is allowed to commit now2013-10-14 10:26:43 org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter commitTask信息: Saved output of task 'attempt_local_0008_r_000000_0' to hdfs://192.168.1.210:9000/user/hdfs/userCF/result2013-10-14 10:26:43 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate信息: reduce > reduce2013-10-14 10:26:43 org.apache.hadoop.mapred.Task sendDone信息: Task 'attempt_local_0008_r_000000_0' done.2013-10-14 10:26:44 org.apache.hadoop.mapred.JobClient monitorAndPrintJob信息: map 100% reduce 100%2013-10-14 10:26:44 org.apache.hadoop.mapred.JobClient monitorAndPrintJob信息: Job complete: job_local_00082013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Counters: 192013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: File Output Format Counters 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Bytes Written=2172013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: FileSystemCounters2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: FILE_BYTES_READ=262998022013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: HDFS_BYTES_READ=73572013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: FILE_BYTES_WRITTEN=275664082013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: HDFS_BYTES_WRITTEN=62692013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: File Input Format Counters 2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Bytes Read=5722013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Map-Reduce Framework2013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Map output materialized bytes=2102013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Map input records=72013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Reduce shuffle 
bytes=02013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Spilled Records=422013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Map output bytes=9272013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Total committed heap usage (bytes)=19714539522013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: SPLIT_RAW_BYTES=1372013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Combine input records=02013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Reduce input records=212013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Reduce input groups=52013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Combine output records=02013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Reduce output records=52013-10-14 10:26:44 org.apache.hadoop.mapred.Counters log信息: Map output records=21cat: hdfs://192.168.1.210:9000/user/hdfs/userCF/result//part-r-000001 [104:1.280239,106:1.1462644,105:1.0653841,107:0.33333334]2 [106:1.560478,105:1.4795978,107:0.69935876]3 [103:1.2475469,106:1.1944525,102:1.1462644]4 [102:1.6462644,105:1.5277859,107:0.69935876]5 [107:1.1993587]
5). Interpreting the recommendation results
We can break the log above into 3 parts for interpretation:
- a. Environment initialization
- b. Algorithm execution
- c. Printing the recommendation results
a. Environment initialization
Initialize the HDFS data directory and working directory, and upload the data file.
```
Delete: hdfs://192.168.1.210:9000/user/hdfs/userCF
Create: hdfs://192.168.1.210:9000/user/hdfs/userCF
copy from: datafile/item.csv to hdfs://192.168.1.210:9000/user/hdfs/userCF
ls: hdfs://192.168.1.210:9000/user/hdfs/userCF
==========================================================
name: hdfs://192.168.1.210:9000/user/hdfs/userCF/item.csv, folder: false, size: 229
==========================================================
cat: hdfs://192.168.1.210:9000/user/hdfs/userCF/item.csv
```
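The same prepare-the-input sequence (mkdirs, copyFile, ls, cat) can be mimicked on the local filesystem with plain java.nio, which is handy for trying out the surrounding code without a cluster. This is only an illustrative sketch, not the Hadoop API; the class and method names are my own.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Local-filesystem analogue of the HdfsDAO sequence used in ItemCFHadoop:
// mkdirs -> copyFile -> ls -> cat. The real program performs these steps
// against HDFS through the Hadoop FileSystem API.
public class PrepareInputSketch {

    public static Path prepare(Path workDir, Path dataFile) throws IOException {
        Files.createDirectories(workDir);                        // mkdirs
        Path target = workDir.resolve(dataFile.getFileName());   // e.g. workDir/item.csv
        Files.copy(dataFile, target, StandardCopyOption.REPLACE_EXISTING); // copyFile
        try (DirectoryStream<Path> ls = Files.newDirectoryStream(workDir)) {
            for (Path p : ls) {
                System.out.println("name: " + p + ", size: " + Files.size(p)); // ls
            }
        }
        System.out.print(Files.readString(target));              // cat
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path data = Files.createTempFile("item", ".csv");
        Files.writeString(data, "1,101,5.0\n1,102,3.0\n");
        prepare(Files.createTempDirectory("userCF"), data);
    }
}
```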
b. Algorithm execution
RecommenderJob runs, one after another, the 8 MapReduce jobs corresponding to the figure above.
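To build intuition for what the similarity stage of those jobs computes, the sketch below scores a pair of items from item.csv with a simple Euclidean-distance-based similarity, 1/(1+d), over the users who rated both items. This is an illustration only, not Mahout's actual EuclideanDistanceSimilarity code (the distributed formulation differs in detail); all names here are mine.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative item-item similarity: 1 / (1 + Euclidean distance), computed
// over users who rated both items. Mahout's distributed similarity job is
// more elaborate, but the intuition is the same.
public class ItemSimilaritySketch {

    public static double similarity(Map<Integer, Double> itemA, Map<Integer, Double> itemB) {
        double sumSq = 0.0;
        int common = 0;
        for (Map.Entry<Integer, Double> e : itemA.entrySet()) {
            Double b = itemB.get(e.getKey());   // rating by the same user, if any
            if (b != null) {
                double diff = e.getValue() - b;
                sumSq += diff * diff;
                common++;
            }
        }
        if (common == 0) return 0.0;            // no co-rating users: no evidence
        return 1.0 / (1.0 + Math.sqrt(sumSq));
    }

    public static void main(String[] args) {
        // Ratings for items 101 and 104 from item.csv, keyed by user ID.
        Map<Integer, Double> item101 = new HashMap<>();
        item101.put(1, 5.0); item101.put(2, 2.0); item101.put(3, 2.5);
        item101.put(4, 5.0); item101.put(5, 4.0);
        Map<Integer, Double> item104 = new HashMap<>();
        item104.put(2, 2.0); item104.put(3, 4.0); item104.put(4, 4.5); item104.put(5, 4.0);

        System.out.printf("sim(101,104) = %.4f%n", similarity(item101, item104));
    }
}
```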
```
Job complete: job_local_0001
Job complete: job_local_0002
Job complete: job_local_0003
Job complete: job_local_0004
Job complete: job_local_0005
Job complete: job_local_0006
Job complete: job_local_0007
Job complete: job_local_0008
```
c. Printing the recommendation results
Finally the program cats the result file, so we can see the computed recommendations directly. Each line is a user ID followed by a list of itemID:score pairs, sorted by predicted preference in descending order.
```
cat: hdfs://192.168.1.210:9000/user/hdfs/userCF/result//part-r-00000
1 [104:1.280239,106:1.1462644,105:1.0653841,107:0.33333334]
2 [106:1.560478,105:1.4795978,107:0.69935876]
3 [103:1.2475469,106:1.1944525,102:1.1462644]
4 [102:1.6462644,105:1.5277859,107:0.69935876]
5 [107:1.1993587]
```
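Note that already-rated items are excluded: user 1 rated 101, 102 and 103, so only 104–107 appear on user 1's line, with item 104 (score 1.28) as the top recommendation. A minimal sketch for parsing one output line back into Java objects (the class name is my own):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Parses one line of RecommenderJob output, e.g.
//   "1\t[104:1.280239,106:1.1462644,105:1.0653841,107:0.33333334]"
// into a user ID and an ordered map of itemID -> predicted score.
public class RecommendationLineParser {

    public static long parseUser(String line) {
        return Long.parseLong(line.substring(0, line.indexOf('[')).trim());
    }

    public static Map<Long, Float> parseItems(String line) {
        String body = line.substring(line.indexOf('[') + 1, line.lastIndexOf(']'));
        Map<Long, Float> items = new LinkedHashMap<>();  // keeps the ranking order
        for (String pair : body.split(",")) {
            String[] kv = pair.split(":");
            items.put(Long.parseLong(kv[0]), Float.parseFloat(kv[1]));
        }
        return items;
    }

    public static void main(String[] args) {
        String line = "1\t[104:1.280239,106:1.1462644,105:1.0653841,107:0.33333334]";
        System.out.println("user " + parseUser(line) + " -> " + parseItems(line));
    }
}
```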
4. Uploading the template project to GitHub
```
~ git clone https://github.com/bsspirit/maven_mahout_template
~ git checkout mahout-0.8
```
With that, we have completed a distributed implementation of item-based collaborative filtering. In the next article I will continue with Mahout's distributed KMeans implementation; please refer to: Mahout分步式程序开发 聚类Kmeans.