Mahout on Hadoop 2 in Practice
Source: Internet · Editor: 程序博客网 · Date: 2024/06/07
1. Thanks to sunshine_junge's post 《hadoop2.2+mahout0.9实战》, which got me past this problem:

java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
    at org.apache.mahout.common.HadoopUtil.getCustomJobName(HadoopUtil.java:174)
    at org.apache.mahout.common.AbstractJob.prepareJob(AbstractJob.java:614)
    at org.apache.mahout.cf.taste.hadoop.preparation.PreparePreferenceMatrixJob.run(PreparePreferenceMatrixJob.java:73)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
The fix: this error occurs because the released Mahout 0.9 only supports Hadoop 1. Specifically, org.apache.hadoop.mapreduce.JobContext was a class in Hadoop 1 but became an interface in Hadoop 2, so code compiled against Hadoop 1 fails at runtime with IncompatibleClassChangeError. The solution is described at https://issues.apache.org/jira/browse/MAHOUT-1329: modify the pom files to change Mahout's Hadoop dependencies. You can download the patched source (http://download.csdn.net/detail/fansy1990/7165957) and build Mahout yourself with mvn clean install -Dhadoop2 -Dhadoop.2.version=2.2.0 -DskipTests, or download the prebuilt jars directly (http://download.csdn.net/detail/fansy1990/7166017, http://download.csdn.net/detail/fansy1990/7166055).
At the time I didn't read the build command carefully and just ran mvn clean package. The build succeeded, but the result still didn't support Hadoop 2 (without -Dhadoop2 on the command line, the Hadoop 2 profile never gets activated). In the end I used the author's prebuilt jars.
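I haven't diffed the MAHOUT-1329 patch line by line, but the reason -Dhadoop2 matters is that the patched pom switches its Hadoop dependencies inside a property-activated Maven profile. The sketch below shows the general shape (artifact choice and exact structure are my assumption, not a copy of the patch):

```xml
<!-- Sketch only: a profile that is activated when the build is run
     with -Dhadoop2, swapping in Hadoop 2 client dependencies.
     Without that flag, the default (Hadoop 1) dependencies apply,
     which is why a plain "mvn clean package" still targets Hadoop 1. -->
<profile>
  <id>hadoop2</id>
  <activation>
    <property>
      <name>hadoop2</name>
    </property>
  </activation>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <!-- supplied on the command line as -Dhadoop.2.version=2.2.0 -->
      <version>${hadoop.2.version}</version>
    </dependency>
  </dependencies>
</profile>
```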
2. The second problem was stranger; I worked this one out myself:
At runtime a commons-cli2 jar is required. I had to build it from source, since no directly usable jar could be found on the official site. After downloading the source, I removed the <parent>...</parent> section from the project's pom.xml; otherwise the build failed. It's best to also set the environment variables:
export HADOOP_CLASSPATH=$(hadoop classpath)
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/home/hdfs/go1233recom/lib/commons-cli2-2.0-SNAPSHOT.jar
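If more than one extra jar ends up in that lib directory, appending them one export at a time gets tedious. A small sketch of a loop that appends every jar in a lib directory (the demo creates a stand-in lib/ with empty files; on the cluster you would start from HADOOP_CLASSPATH=$(hadoop classpath) and point at the real path):

```shell
# Stand-in for /home/hdfs/go1233recom/lib — created here so the sketch is self-contained.
mkdir -p lib
touch lib/commons-cli2-2.0-SNAPSHOT.jar lib/extra.jar

# In a real setup, seed this with: HADOOP_CLASSPATH=$(hadoop classpath)
HADOOP_CLASSPATH="${HADOOP_CLASSPATH:-}"

# Append every jar found in lib/ to the classpath.
for j in lib/*.jar; do
  [ -e "$j" ] && HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$j"
done
export HADOOP_CLASSPATH
echo "$HADOOP_CLASSPATH"
```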
The command to run the job:
hadoop jar /home/hdfs/mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob --input /input/user.csv -s SIMILARITY_EUCLIDEAN_DISTANCE --output output1
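For reference, RecommenderJob parses its input as one userID,itemID,preference triple per line (with the --booleanData flag the preference column can be dropped). A toy file in that shape, written locally (the filename matches the one above; the values are made up), which you would then push to HDFS with hadoop fs -put user.csv /input/user.csv:

```shell
# Toy RecommenderJob input: userID,itemID,preference — one rating per line.
cat > user.csv <<'EOF'
1,101,5.0
1,102,3.0
2,101,2.0
2,103,4.5
3,102,4.0
EOF
wc -l < user.csv
```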
The tail end of the run looked like this:
15/11/27 15:24:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1447409744056_0022
15/11/27 15:24:09 INFO impl.YarnClientImpl: Submitted application application_1447409744056_0022
15/11/27 15:24:09 INFO mapreduce.Job: The url to track the job: http://bd2.com:8088/proxy/application_1447409744056_0022/
15/11/27 15:24:09 INFO mapreduce.Job: Running job: job_1447409744056_0022
15/11/27 15:24:20 INFO mapreduce.Job: Job job_1447409744056_0022 running in uber mode : false
15/11/27 15:24:20 INFO mapreduce.Job: map 0% reduce 0%
15/11/27 15:24:30 INFO mapreduce.Job: map 50% reduce 0%
15/11/27 15:24:31 INFO mapreduce.Job: map 100% reduce 0%
15/11/27 15:24:38 INFO mapreduce.Job: map 100% reduce 100%
15/11/27 15:24:38 INFO mapreduce.Job: Job job_1447409744056_0022 completed successfully
15/11/27 15:24:38 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=326
		FILE: Number of bytes written=381232
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=1489
		HDFS: Number of bytes written=572
		HDFS: Number of read operations=11
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=2
		Launched reduce tasks=1
		Data-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=18295
		Total time spent by all reduces in occupied slots (ms)=9074
		Total time spent by all map tasks (ms)=18295
		Total time spent by all reduce tasks (ms)=4537
		Total vcore-seconds taken by all map tasks=18295
		Total vcore-seconds taken by all reduce tasks=4537
		Total megabyte-seconds taken by all map tasks=9367040
		Total megabyte-seconds taken by all reduce tasks=4645888
	Map-Reduce Framework
		Map input records=12
		Map output records=28
		Map output bytes=453
		Map output materialized bytes=324
		Input split bytes=647
		Combine input records=0
		Combine output records=0
		Reduce input groups=7
		Reduce shuffle bytes=324
		Reduce input records=28
		Reduce output records=7
		Spilled Records=56
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=207
		CPU time spent (ms)=2940
		Physical memory (bytes) snapshot=1109094400
		Virtual memory (bytes) snapshot=3818016768
		Total committed heap usage (bytes)=938999808
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=572
15/11/27 15:24:38 INFO impl.TimelineClientImpl: Timeline service address: http://bd2.com:8188/ws/v1/timeline/
15/11/27 15:24:38 INFO client.RMProxy: Connecting to ResourceManager at bd2.com/10.252.169.250:8050
15/11/27 15:24:39 INFO input.FileInputFormat: Total input paths to process : 1
15/11/27 15:24:39 INFO mapreduce.JobSubmitter: number of splits:1
15/11/27 15:24:39 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1447409744056_0023
15/11/27 15:24:39 INFO impl.YarnClientImpl: Submitted application application_1447409744056_0023
15/11/27 15:24:39 INFO mapreduce.Job: The url to track the job: http://bd2.com:8088/proxy/application_1447409744056_0023/
15/11/27 15:24:39 INFO mapreduce.Job: Running job: job_1447409744056_0023
15/11/27 15:24:48 INFO mapreduce.Job: Job job_1447409744056_0023 running in uber mode : false
15/11/27 15:24:48 INFO mapreduce.Job: map 0% reduce 0%
15/11/27 15:24:55 INFO mapreduce.Job: map 100% reduce 0%
15/11/27 15:25:02 INFO mapreduce.Job: map 100% reduce 100%
15/11/27 15:25:02 INFO mapreduce.Job: Job job_1447409744056_0023 completed successfully
15/11/27 15:25:03 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=306
		FILE: Number of bytes written=254265
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=887
		HDFS: Number of bytes written=192
		HDFS: Number of read operations=10
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=4088
		Total time spent by all reduces in occupied slots (ms)=9604
		Total time spent by all map tasks (ms)=4088
		Total time spent by all reduce tasks (ms)=4802
		Total vcore-seconds taken by all map tasks=4088
		Total vcore-seconds taken by all reduce tasks=4802
		Total megabyte-seconds taken by all map tasks=2093056
		Total megabyte-seconds taken by all reduce tasks=4917248
	Map-Reduce Framework
		Map input records=7
		Map output records=21
		Map output bytes=927
		Map output materialized bytes=298
		Input split bytes=128
		Combine input records=0
		Combine output records=0
		Reduce input groups=5
		Reduce shuffle bytes=298
		Reduce input records=21
		Reduce output records=5
		Spilled Records=42
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=97
		CPU time spent (ms)=1910
		Physical memory (bytes) snapshot=589557760
		Virtual memory (bytes) snapshot=2723651584
		Total committed heap usage (bytes)=455606272
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=572
	File Output Format Counters
		Bytes Written=192
[hdfs@bd4 ~]$
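Once the whole chain of jobs finishes, the recommendations land as plain text under the --output directory and can be read with hadoop fs -cat output1/part-r-00000. Each line is a user ID, a tab, and a bracketed item:score list. A small sketch of pulling the top recommended item out of one such line (the sample values here are made up, not taken from the run above):

```shell
# Parse one RecommenderJob output line of the form: userID<TAB>[item:score,item:score,...]
printf '3\t[104:4.2,107:3.9]\n' |
  awk -F'\t' '{
    gsub(/[][]/, "", $2)          # strip the surrounding brackets
    split($2, pairs, ",")         # pairs[1] is the highest-scored item:score
    split(pairs[1], top, ":")
    print "user " $1 " -> item " top[1]
  }'
```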
Other material I consulted:
《RecommenderJob源码分析(Step by Step)》