Hadoop 2.2.0 benchmark: parameter settings for the MapReduce example randomwriter

Source: Internet | Posted under: javascript delay | Editor: 程序博客网 | Time: 2024/05/22 17:51

I wanted to run some of the examples that ship with Hadoop, so I followed this article: 《hadoop2.2基准测试》 (Hadoop 2.2 benchmark tests) http://www.cnblogs.com/lucius/p/3421970.html


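However, I ran into a problem while running it. In part two, "MapReduce Test with Sort", the parameters test.randomwriter.maps_per_host and test.randomwrite.bytes_per_map had no effect on how randomwriter generated its data. No matter what values I set, every node still launched 10 map tasks, and every map produced 1 GB of random data.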


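The almighty Stack Overflow had the answer: in MR v2 the parameter names changed = =. I have no idea how so many people who say they are on Hadoop 2.x managed to get the old parameter names to take effect.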

http://stackoverflow.com/questions/25369721/error-during-benchmarking-sort-in-hadoop2-partitions-do-not-match

The second comment by "Dennis Huo":

Do you have the actual command you used for the initial randomwriter job as well as the beginning of the console output? It's a bit strange to have a single 1GB output from randomwriter, though in part it's because while Hadoop 1 uses test.randomwrite.bytes_per_map and test.randomwriter.maps_per_host, Hadoop 2 uses the keys mapreduce.randomwriter.bytespermap and mapreduce.randomwriter.mapsperhost, as seen in RandomWriter.java.
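If you have old benchmark scripts that still pass the Hadoop 1 names, one option is to rewrite the -D options mechanically. Below is a minimal Java sketch of that idea. The class RandomWriterKeyMigration and its translate method are hypothetical names of my own, not part of Hadoop, and only the two renamed keys quoted above are mapped; any other option passes through unchanged.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Hadoop): rewrites Hadoop 1 RandomWriter
// "key=value" options to the Hadoop 2 key names.
final class RandomWriterKeyMigration {
    // Hadoop 1 key -> Hadoop 2 key, as quoted in the Stack Overflow comment.
    static final Map<String, String> RENAMED = new LinkedHashMap<>();
    static {
        RENAMED.put("test.randomwriter.maps_per_host", "mapreduce.randomwriter.mapsperhost");
        RENAMED.put("test.randomwrite.bytes_per_map", "mapreduce.randomwriter.bytespermap");
    }

    // Translates one "key=value" option; unknown keys are returned unchanged.
    static String translate(String option) {
        int eq = option.indexOf('=');
        if (eq < 0) {
            return option; // not a key=value pair, leave it alone
        }
        String key = option.substring(0, eq);
        String newKey = RENAMED.getOrDefault(key, key);
        return newKey + option.substring(eq); // keeps "=value" as-is
    }

    public static void main(String[] args) {
        System.out.println(translate("test.randomwriter.maps_per_host=3"));
        System.out.println(translate("test.randomwrite.bytes_per_map=1073741824"));
    }
}
```

Running main prints the two options with their Hadoop 2 names, which can then be passed to randomwriter with -D.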


The RandomWriter.java link points to:

http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-mapreduce-client-jobclient/2.2.0/org/apache/hadoop/mapreduce/RandomWriter.java

which is simply the source code = =


The source shows the correct way to use each of randomwriter's parameters.

config:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.randomwriter.minkey</name>
    <value>10</value>
  </property>
  <property>
    <name>mapreduce.randomwriter.maxkey</name>
    <value>10</value>
  </property>
  <property>
    <name>mapreduce.randomwriter.minvalue</name>
    <value>90</value>
  </property>
  <property>
    <name>mapreduce.randomwriter.maxvalue</name>
    <value>90</value>
  </property>
  <property>
    <name>mapreduce.randomwriter.totalbytes</name>
    <value>1099511627776</value>
  </property>
</configuration>
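This is an ordinary Hadoop configuration file: a list of property entries, each with a name and a value. As I read the source, minkey = maxkey = 10 and minvalue = maxvalue = 90 mean every record gets exactly a 10-byte key and a 90-byte value, and 1099511627776 bytes is 2^40, i.e. 1 TiB. To illustrate the format only, here is a JDK-only sketch that pulls out the name/value pairs. ConfigXmlReader is a hypothetical name; Hadoop itself reads these files through its own Configuration class, not anything like this, and the sketch assumes every property has both a name and a value.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical sketch: reads <property> name/value pairs from a Hadoop-style config XML.
final class ConfigXmlReader {
    // Returns every property in document order (assumes each has <name> and <value>).
    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element p = (Element) properties.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not a readable configuration XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?><configuration>"
            + "<property><name>mapreduce.randomwriter.minkey</name><value>10</value></property>"
            + "<property><name>mapreduce.randomwriter.minvalue</name><value>90</value></property>"
            + "<property><name>mapreduce.randomwriter.totalbytes</name><value>1099511627776</value></property>"
            + "</configuration>";
        Map<String, String> props = readProperties(xml);
        props.forEach((k, v) -> System.out.println(k + " = " + v));
        long total = Long.parseLong(props.get("mapreduce.randomwriter.totalbytes"));
        // 1099511627776 = 2^40 bytes, so shifting right by 30 gives 1024 (GiB)
        System.out.println("totalbytes in GiB: " + (total >> 30));
    }
}
```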

Command-line keys (constants defined in RandomWriter.java):

RandomWriter extends Configured implements Tool {
  public static final String TOTAL_BYTES = "mapreduce.randomwriter.totalbytes";
  public static final String BYTES_PER_MAP = "mapreduce.randomwriter.bytespermap";
  public static final String MAPS_PER_HOST = "mapreduce.randomwriter.mapsperhost";
  public static final String MAX_VALUE = "mapreduce.randomwriter.maxvalue";
  public static final String MIN_VALUE = "mapreduce.randomwriter.minvalue";
  public static final String MIN_KEY = "mapreduce.randomwriter.minkey";
  public static final String MAX_KEY = "mapreduce.randomwriter.maxkey";
...
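As far as I can tell from RandomWriter.run() in 2.2.0, these keys combine like this: if totalbytes is set, the number of maps is totalbytes / bytespermap and mapsperhost is not used at all; otherwise totalbytes defaults to mapsperhost × bytespermap × number of nodes. The defaults (10 maps per host, 1 GiB per map) are exactly what I kept getting with the old key names. The sketch below re-implements only that arithmetic, under my reading of the source. RandomWriterMapCount and its parameters are hypothetical, and the node counts in main are made up; the real job asks the cluster for them.

```java
// Hypothetical sketch of how the randomwriter keys appear to determine the map count.
final class RandomWriterMapCount {
    static final long DEFAULT_BYTES_PER_MAP = 1L << 30; // 1 GiB
    static final int DEFAULT_MAPS_PER_HOST = 10;

    // Null arguments stand for "key not set"; nodes is the number of worker nodes.
    static int numMaps(Long totalBytes, Long bytesPerMap, Integer mapsPerHost, int nodes) {
        long perMap = bytesPerMap != null ? bytesPerMap : DEFAULT_BYTES_PER_MAP;
        int perHost = mapsPerHost != null ? mapsPerHost : DEFAULT_MAPS_PER_HOST;
        if (perMap == 0) {
            throw new IllegalArgumentException("bytespermap must not be 0");
        }
        // mapsperhost only matters when totalbytes is not set explicitly
        long total = totalBytes != null ? totalBytes : perHost * perMap * nodes;
        int maps = (int) (total / perMap);
        if (maps == 0 && total > 0) {
            maps = 1; // a total smaller than one map's worth still gets one map
        }
        return maps;
    }

    public static void main(String[] args) {
        // The command in this post: totalbytes is set, so mapsperhost=3 and the node count don't matter
        System.out.println(numMaps(19327352832L, 1073741824L, 3, 6));
        // Nothing recognized (e.g. only the old Hadoop 1 keys passed): 10 maps per node, 1 GiB each
        System.out.println(numMaps(null, null, null, 4));
    }
}
```

With these inputs main prints 18 and then 40. The second call is the situation from the beginning of this post: the old keys are simply ignored, so the defaults apply.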

Test:

yarn jar /app/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar randomwriter \
  -D mapreduce.randomwriter.mapsperhost=3 \
  -D mapreduce.randomwriter.totalbytes=19327352832 \
  -D mapreduce.randomwriter.bytespermap=1073741824 \
  randomdata/

Output:

...
15/06/17 14:28:30 INFO mapreduce.JobSubmitter: number of splits:18
...
$ hadoop fs -ls randomdata
Found 19 items
-rw-r--r--   3 hadoop supergroup          0 2015-06-17 14:33 randomdata/_SUCCESS
-rw-r--r--   3 hadoop supergroup 1077276810 2015-06-17 14:33 randomdata/part-m-00000
-rw-r--r--   3 hadoop supergroup 1077276760 2015-06-17 14:31 randomdata/part-m-00001
-rw-r--r--   3 hadoop supergroup 1077287420 2015-06-17 14:33 randomdata/part-m-00002
-rw-r--r--   3 hadoop supergroup 1077287513 2015-06-17 14:31 randomdata/part-m-00003
-rw-r--r--   3 hadoop supergroup 1077298535 2015-06-17 14:32 randomdata/part-m-00004
-rw-r--r--   3 hadoop supergroup 1077293518 2015-06-17 14:30 randomdata/part-m-00005
-rw-r--r--   3 hadoop supergroup 1077283934 2015-06-17 14:32 randomdata/part-m-00006
-rw-r--r--   3 hadoop supergroup 1077275682 2015-06-17 14:33 randomdata/part-m-00007
-rw-r--r--   3 hadoop supergroup 1077295170 2015-06-17 14:33 randomdata/part-m-00008
-rw-r--r--   3 hadoop supergroup 1077286683 2015-06-17 14:32 randomdata/part-m-00009
-rw-r--r--   3 hadoop supergroup 1077269290 2015-06-17 14:31 randomdata/part-m-00010
-rw-r--r--   3 hadoop supergroup 1077295589 2015-06-17 14:33 randomdata/part-m-00011
-rw-r--r--   3 hadoop supergroup 1077273046 2015-06-17 14:30 randomdata/part-m-00012
-rw-r--r--   3 hadoop supergroup 1077284465 2015-06-17 14:31 randomdata/part-m-00013
-rw-r--r--   3 hadoop supergroup 1077270048 2015-06-17 14:31 randomdata/part-m-00014
-rw-r--r--   3 hadoop supergroup 1077283027 2015-06-17 14:33 randomdata/part-m-00015
-rw-r--r--   3 hadoop supergroup 1077292383 2015-06-17 14:31 randomdata/part-m-00016
-rw-r--r--   3 hadoop supergroup 1077291977 2015-06-17 14:32 randomdata/part-m-00017
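So the parameters took effect: the job got 18 splits, and 19327352832 / 1073741824 = 18, giving 18 part files of just over 1 GB each.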
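To check a run like this without eyeballing the listing, a small JDK-only sketch can parse the hadoop fs -ls output, count the part-m-* files and confirm that each one is at least bytespermap bytes. That lower bound is my reading of the mapper, which appears to keep writing records until it has produced at least that many key/value bytes; that would also explain why the files above come out slightly larger than 1073741824. LsCheck is a hypothetical name, and the column positions assume the 8-column -ls layout shown above and paths without spaces.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: sanity-checks `hadoop fs -ls` output for a randomwriter run.
final class LsCheck {
    // True if there are exactly expectedParts part-m files and none is smaller than bytesPerMap.
    static boolean allPartsAtLeast(List<String> lsLines, long bytesPerMap, int expectedParts) {
        int parts = 0;
        for (String line : lsLines) {
            // Assumed columns: perms, replication, owner, group, size, date, time, path
            String[] f = line.trim().split("\\s+");
            if (f.length < 8 || !f[7].contains("/part-m-")) {
                continue; // skips "Found N items", _SUCCESS and anything unexpected
            }
            parts++;
            if (Long.parseLong(f[4]) < bytesPerMap) {
                return false;
            }
        }
        return parts == expectedParts;
    }

    public static void main(String[] args) throws IOException {
        long bytesPerMap = Long.parseLong(args[0]);
        int expectedParts = Integer.parseInt(args[1]);
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        List<String> lines = in.lines().collect(Collectors.toList());
        System.out.println(allPartsAtLeast(lines, bytesPerMap, expectedParts) ? "OK" : "MISMATCH");
    }
}
```

For example, after javac LsCheck.java: hadoop fs -ls randomdata | java LsCheck 1073741824 18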

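The moral of the story: don't always take the lazy route of relying on other people's write-ups; sometimes you should go and read the source code.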

