MapJoin test

Test environment: Hadoop 2.6 + Hive 1.2.1

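How mapjoin works:
First, the small table is serialized locally into a hashtable, and the resulting file is uploaded to HDFS (the file is shipped to the tasks through the distributed cache). Then, in the map phase, each map task deserializes the small-table file back into memory and performs the join on the map side.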
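The build/probe flow described above can be sketched in a few lines of Python. This is only an illustration of a map-side hash join under simplifying assumptions: `pickle` stands in for Hive's own hashtable file format, the in-process call stands in for the distributed cache, and the row layout (including the `x` column) is hypothetical.

```python
import pickle
from collections import defaultdict

def build_hashtable(small_rows, key):
    # Local task: index the small table by join key.
    table = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)
    # Serialize it; pickle is a stand-in for Hive's .hashtable file format.
    return pickle.dumps(dict(table))

def map_task(blob, big_rows, key):
    # Each map task deserializes the small-table file into memory...
    table = pickle.loads(blob)
    # ...then streams its split of the big table and probes the hash table.
    for row in big_rows:
        for match in table.get(row[key], []):
            yield match, row

# Hypothetical sample data keyed on dvc_id.
small = [{"dvc_id": "d1"}, {"dvc_id": "d2"}]
big = [{"dvc_id": "d1", "x": 1}, {"dvc_id": "d3", "x": 2}, {"dvc_id": "d1", "x": 3}]

blob = build_hashtable(small, "dvc_id")
result = list(map_task(blob, big, "dvc_id"))
print(result)
```

Only the big-table rows whose `dvc_id` exists in the small table produce output, and no shuffle/reduce step is involved, which is the point of a mapjoin.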

Memory test:
insert overwrite table maptable select dvc_id from tds_did_user_targ_mon limit 1000000;

select * from maptable a join tds_did_user_targ_mon b on a.dvc_id=b.dvc_id;

Single-column test (table maptable has only one column, dvc_id):
Table maptable:   89.1M   (93413092 bytes)
Hashtable file:   105M    (110826705 bytes)
Memory usage:     209M    (209012008 bytes)
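A quick calculation on the byte counts above puts the sizes in proportion. It also suggests that the "M" figures mix units: 89.1M and 105M match MiB (2^20 bytes), while 209M matches decimal MB (10^6 bytes). The ratios below are simple arithmetic on this one test, not a general rule for Hive hashtables.

```python
# Byte counts from the single-column test above.
table_bytes = 93413092
hashtable_bytes = 110826705
memory_bytes = 209012008

MIB = 1024 * 1024
for name, n in [("table maptable", table_bytes),
                ("hashtable file", hashtable_bytes),
                ("memory usage", memory_bytes)]:
    # Show each size in both binary (MiB) and decimal (MB) units.
    print(f"{name}: {n / MIB:.1f} MiB, {n / 1e6:.1f} MB")

# Relative overhead observed in this test.
print(f"hashtable / table: {hashtable_bytes / table_bytes:.2f}")
print(f"memory / hashtable: {memory_bytes / hashtable_bytes:.2f}")
```

In this run the hashtable file is about 19% larger than the table data, and the local task's reported memory usage is roughly 1.9 times the hashtable file size.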

Log output:
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Execution log at: /tmp/test/test_20151014102845_e1950317-7e21-4873-b41c-618e2d27b737.log
2015-10-14 10:29:33     Starting to launch local task to process map join;     maximum memory = 932184064
2015-10-14 10:29:38     Processing rows:     200000     Hashtable size:     199999     Memory usage:     155048584     percentage:     0.166
2015-10-14 10:29:38     Processing rows:     300000     Hashtable size:     299999     Memory usage:     175643392     percentage:     0.188
2015-10-14 10:29:38     Processing rows:     400000     Hashtable size:     399999     Memory usage:     205581064     percentage:     0.221
2015-10-14 10:29:40     Processing rows:     500000     Hashtable size:     499999     Memory usage:     226176000     percentage:     0.243
2015-10-14 10:29:43     Processing rows:     600000     Hashtable size:     599999     Memory usage:     251919416     percentage:     0.27
2015-10-14 10:29:45     Processing rows:     700000     Hashtable size:     699999     Memory usage:     277663272     percentage:     0.298
2015-10-14 10:29:48     Processing rows:     800000     Hashtable size:     799999     Memory usage:     306647072     percentage:     0.329
2015-10-14 10:30:09     Processing rows:     900000     Hashtable size:     899999     Memory usage:     209012008     percentage:     0.224
2015-10-14 10:30:09     Dump the side-table for tag: 0 with group count: 975627 into file: file:/tmp/test/85ce3e7d-7870-4ba1-ae11-313e448a23ba/hive_2015-10-14_10-28-45_275_8145115576089978-1/-local-10003/HashTable-Stage-4/MapJoin-mapfile10--.hashtable
2015-10-14 10:30:30     Uploaded 1 File to: file:/tmp/test/85ce3e7d-7870-4ba1-ae11-313e448a23ba/hive_2015-10-14_10-28-45_275_8145115576089978-1/-local-10003/HashTable-Stage-4/MapJoin-mapfile10--.hashtable (110826705 bytes)
2015-10-14 10:30:30     End of local task; Time Taken: 57.309 sec.
Execution completed successfully
MapredLocal task succeeded
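In the log, the `percentage` column appears to be `Memory usage` divided by the reported `maximum memory` (932184064 bytes), rounded to three decimals. The sketch below checks that reading against three lines copied from the log. As far as I know, Hive aborts the local task when this ratio exceeds `hive.mapjoin.localtask.max.memory.usage` (0.90 by default). Treat the formula as an inference from the logged numbers, not as Hive's source code.

```python
import re

MAX_MEMORY = 932184064  # "maximum memory = ..." from the log above

# Lines copied from the log (including the drop at 900000 rows).
log_lines = [
    "2015-10-14 10:29:38     Processing rows:     200000     Hashtable size:     199999     Memory usage:     155048584     percentage:     0.166",
    "2015-10-14 10:29:48     Processing rows:     800000     Hashtable size:     799999     Memory usage:     306647072     percentage:     0.329",
    "2015-10-14 10:30:09     Processing rows:     900000     Hashtable size:     899999     Memory usage:     209012008     percentage:     0.224",
]

pattern = re.compile(
    r"Processing rows:\s+(\d+).*Memory usage:\s+(\d+)\s+percentage:\s+([\d.]+)")

checks = []
for line in log_lines:
    m = pattern.search(line)
    rows, used, reported = int(m.group(1)), int(m.group(2)), float(m.group(3))
    computed = used / MAX_MEMORY  # inferred formula: used / max heap
    checks.append((rows, computed, reported))
    print(rows, f"computed={computed:.3f}", f"reported={reported}")
```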



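In Hive 1.2.1, the default maximum memory available to the Java process is 1 GB.
The maximum memory available to the local task that builds the mapjoin hashtable (the value printed after "Starting to launch local task to process map join;     maximum memory =" in the log) can be set with: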
export HADOOP_OPTS=-Xmx2048m

Note: the maximum memory set this way should be smaller than the memory YARN allocates to a map task.
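The rule in the note can be turned into a small sanity check. This is a hypothetical helper, not part of Hive or Hadoop: it parses the `-Xmx` value from a `HADOOP_OPTS` string and compares it with the map container size in MB, assuming that size is the one configured by `mapreduce.map.memory.mb`. The 4096 MB container in the usage example is an illustrative value.

```python
import re

# JVM size suffixes (case-insensitive); no suffix means bytes.
UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}

def parse_xmx(opts):
    """Return the -Xmx value in bytes from a JVM options string, or None."""
    matches = re.findall(r"-Xmx(\d+)([kKmMgG]?)", opts)
    if not matches:
        return None
    num, unit = matches[-1]  # a later -Xmx overrides an earlier one
    return int(num) * UNITS[unit.lower()]

def fits_in_container(hadoop_opts, map_memory_mb):
    """Hypothetical check: local-task heap must stay below the map container size."""
    xmx = parse_xmx(hadoop_opts)
    return xmx is not None and xmx < map_memory_mb * 1024**2

print(fits_in_container("-Xmx2048m", 4096))  # 2 GB heap vs. assumed 4 GB container
print(fits_in_container("-Xmx2048m", 2048))  # equal sizes do not leave headroom
```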