MapJoin test
Source: Internet  Editor: 程序博客网  Time: 2024/05/19 17:24
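Test environment: Hadoop 2.6 + Hive 1.2.1
How MapJoin works:
First, a local task serializes the small table into a hashtable file and uploads that file to HDFS, from where it is distributed to the nodes through the distributed cache. Then, in the map phase, each map task deserializes the small table's file back into memory and performs the join against it.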
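The mechanism can be illustrated outside Hive with a small sketch. It is an analogy only, not Hive's implementation: awk loads the keys of the small file into an in-memory associative array (playing the role of the hashtable), then streams the large file and emits the rows whose key is present. The function name map_side_join and the file layout (one key per line in the small file, tab-separated rows keyed on the first field in the big file) are assumptions made for this example.

```shell
#!/bin/sh
# map_side_join SMALL BIG
# Illustration only. SMALL holds one join key per line (assumed unique
# and non-empty); BIG holds tab-separated rows whose first field is the key.
map_side_join() {
    awk -F '\t' '
        # First file: build the in-memory lookup table from the small table.
        NR == FNR { keys[$1] = 1; next }
        # Second file: stream the big table, keep rows with a matching key.
        ($1 in keys) { print }
    ' "$1" "$2"
}
```

As in a real map join, the big input is read only once and is never sorted or shuffled; the cost is that the whole small side must fit in memory.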
Memory test:
insert overwrite table maptable select dvc_id from tds_did_user_targ_mon limit 1000000;
select * from maptable a join tds_did_user_targ_mon b on a.dvc_id=b.dvc_id;
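Single-column test; table maptable has only one column, dvc_id:
Table maptable: 89.1 MB (93413092 bytes)
Hashtable file: about 105.7 MB (110826705 bytes)
Memory usage: 209012008 bytes (about 199.3 MB on the same 1024-based scale as the two sizes above; 209 MB in decimal units)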
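To put the three byte counts on one scale, a minimal conversion sketch follows. The helper name to_mib is made up for this example; it simply divides by 1048576.

```shell
#!/bin/sh
# to_mib BYTES: print BYTES in units of 1048576 bytes, one decimal place.
to_mib() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1048576 }'
}

to_mib 93413092     # table data          -> 89.1
to_mib 110826705    # hashtable file      -> 105.7
to_mib 209012008    # logged memory usage -> 199.3
```

In this run the hashtable file is roughly 1.2 times the size of the table data, and the logged memory usage is roughly 2.2 times. These ratios come from this single test and should not be taken as general constants.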
Log output:
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Execution log at: /tmp/test/test_20151014102845_e1950317-7e21-4873-b41c-618e2d27b737.log
2015-10-14 10:29:33 Starting to launch local task to process map join; maximum memory = 932184064
2015-10-14 10:29:38 Processing rows: 200000 Hashtable size: 199999 Memory usage: 155048584 percentage: 0.166
2015-10-14 10:29:38 Processing rows: 300000 Hashtable size: 299999 Memory usage: 175643392 percentage: 0.188
2015-10-14 10:29:38 Processing rows: 400000 Hashtable size: 399999 Memory usage: 205581064 percentage: 0.221
2015-10-14 10:29:40 Processing rows: 500000 Hashtable size: 499999 Memory usage: 226176000 percentage: 0.243
2015-10-14 10:29:43 Processing rows: 600000 Hashtable size: 599999 Memory usage: 251919416 percentage: 0.27
2015-10-14 10:29:45 Processing rows: 700000 Hashtable size: 699999 Memory usage: 277663272 percentage: 0.298
2015-10-14 10:29:48 Processing rows: 800000 Hashtable size: 799999 Memory usage: 306647072 percentage: 0.329
2015-10-14 10:30:09 Processing rows: 900000 Hashtable size: 899999 Memory usage: 209012008 percentage: 0.224
2015-10-14 10:30:09 Dump the side-table for tag: 0 with group count: 975627 into file: file:/tmp/test/85ce3e7d-7870-4ba1-ae11-313e448a23ba/hive_2015-10-14_10-28-45_275_8145115576089978-1/-local-10003/HashTable-Stage-4/MapJoin-mapfile10--.hashtable
2015-10-14 10:30:30 Uploaded 1 File to: file:/tmp/test/85ce3e7d-7870-4ba1-ae11-313e448a23ba/hive_2015-10-14_10-28-45_275_8145115576089978-1/-local-10003/HashTable-Stage-4/MapJoin-mapfile10--.hashtable (110826705 bytes)
2015-10-14 10:30:30 End of local task; Time Taken: 57.309 sec.
Execution completed successfully
MapredLocal task succeeded
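The percentage column in the log appears to be simply Memory usage divided by maximum memory (for example 155048584 / 932184064 ≈ 0.166). Hive aborts the local task if this ratio exceeds hive.mapjoin.localtask.max.memory.usage (0.90 by default), so it is the number to watch. The sketch below recomputes the ratio from a saved copy of the log; the function name check_usage is made up for this example, and the parsing assumes the exact line layout shown above.

```shell
#!/bin/sh
# check_usage LOGFILE
# For every "Processing rows" line, print the logged percentage followed
# by the value recomputed as Memory usage / maximum memory.
check_usage() {
    awk '
        /maximum memory =/ { max = $NF }
        /Processing rows:/ {
            for (i = 1; i < NF; i++) {
                if ($i == "usage:")      used   = $(i + 1)
                if ($i == "percentage:") logged = $(i + 1)
            }
            # Skip lines seen before the maximum is known.
            if (max > 0) printf "%s %.3f\n", logged, used / max
        }
    ' "$1"
}
```

Run against the log above, both columns agree on every line. The drop in memory usage at 900000 rows is consistent with a garbage collection having run, although the log itself does not say so.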
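In Hive 1.2.1 the default maximum memory available to Java is 1 GB.
The following setting changes the maximum memory available to the local task that builds the map-join hashtable, that is, the value reported in "Starting to launch local task to process map join; maximum memory =":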
export HADOOP_OPTS=-Xmx2048m
Note: the maximum memory set here should be smaller than the memory YARN allocates to a map task.
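A rough way to check this constraint before launching Hive is sketched below. The helper names xmx_mb and fits_in_map are made up for this example, and the assumption that the relevant YARN figure is the job's mapreduce.map.memory.mb value (passed in by hand as the second argument) is mine, not the original test's.

```shell
#!/bin/sh
# xmx_mb OPTS: print the -Xmx value found in OPTS, in megabytes.
# Only the m/M and g/G suffixes are handled in this sketch.
xmx_mb() {
    printf '%s\n' "$1" | awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^-Xmx[0-9]+[mMgG]$/) {
                n = substr($i, 5, length($i) - 5) + 0
                unit = substr($i, length($i))
                if (unit == "g" || unit == "G") n = n * 1024
                print n
            }
        }
    }'
}

# fits_in_map OPTS MAP_MB: succeed only if the heap is below MAP_MB.
# Fails (non-zero status) when OPTS contains no recognizable -Xmx.
fits_in_map() {
    heap=$(xmx_mb "$1")
    [ -n "$heap" ] && [ "$heap" -lt "$2" ]
}
```

For example, fits_in_map "-Xmx2048m" 4096 succeeds, while fits_in_map "-Xmx2048m" 2048 fails because the heap would not be smaller than the map container.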