[ Hadoop | MapReduce ] Using CompositeInputSplit to Improve Join Efficiency


A map-side join is the most efficient way to join datasets. On Hadoop, for two large datasets, we can use a Composite Join to achieve this.
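To see why this is cheap, here is a minimal plain-Java sketch (illustration only, not Hadoop code; all names are invented) of the sorted-merge pass that a composite join performs within each partition. Because both sides arrive sorted by key and partitioned identically, one linear pass joins them with no shuffle:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sorted-merge inner join over two key-sorted inputs.
// Keys are assumed unique within each input in this sketch.
public class MergeJoinSketch {
    public static List<String> innerJoin(List<String[]> left, List<String[]> right) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i)[0].compareTo(right.get(j)[0]);
            if (cmp < 0) i++;        // left key smaller: advance left
            else if (cmp > 0) j++;   // right key smaller: advance right
            else {                   // keys match: emit one joined row
                out.add(left.get(i)[0] + "\t" + left.get(i)[1] + "\t" + right.get(j)[1]);
                i++;
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> users = Arrays.asList(
            new String[]{"1", "alice"}, new String[]{"2", "bob"}, new String[]{"4", "dave"});
        List<String[]> cities = Arrays.asList(
            new String[]{"1", "NY"}, new String[]{"3", "LA"}, new String[]{"4", "SF"});
        System.out.println(innerJoin(users, cities)); // joined rows for keys 1 and 4
    }
}
```

Each input is scanned exactly once, which is why the join itself is map-only and avoids the shuffle-and-sort cost of a reduce-side join.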


The Use Case

First, use an Identity Mapper and Identity Reducer to sort and partition the two inputs, so that both end up with the same number of partitions.

For example, run each preparation job with -Dmapred.reduce.tasks=2 so that both outputs have two partitions.
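A sketch of such a preparation job using the old mapred API (paths come from the command line; input is assumed to be tab-separated key/value text):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

// Pre-sort/partition one dataset. Run this once per input with the
// SAME reduce count, so both outputs are partitioned identically.
public class PreSortJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(PreSortJob.class);
        conf.setJobName("identity-sort");
        conf.setInputFormat(KeyValueTextInputFormat.class); // key \t value lines
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        conf.setMapperClass(IdentityMapper.class);   // pass records through
        conf.setReducerClass(IdentityReducer.class); // shuffle sorts by key
        conf.setNumReduceTasks(2); // same effect as -Dmapred.reduce.tasks=2
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
```

The identity mapper and reducer do no work themselves; the job exists purely so that the shuffle sorts each dataset by key and the partitioner splits both datasets the same way.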


Second, run the composite join over the two sorted, identically partitioned outputs…
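A hedged sketch of the join job, again with the old mapred API. The join expression is built with CompositeInputFormat.compose; JoinMapper is a placeholder for your own mapper, which receives a Text key and a TupleWritable of matching values:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.join.CompositeInputFormat;

// Map-only composite join over two pre-sorted, identically
// partitioned inputs (args[0], args[1]); output goes to args[2].
public class CompositeJoinJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CompositeJoinJob.class);
        conf.setJobName("composite-join");
        conf.setInputFormat(CompositeInputFormat.class);
        // "inner" joins only matching keys; "outer" and "override"
        // are the other supported operations.
        conf.set("mapred.join.expr", CompositeInputFormat.compose(
            "inner", KeyValueTextInputFormat.class,
            new Path(args[0]), new Path(args[1])));
        conf.setMapperClass(JoinMapper.class); // placeholder mapper class
        conf.setNumReduceTasks(0);             // map-only: no shuffle needed
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(conf, new Path(args[2]));
        JobClient.runJob(conf);
    }
}
```

Setting the reduce count to zero is the point of the exercise: the join happens entirely in the map phase, reading the matching partition of each input side by side.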


Note: if the two inputs have different numbers of partitions (i.e. part* files), an exception is thrown: java.io.IOException: Inconsistent split cardinality from child 1 (1/2)

The simplest way to use a composite join is to set the reduce count of the preparation jobs to 1, so that each input has exactly one partition, provided the resulting performance is acceptable.

The Source Code for the application
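The original listing did not survive; as a hypothetical reconstruction of the missing mapper (values assumed to be tab-separated text), each map call receives one joined key and a TupleWritable holding the matching value from each input:

```java
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.join.TupleWritable;

// Hypothetical mapper for the composite join job: concatenates the
// value from each joined input into one tab-separated output record.
public class JoinMapper extends MapReduceBase
        implements Mapper<Text, TupleWritable, Text, Text> {
    @Override
    public void map(Text key, TupleWritable value,
                    OutputCollector<Text, Text> out, Reporter reporter)
            throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.size(); i++) { // one entry per input
            if (i > 0) sb.append('\t');
            sb.append(value.get(i).toString());
        }
        out.collect(key, new Text(sb.toString()));
    }
}
```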


