Cloud Computing(6)_Processing Relational Data
Source: Internet  Published: company network server setup  Editor: 程序博客网 (Programmer Blog Network)  Time: 2024/05/21 21:42
Join Algorithms in MapReduce
- Reduce-Side Join
- Map-Side Join
- Memory-Backed Join
Reduce-Side Join
We map over both datasets and emit the join key as the intermediate key and the tuple itself as the intermediate value. Since MapReduce guarantees that all values with the same key are brought together, all tuples will be grouped by the join key, which is exactly what we need to perform the join operation. Each tuple is usually tagged with the dataset it came from, so that the reducer can separate the two sides and emit their cross product.
The approach isn’t particularly efficient since it requires shuffling both datasets across the network.
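The reduce-side join can be simulated in a single Python process as a minimal sketch. The function names, the source tags `"R"` and `"S"`, and the in-memory `shuffle` that stands in for the framework's sort-and-group step are all hypothetical illustrations, not Hadoop APIs.

```python
from collections import defaultdict
from itertools import product


def map_phase(tag, records, key_index=0):
    """Mapper: emit (join_key, (source_tag, tuple)) for every record."""
    for record in records:
        yield record[key_index], (tag, record)


def shuffle(pairs):
    """Stand-in for the MapReduce shuffle: group all values by intermediate key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: split the grouped tuples by source tag and emit their cross product."""
    left = [rec for tag, rec in values if tag == "R"]
    right = [rec for tag, rec in values if tag == "S"]
    for r, s in product(left, right):
        yield key, r, s


def reduce_side_join(r_records, s_records, r_key=0, s_key=0):
    """Inner join of two datasets on the given key columns."""
    pairs = list(map_phase("R", r_records, r_key)) + list(map_phase("S", s_records, s_key))
    groups = shuffle(pairs)
    results = []
    for key in sorted(groups):  # each reducer receives its keys in sorted order
        results.extend(reduce_phase(key, groups[key]))
    return results


if __name__ == "__main__":
    users = [(1, "alice"), (2, "bob"), (3, "carol")]
    orders = [(1, "order-a"), (1, "order-b"), (3, "order-c"), (4, "order-d")]
    for row in reduce_side_join(users, orders):
        print(row)
```

With the sample data it prints three joined rows, two for key 1 and one for key 3. Keys 2 and 4 appear in only one dataset, so their reducers emit nothing, which makes this an inner join. Note that every tuple of both datasets passes through `shuffle`, which is the network cost the text describes.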
Map-Side Join
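We map over one of the datasets (the larger one) and, inside the mapper, read the corresponding part of the other dataset to perform a merge join. This only works if both datasets are sorted by the join key and partitioned the same way (same number of partitions, same partitioning function), so that each mapper's input split lines up with exactly one partition of the other dataset. The join then needs no shuffle or reduce phase.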
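A minimal sketch of the map-side join, assuming each input is given as a list of partitions, every partition is sorted by a join key in position 0, and both inputs were partitioned with the same function. `merge_join` and `map_side_join` are hypothetical names, and each loop iteration in `map_side_join` stands in for one mapper.

```python
def merge_join(left, right, key=lambda rec: rec[0]):
    """Inner merge join of two lists that are already sorted by the join key."""
    results = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side, then emit their cross product.
            i_end = i
            while i_end < len(left) and key(left[i_end]) == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and key(right[j_end]) == rk:
                j_end += 1
            for lrec in left[i:i_end]:
                for rrec in right[j:j_end]:
                    results.append((lk, lrec, rrec))
            i, j = i_end, j_end
    return results


def map_side_join(left_partitions, right_partitions, key=lambda rec: rec[0]):
    """Each loop iteration plays one mapper: it merge-joins partition p of both inputs."""
    if len(left_partitions) != len(right_partitions):
        raise ValueError("inputs must be partitioned into the same number of partitions")
    output = []
    for left_part, right_part in zip(left_partitions, right_partitions):
        output.extend(merge_join(left_part, right_part, key))
    return output


if __name__ == "__main__":
    # Hypothetical partitioning: partition 0 holds even keys, partition 1 odd keys.
    orders = [[(2, "order-x"), (4, "order-y"), (4, "order-z")], [(1, "order-a"), (5, "order-e")]]
    users = [[(2, "bob"), (4, "dave")], [(1, "alice"), (3, "carol")]]
    for row in map_side_join(orders, users):
        print(row)
```

Because both sides are sorted, each mapper makes a single forward pass over its two partitions. If the inputs were partitioned differently, matching keys could land in different partitions and would silently be missed, which is why the co-partitioning precondition matters.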
Memory-Backed Join
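We can load the smaller dataset into memory in every mapper, populating an associative array to facilitate random access to tuples based on the join key. Each mapper then streams over its split of the larger dataset and looks up every tuple's join key in the array. This requires the smaller dataset to fit in each mapper's memory.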
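A minimal sketch of the memory-backed (hash) join for one mapper, assuming records are tuples with the join key in position 0 by default. The function names are hypothetical. In a Hadoop job the table would typically be built once per mapper, for example in the `Mapper.setup()` method, before any input records are processed.

```python
from collections import defaultdict


def build_table(small, key_index=0):
    """Load the smaller dataset into an associative array: join key -> list of tuples."""
    table = defaultdict(list)
    for record in small:
        table[record[key_index]].append(record)
    return dict(table)


def memory_backed_join(large, small, large_key=0, small_key=0):
    """Stream over the larger dataset and probe the in-memory table (inner join)."""
    table = build_table(small, small_key)  # built once per call, i.e. once per mapper
    for record in large:
        key = record[large_key]
        for match in table.get(key, []):
            yield key, record, match


if __name__ == "__main__":
    users = [(1, "alice"), (2, "bob")]  # small side, assumed to fit in memory
    orders = [(1, "order-a"), (3, "order-c"), (1, "order-b"), (2, "order-d")]
    for row in memory_backed_join(orders, users):
        print(row)
```

Order `(3, "order-c")` has no matching user and is dropped. The design trades memory for speed: each probe is an average O(1) dictionary lookup, and the larger dataset is never shuffled or sorted.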
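Which Join to Use?
Memory-Backed Join > Map-Side Join > Reduce-Side Join, in order of preference. Use a memory-backed join whenever the smaller dataset fits in memory; otherwise use a map-side join if both datasets are already sorted and partitioned by the join key; fall back to a reduce-side join only when neither condition holds, since it is the most general but shuffles both datasets across the network.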