[Lecture] hash join

来源：互联网发布：淘宝卖家助手要钱吗? 编辑：程序博客网时间：2024/05/16 01:22

from: https://cwiki.apache.org/confluence/display/Hive/Hybrid+Hybrid+Grace+Hash+Join,+v1.0

Simple Hash Join

Also known as Classic Hash Join, it is used when the hash table for R can entirely fit into the memory.

In general, all hash join algorithms described here have two phases: Build phase and Probe phase. During the build phase, a hash table is built into memory based on the joining column(s) from the small table R. During the probe phase, the big table S is scanned sequentially and for each row of S, the hash table is probed for matching rows. If a match is found, output the pair, otherwise drop the row from S and continue scanning S.

When the hash table for R cannot wholly fit in the memory, only part of the hash table for R will be put in memory first and S is scanned against the partial hash table. After the scan on S is completed, the memory is cleared and another part of hash table of R is put in memory, and S is scanned again. This process can repeat more times if there are more parts of the hash table.

GRACE Hash Join

Apparently when the size of small table R is much bigger than memory, we end up having many partial hash tables of R loaded in the memory one by one, and the big table S being scanned multiple times, which is very expensive.

GRACE hash join brings rounds of scanning the big table from many times down to just twice. One for partitioning, the other for row matching. Similarly, the small table will also be scanned twice.

Here is the detailed process.

Small table R is scanned, and during the scan a hash function is used to distribute the rows into different output buffers (or partitions). The size of each output buffer should be specified as close as possible to the memory limit but no more than that. After R has been completely scanned, all output buffers are flushed to disk.
Similarly, big table S is scanned, partitioned and flushed to disk the same way. The hash function used here is the same one as in the previous step.
Load one partition of R into memory and build a hash table for it. This is the build phase.
Hash each row of the corresponding partition of S, and probe for a match in R’s hash table. If a match is found, output the pair to the result, otherwise, proceed with the next row in the current S partition. This is the probe phase.
Repeat loading and building hash table for partitions of R and probing partitions of S, until both are exhausted.

It can be seen there are extra partitioning steps in this algorithm (1 & 2). The rest of the steps are the same as Classic Hash Join, i.e., building and probing. One assumption here is all partitions of R can completely fit into the memory.

Hybrid GRACE Hash Join

It is a hybrid of Classic Hash Join and GRACE Hash Join. The idea is to build an in-memory hash table for the first partition of R during the partitioning phase, without the need to write this partition to disk. Similarly, while partitioning S, the first partition does not have to be put to the disk since probing can be directly done against the in-memory first partition of R. So at the end of the partitioning phase, the join of the first pair of partitions of R and S has already been done.

Comparing to GRACE hash join, Hybrid GRACE hash join has an advantage of avoiding writing the first partitions of R and S to disk during the partitioning phase and reading them back in again during the probing phase.

0 0