Understanding the use of STREAMTABLE in Hive

Source: Internet  Published: hadoop数据挖掘实例  Editor: 程序博客网  Time: 2024/05/29 11:51

Today, while reading someone else's blog, I came across something called STREAMTABLE. The author described it like this:

Put the large table on the right side of the JOIN; this is when you need to specify /*+ STREAMTABLE(..) */:

  hive> SELECT /*+ STREAMTABLE(b) */ a.val, b.val, c.val FROM a JOIN b
      > ON (a.key = b.key1) JOIN c ON (c.key = b.key1)
I was a bit confused by this, and it only started to make sense after I read what someone else wrote:
From my understanding, when the join happens in map or reduce, the values corresponding to a key from all tables except one are buffered in memory, and the remaining one is streamed (if only two tables are joined on the same key, just one table is buffered). Usually the largest table should be the one streamed; otherwise the larger data goes into the in-memory buffer and can cause OOM errors.
The STREAMTABLE hint is used to specify which table is streamed. By default, the table on the right is streamed and the others are buffered. If you want a table other than the rightmost one to be streamed, use this hint.
If you are joining more tables on different keys, then for every join just put the larger table on the right of the ON condition. There is no need for the STREAMTABLE hint here.
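To make the difference between the default and the hint concrete, here is a minimal sketch using the same tables a, b and c as the query quoted earlier. It assumes, purely for illustration, that a is the largest table; nothing in the original says which table is largest.

```sql
-- Assumption for illustration only: a is the largest of the three tables.

-- No hint: the last table in the join sequence (c) is streamed,
-- while a and b are buffered in memory.
SELECT a.val, b.val, c.val
FROM a JOIN b ON (a.key = b.key1)
       JOIN c ON (c.key = b.key1);

-- With the hint: a is streamed instead, and b and c are buffered.
SELECT /*+ STREAMTABLE(a) */ a.val, b.val, c.val
FROM a JOIN b ON (a.key = b.key1)
       JOIN c ON (c.key = b.key1);
```

Both joins here use b.key1, so the hint is the only way to stream a without rewriting the join order.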
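The multi-key case can be sketched as follows. The column key2 is hypothetical and added only for this example. Because the two joins use different keys, they should be evaluated as separate join steps, each streaming its own right-hand side.

```sql
-- key2 is a hypothetical column used only for this illustration.
-- Join step 1 (on b.key1): a is buffered, b (on the right) is streamed.
-- Join step 2 (on b.key2): the result of step 1 is buffered,
--                          c (on the right) is streamed.
SELECT a.val, b.val, c.val
FROM a JOIN b ON (a.key = b.key1)
       JOIN c ON (c.key = b.key2);
```

Under that reading, ordering the tables so that the larger input sits on the right of each join gives the same effect as the hint.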