Analysis of the Hive parameter hive.mapred.mode

Source: Internet. Published by: 琅琊榜2 (Zhihu). Edited by: 程序博客网. Date: 2024/06/04 19:38

Hive has a configuration parameter, hive.mapred.mode, which takes the value nonstrict or strict. The default is nonstrict.
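
Every check in this article boils down to a case-insensitive comparison of the parameter value against "strict". The minimal sketch below models that in plain Java, assuming a Map stands in for HiveConf; the class MapredModeSketch and its methods are hypothetical illustrations, not part of Hive.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: a plain Map stands in for HiveConf; this is not Hive's API.
final class MapredModeSketch {
    static final String KEY = "hive.mapred.mode";
    static final String DEFAULT_MODE = "nonstrict";

    // Returns the configured mode, or the default when the key is unset.
    static String mode(Map<String, String> conf) {
        return conf.getOrDefault(KEY, DEFAULT_MODE);
    }

    // Same case-insensitive comparison as the equalsIgnoreCase("strict") calls in the Hive snippets.
    static boolean isStrict(Map<String, String> conf) {
        return "strict".equalsIgnoreCase(mode(conf));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(isStrict(conf)); // false: unset means nonstrict
        conf.put(KEY, "STRICT");
        System.out.println(isStrict(conf)); // true: comparison ignores case
    }
}
```

Because the comparison ignores case, "strict", "STRICT" and "Strict" all switch the checks on, and any other value behaves like nonstrict.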

When it is set to strict, Hive rejects three kinds of statements at compile time:

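1. Cartesian-product joins. Because no reduce-side join key is specified, only one reducer is launched, which becomes a performance bottleneck when the data volume is large.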

// Use only 1 reducer in case of cartesian product
if (reduceKeys.size() == 0) {
  numReds = 1;

  // Cartesian product is not supported in strict mode
  if (conf.getVar(HiveConf.ConfVars.HIVEMAPREDMODE).equalsIgnoreCase(
      "strict")) {
    throw new SemanticException(ErrorMsg.NO_CARTESIAN_PRODUCT.getMsg());
  }
}
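
The rule in the snippet above can be restated as a small, self-contained Java sketch: with no reduce keys (for example, a join written without an ON condition) the stage falls back to one reducer, and strict mode refuses it outright. The class CartesianProductCheck, its method, and the unchecked exception CartesianProductRejected are hypothetical stand-ins, not Hive code.

```java
import java.util.List;

// Hypothetical unchecked exception standing in for Hive's SemanticException.
class CartesianProductRejected extends RuntimeException {
    CartesianProductRejected(String message) {
        super(message);
    }
}

// Illustrative sketch of the rule above; the class and method names are hypothetical.
final class CartesianProductCheck {
    // Returns the number of reducers for a join stage given its reduce-side join keys.
    static int reducersForJoin(List<String> reduceKeys, int requestedReducers, boolean strict) {
        if (reduceKeys.isEmpty()) {
            // No join key to partition rows on, so a single reducer must see every row pairing.
            if (strict) {
                throw new CartesianProductRejected("Cartesian product is not supported in strict mode");
            }
            return 1;
        }
        return requestedReducers;
    }

    public static void main(String[] args) {
        System.out.println(reducersForJoin(List.of("a.id"), 8, true)); // 8
        System.out.println(reducersForJoin(List.of(), 8, false));      // 1
        try {
            reducersForJoin(List.of(), 8, true);
        } catch (CartesianProductRejected e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The exception is unchecked here only to keep the sketch short; Hive's SemanticException is a checked exception.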

2. ORDER BY without LIMIT. ORDER BY forces the number of reducers to 1; without a LIMIT, all of the data is shipped to that single reducer for a total sort.

if (sortExprs == null) {
  sortExprs = qb.getParseInfo().getOrderByForClause(dest);
  if (sortExprs != null) {
    assert numReducers == 1;
    // in strict mode, in the presence of order by, limit must be specified
    Integer limit = qb.getParseInfo().getDestLimit(dest);
    if (conf.getVar(HiveConf.ConfVars.HIVEMAPREDMODE).equalsIgnoreCase(
        "strict")
        && limit == null) {
      throw new SemanticException(generateErrorMessage(sortExprs,
          ErrorMsg.NO_LIMIT_WITH_ORDERBY.getMsg()));
    }
  }
}
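
A plain-Java sketch of the ORDER BY rule above: the final sort always gets one reducer, and strict mode additionally requires a LIMIT. As in the snippet, a missing LIMIT is represented by a null Integer. The class OrderByLimitCheck and its method are hypothetical, and the error text is illustrative rather than Hive's actual message.

```java
// Illustrative sketch; the class, method, and message are hypothetical, not Hive's API.
final class OrderByLimitCheck {
    // Returns the reducer count for an ORDER BY query. 'limit' is null when the
    // query has no LIMIT, like the Integer returned by getDestLimit(dest) above.
    static int reducersForOrderBy(Integer limit, boolean strict) {
        if (strict && limit == null) {
            throw new IllegalStateException("ORDER BY without LIMIT is rejected in strict mode");
        }
        // A total order needs every row in one place, so ORDER BY always uses one reducer.
        return 1;
    }

    public static void main(String[] args) {
        System.out.println(reducersForOrderBy(100, true));   // 1: LIMIT present, accepted
        System.out.println(reducersForOrderBy(null, false)); // 1: nonstrict, accepted
        try {
            reducersForOrderBy(null, true);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that the reducer count is 1 in every accepted case: strict mode does not change how the sort runs, it only refuses the unbounded variant.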

3. Reading a partitioned table without specifying a partition predicate.

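Note: for a table with multiple partition levels, a predicate on any one of the partition columns is enough to pass the check.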

// If the "strict" mode is on, we have to provide partition pruner for
// each table.
if ("strict".equalsIgnoreCase(HiveConf.getVar(conf,
    HiveConf.ConfVars.HIVEMAPREDMODE))) {
  if (!hasColumnExpr(prunerExpr)) {
    throw new SemanticException(ErrorMsg.NO_PARTITION_PREDICATE
        .getMsg("for Alias \"" + alias + "\" Table \""
            + tab.getTableName() + "\""));
  }
}
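
The partition rule, including the note about multi-level partitions, can be sketched in plain Java. This is a simplification under stated assumptions: the predicate is modelled as the set of column names it references, not as Hive's pruner expression tree, and the class PartitionPredicateCheck, its methods, and the table and column names in main are all hypothetical.

```java
import java.util.List;
import java.util.Set;

// Illustrative sketch; names are hypothetical. The predicate is modelled as the set of
// column names it references, rather than Hive's expression tree.
final class PartitionPredicateCheck {
    // True if the predicate references at least one partition column (any level is enough).
    static boolean hasPartitionPredicate(List<String> partitionColumns, Set<String> predicateColumns) {
        for (String column : partitionColumns) {
            if (predicateColumns.contains(column)) {
                return true;
            }
        }
        return false;
    }

    // Rejects a scan of a partitioned table in strict mode when no partition column is filtered.
    static void check(String table, List<String> partitionColumns,
                      Set<String> predicateColumns, boolean strict) {
        boolean partitioned = !partitionColumns.isEmpty();
        if (strict && partitioned && !hasPartitionPredicate(partitionColumns, predicateColumns)) {
            throw new IllegalStateException("No partition predicate found for table \"" + table + "\"");
        }
    }

    public static void main(String[] args) {
        List<String> partitions = List.of("dt", "hour"); // two-level partitioned table
        check("logs", partitions, Set.of("hour", "user_id"), true); // passes: 'hour' is a partition column
        try {
            check("logs", partitions, Set.of("user_id"), true);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A filter on only the second-level column (hour) is accepted, which matches the note that any one partition level is enough.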

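When the data volume is large, all three cases produce inefficient MR jobs that hurt execution time and efficiency. That said, throwing an exception outright feels a bit too forceful.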

One option is to enable strict mode only in ad-hoc query front ends outside the production environment, such as hiveweb or operations tools.

0 0