•UDTF有两种使用方法,一种直接放到select后面,一种和lateral view一起使用
•直接select中使用:select explode_map(properties) as (col1,col2) from src;
•不可以添加其他字段使用:select a, explode_map(properties) as (col1,col2) from src
•不可以嵌套调用:select explode_map(explode_map(properties)) from src
•不可以和group by/cluster by/distribute by/sort by一起使用:select explode_map(properties) as (col1,col2) from src group by col1, col2
•和lateral view一起使用:select src.id, mytable.col1, mytable.col2 from src lateral view explode_map(properties) mytable as col1, col2;
此方法更为方便日常使用。执行过程相当于单独执行了两次抽取,然后union到一个表里。
lateral view
• Lateral View语法
•lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)* fromClause: FROM baseTable (lateralView)*
•Lateral View用于UDTF(user-defined table generating functions)中将行转成列,例如explode().
•目前Lateral View不支持有上而下的优化。如果使用Where子句,查询可能将不被编译。解决方法见:
此时,在查询之前执行set hive.optimize.ppd=false;
• 例子
•pageAds。它有两个列
string pageid
Array<int> adid_list
" front_page"
[1, 2, 3]
"contact_page "
[ 3, 4, 5]
•SELECT pageid, adid FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;
•将输出如下结果
string pageid int adid
"front_page" 1
…….
“contact_page" 3
代码示例
public class MyUDTF extends GenericUDTF{
public StructObjectInspector initialize(ObjectInspector[] args) {}
public void process(Object[] args) throws HiveException { }
}
实现:切分 ” key:value;key:value;” 这种字符串, 返回结果为 key, value 两个字段
转载请注明出处【 http://sishuok.com/forum/blogPost/list/6226.html】