hive内置UDTF函数

来源:互联网 发布:淘宝开店认证是什么 编辑:程序博客网 时间:2024/04/28 07:22
  • explode函数:hive内置的表生成函数,主要用于将一行输入拆分成多行输出。
官网解释:explode() takes in an array (or a map) as an input and outputs the elements of the array (map) as separate rows. UDTFs can be used in the SELECT expression list and as a part of LATERAL VIEW.
①创建array数据类型表
hive> create table student_array(    > name string,    > course_score array<string>    > )    > row format delimited fields terminated by '\t'    > collection items terminated by ','    > stored as textfile;
②准备数据student_array.txt
root@Master:/home/hadoop/dataset# cat student_array.txtzs语文:86,数学:87.5,英语:90ls语文:76,数学:93,英语:88ww语文:88,数学:90,英语:95
③将数据导入到hive表
hive> load data local inpath '/home/hadoop/dataset/student_array.txt' overwrite into table student_array;
④查看hive表数据
hive> select * from student_array;-chgrp: '*' does not match expected pattern for groupUsage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...OKzs["语文:86","数学:87.5","英语:90"]ls["语文:76","数学:93","英语:88"]ww["语文:88","数学:90","英语:95"]
对于array数据类型,explode函数将数组中的元素拆分,按行输出每个元素
hive> select explode(course_score) from student_array;-chgrp: '*' does not match expected pattern for groupUsage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...OK语文:86数学:87.5英语:90语文:76数学:93英语:88语文:88数学:90英语:95
posexplode函数类似于explode函数,只是在返回数组中的元素同时也返回元素在数组中的索引位置
hive> select posexplode(course_score)   from  student_array;-chgrp: '*' does not match expected pattern for groupUsage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...OK0语文:861数学:87.52英语:900语文:761数学:932英语:880语文:881数学:902英语:95
对于map数据类型,explode函数会将一个map数据类型拆分成两列输出,键一列,值一列。
首先创建带map的表
hive> create table student_map(    > name string,    > course_score map<string ,float>    > )    > row format delimited fields terminated by '\t'    > collection items terminated by ','    > map keys terminated by ':'    > stored as textfile;hive> select * from student_map;-chgrp: '*' does not match expected pattern for groupUsage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...OKzs{"语文":86.0,"数学":87.5,"英语":90.0}ls{"语文":76.0,"数学":93.0,"英语":88.0}ww{"语文":88.0,"数学":90.0,"英语":95.0}
explode函数将map数据类型的数据键值对拆分长两列
hive> select explode(course_score) from student_map;-chgrp: '*' does not match expected pattern for groupUsage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...OK语文86.0数学87.5英语90.0语文76.0数学93.0英语88.0语文88.0数学90.0英语95.0
posexplode函数的参数只能是数组,不能是map数据类型
hive> select posexplode(course_score) from student_map;FAILED: UDFArgumentException posexplode() takes an array as a parameter
对于结构体数组,可以使用inline函数进行拆分
hive> select * from student_struct_array;-chgrp: '*' does not match expected pattern for groupUsage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...OKzs[{"course":"语文","score":86.0},{"course":"数学","score":87.5},{"course":"英语","score":90.0}]ls[{"course":"语文","score":76.0},{"course":"数学","score":93.0}]ww[{"course":"语文","score":88.0},{"course":"数学","score":90.0},{"course":"英语","score":95.0}]Time taken: 0.19 seconds, Fetched: 3 row(s)hive> select inline(course_score) from student_struct_array;-chgrp: '*' does not match expected pattern for groupUsage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...OK语文86.0数学87.5英语90.0语文76.0数学93.0语文88.0数学90.0英语95.0
在select语句中使用UDTF语法的限制:
①在select语句中只支持单个UDTF表达式
②不支持UDTF嵌套
③不支持GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY语句
hive> select explode(name) , explode(course_score) from student_array;FAILED: SemanticException 1:23 Only a single expression in the SELECT clause is supported with UDTF's. Error encountered near token 'course_score'hive> select explode(name) , explode(course_score) from student_array;FAILED: SemanticException 1:23 Only a single expression in the SELECT clause is supported with UDTF's. Error encountered near token 'course_score'hive> select name ,explode(course_score) from student_array;FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressionshive> select explode(course_score) from student_array sort by name;FAILED: SemanticException [Error 10078]: SORT BY is not supported with a UDTF in the SELECT clause
explode函数和lateral view结合使用,就不存在上上语法限制
hive> select name ,course ,score from student_struct_array  lateral view inline(course_score)  course_scores as course , score ;-chgrp: '*' does not match expected pattern for groupUsage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...OKzs语文86.0zs数学87.5zs英语90.0ls语文76.0ls数学93.0ww语文88.0ww数学90.0ww英语95.0
lateral view outer 使用方法
hive> select * from student_array lateral view outer explode(array())  c as a;-chgrp: '*' does not match expected pattern for groupUsage: hadoop fs [generic options] -chgrp [-R] GROUP PATH...OKzs["语文:86","数学:87.5","英语:90"]NULLls["语文:76","数学:93","英语:88"]NULLww["语文:88","数学:90","英语:95"]NULLTime taken: 0.166 seconds, Fetched: 3 row(s)









原创粉丝点击