Hive Series (3): Transform


As part (2) showed, custom UDFs are not especially convenient to use. Hive's TRANSFORM keyword lets you call your own script from SQL,
which is a good fit when Hive lacks a feature and you don't want to write a UDF.

I. Ingesting the file

Uploading the JSON file
1. Create the table

hive> create table t_json(line string)
    > row format delimited;

2. Load the file

load data local inpath '/root/rating.json' into table t_json;

3. Create the table for the parsed JSON fields

create table t_rate(movieid string, rate int, time string, uid string)
row format delimited
fields terminated by '\t';

4. Parse the JSON and load it
(for complex JSON you can write your own UDF, or handle it with TRANSFORM and a script)

(line is the single column of the raw JSON table)

hive> insert overwrite table t_rate
    > select get_json_object(line,'$.movie') movieid,
    >        get_json_object(line,'$.rate') rate,
    >        get_json_object(line,'$.timeStamp') time,
    >        get_json_object(line,'$.uid') uid
    > from t_json;

5. Inspect the data:

hive> select * from t_rate limit 10;
OK
1193    5       978300760       1
661     3       978302109       1
914     3       978301968       1
3408    4       978300275       1
2355    5       978824291       1
1197    3       978302268       1
1287    5       978302039       1
2804    5       978300719       1
594     4       978302268       1
919     4       978301368       1
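As a sanity check of what get_json_object is doing above, here is a minimal Python sketch; the sample line is hypothetical but shaped like one rating.json record, and on flat JSON like this `$.movie` amounts to a simple key lookup:

```python
import json

# Hypothetical sample line, shaped like one record of rating.json
line = '{"movie": "1193", "rate": "5", "timeStamp": "978300760", "uid": "1"}'

rec = json.loads(line)
# get_json_object(line, '$.movie') on flat JSON is essentially rec["movie"]
row = (rec["movie"], int(rec["rate"]), rec["timeStamp"], rec["uid"])
print(row)
```

For nested documents, get_json_object supports a subset of JSONPath (e.g. `$.a.b`), which a UDF or a TRANSFORM script can go beyond.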

II. TRANSFORM

1. Create the target table

CREATE TABLE u_data_new (
  movieid INT,
  rating INT,
  weekday INT,
  userid INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

2. Row count of t_rate

select count(*) from t_rate;
+----------+--+
|   _c0    |
+----------+--+
| 1000209  |
+----------+--+

3. The weeday_mapper_new.py script

#!/bin/python
import sys
import datetime

# Read tab-separated rows from stdin, replace the unix timestamp with its
# ISO weekday (1 = Monday .. 7 = Sunday), and write the row back out.
for line in sys.stdin:
    line = line.strip()
    movieid, rating, unixtime, userid = line.split('\t')
    weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
    print '\t'.join([movieid, rating, str(weekday), userid])
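You can check the conversion logic locally before handing the script to Hive. A Python 3 sketch, using UTC for a deterministic answer (fromtimestamp without a timezone, as in the script above, uses the machine's local time, which is why the job's output below can differ by one day near midnight):

```python
import datetime

# First rating's timestamp: 978300760 is 2000-12-31 22:12:40 UTC, a Sunday.
ts = 978300760.0
weekday = datetime.datetime.fromtimestamp(ts, tz=datetime.timezone.utc).isoweekday()
print(weekday)
```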

4. Register the script with Hive:

add file /root/weeday_mapper_new.py;

5. Populate u_data_new with the converted weekday

insert overwrite table u_data_new
select TRANSFORM(movieid, rate, time, uid)
USING 'python weeday_mapper_new.py'
AS (movieid, rate, weekday, uid)
from t_rate;
select * from u_data_new limit 20;
+---------------------+--------------------+---------------------+--------------------+--+
| u_data_new.movieid  | u_data_new.rating  | u_data_new.weekday  | u_data_new.userid  |
+---------------------+--------------------+---------------------+--------------------+--+
| 1193                | 5                  | 1                   | 1                  |
| 661                 | 3                  | 1                   | 1                  |
| 914                 | 3                  | 1                   | 1                  |
| 3408                | 4                  | 1                   | 1                  |
| 2355                | 5                  | 7                   | 1                  |
| 1197                | 3                  | 1                   | 1                  |
| 1287                | 5                  | 1                   | 1                  |
| 2804                | 5                  | 1                   | 1                  |
| 594                 | 4                  | 1                   | 1                  |
| 919                 | 4                  | 1                   | 1                  |
| 595                 | 5                  | 7                   | 1                  |
| 938                 | 4                  | 1                   | 1                  |
| 2398                | 4                  | 1                   | 1                  |
| 2918                | 4                  | 1                   | 1                  |
| 1035                | 5                  | 1                   | 1                  |
| 2791                | 4                  | 1                   | 1                  |
| 2687                | 3                  | 7                   | 1                  |
| 2018                | 4                  | 1                   | 1                  |
| 3105                | 5                  | 1                   | 1                  |
| 2797                | 4                  | 1                   | 1                  |
+---------------------+--------------------+---------------------+--------------------+--+
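To see the whole TRANSFORM contract in one place: Hive tab-joins the selected columns, streams one line per row to the script's stdin, and splits each line the script prints back on tabs into the AS(...) columns. A Python 3 sketch of that round trip, reusing the mapper's logic (UTC here for determinism, so the weekdays may differ from the table above, which was produced with the job machine's local timezone):

```python
import datetime
import io

def mapper(stdin):
    """Port of weeday_mapper_new.py: (movieid, rating, unixtime, uid) -> weekday row."""
    out = []
    for line in stdin:
        movieid, rating, unixtime, userid = line.strip().split('\t')
        weekday = datetime.datetime.fromtimestamp(
            float(unixtime), tz=datetime.timezone.utc).isoweekday()
        out.append('\t'.join([movieid, rating, str(weekday), userid]))
    return out

# Two rows as Hive would serialize them before piping them to the script
rows = [("1193", "5", "978300760", "1"), ("2355", "5", "978824291", "1")]
stdin = io.StringIO("".join("\t".join(r) + "\n" for r in rows))
result = mapper(stdin)
print(result)
```

The same contract means the script can be written in any language that reads stdin and writes stdout; Python is just what this walkthrough uses.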