Hive建立外部表与时间戳转换(含建dual表,修改列名,row_number() 函数等)
来源:互联网 发布:浙江华通云数据上市 编辑:程序博客网 时间:2024/06/05 21:30
建外部表,之前hadoop职位统计就是这么做的
hive> drop table job;OKTime taken: 5.446 secondshive> show tables;OKTime taken: 0.018 secondshive> create external table job(area string, experience string, degree string, > num string, salary string) > row format delimited fields terminated by ',' > location '/job';#在hdfs先建好这个文件夹,并上传好数据OKTime taken: 0.103 secondshive> select * from job;OK北京3-5年本科3人 15001-20000元/月北京1-3年本科3人 10001-15000元/月杭州3-5年本科1人 15001-20000元/月 。。。hive> select area, count(area) from job group by area;
建表
hive> create external table tctest(uid string,goodsid string,behtype string,space string,category string,time string)hive> row format delimited fields terminated by ','hive> location '/tianchitest';
将日期转化为时间戳
hive> select unix_timestamp(time,'yyyyMMddHH') from tctest;利用上面的函数+CTAS语句创建一个含时间戳的表
hive> create table tcc as select uid,goodsid,behtype,space,category,unix_timestamp(time,'yyyy-MM-dd HH') from tctest;
最好直接改列名
hive> create table tcc as select uid,goodsid,behtype,space,category,unix_timestamp(time,'yyyy-MM-dd HH') as time from tctest;
dual表
因为Hive里没有自带dual表,所以我们要自己建一个
hive> create table dual(dummy string);#之后建一个dual.txt里面写个值Xhive> load data local inpath '/home/guo/dual.txt' into table dual;hive> select unix_timestamp("2014-12-18 23:59:59") from dual;OK1418918399Time taken: 0.082 seconds, Fetched: 1 row(s)hive> select unix_timestamp("2014-12-17 23:59:59") from dual;OK1418831999Time taken: 0.112 seconds, Fetched: 1 row(s)
将时间戳转化为日期
hive> select from_unixtime(1418831999) from dual;OK2014-12-17 23:59:59Time taken: 0.111 seconds, Fetched: 1 row(s)
查看表结构
hive> describe tcc;OKuid string goodsid string behtype string space string category string _c5 bigint Time taken: 0.025 seconds, Fetched: 6 row(s)修改列名,注意那是反引号
hive> alter table tcc change `_c5` time bigint;OKTime taken: 0.172 seconds
将hive表中内容导到本地
guo@guo:~$ hive -e "select * from tcpredict" >> predict.csv
开始的想法奉上(当时没有理解题意,想的也比较简单)
hive> create external table tclx(uid string,goodsid string,behtype string,space string,category string,time string)hive> row format delimited fields terminated by ','hive> location '/tianchilx';hive> create table tianchi as select uid,goodsid,behtype,space,category,unix_timestamp(time,'yyyy-MM-dd HH') as time from tclx;#改列名,不用了hive> alter table tianchi change `_c5` time bigint;
0.0
hive> create table tct1 as select distinct uid from tianchi where behtype = 3 and time > 1418831999;hive> create table tct2 as select t.* from tianchi t,tct1 c where t.uid=c.uid;hive> select * from tct2 limit 5;#查看表的前五行数据hive> create table tct3 as select distinct uid from tct2 where behtype=4; hive> create table tct4 as select t.* from tct2 t,tct3 c where t.uid=c.uid;
改进一下
hive> create table tct6 as select c.* from (select distinct uid from tct2 where behtype=4)t,tct2 c where t.uid=c.uid;
或者,更直接一点
hive> create table tct7 as select c.* from (select distinct uid from (select a.* from tianchi a,(select distinct uid from tianchi where behtype=3 and time>1418831999) b where a.uid=b.uid)t where t.behtype=4)t,tct2 c where t.uid=c.uid
或者,改进一点,用in
hive> create table tct8 as select c.* from (select distinct uid from (select a.* from tianchi a where a.uid in (select distinct uid from tianchi where behtype=3 and time>1418831999))b where b.behtype=4)t,tct2 c where t.uid=c.uid;再改进一点
hive> create table tct9 as select c.* from tct2 c where c.uid in (select distinct uid from (select a.* from tianchi a where a.uid in (select distinct uid from tianchi where behtype=3 and time>1418831999))b where b.behtype=4);
下面来自:http://jingyan.baidu.com/article/9989c74604a644f648ecfef3.html
简单的说row_number()从1开始,为每一条分组记录返回一个数字,这里的ROW_NUMBER() OVER (ORDER BY xlh DESC) 是先把xlh列降序,再为降序以后的没条xlh记录返回一个序号。
SELECT *, Row_Number() OVER (partition by deptid ORDER BY salary desc) rank FROM employee
根据部门分组,显示每个部门的工资等级
0 0
- Hive建立外部表与时间戳转换(含建dual表,修改列名,row_number() 函数等)
- 修改名字(表名,列明等)
- 横竖表转换和列名修改
- mysql sql 修改表名 建立外键 修改列名 删除列
- mysql sql 修改表名 建立外键 修改列名 删除列
- Hive修改表名
- Hive学习 第五课 修改表名修改列名添加列并删除或替换列。
- hive的dual表
- Hive创建dual表
- sql 查询表的列名 row_number() DENSE_RANK()
- hive 外部表不支持添加列
- oracle 修改表名、列名、字段类型、添加表列、删除表列 等相关操作
- SYBASE修改表的列名(字段名)
- sql 修改列名及表名
- SQL 修改表名及列名
- sqlce 修改表名 列名
- exce sp_rename 修改表名、列名
- 批量修改表名和列名
- javaScript树形结构
- 2016sdau课程练习专题一 1013 problemN
- Android ndk使用介绍
- iOS的cookie认证登陆
- js中for in 和 for each in的使用
- Hive建立外部表与时间戳转换(含建dual表,修改列名,row_number() 函数等)
- Bezier曲线原理
- 有趣的数
- 第五周项目23-静态成员应用
- 归并排序
- Android开发:GPU过度绘制的优化
- iOS开发的一些小技巧
- OpenCV通过cvFindContours与cvDrawCountours函数查找轮廓
- 文章标题