hive基础操作
来源:互联网 发布:紫胤真人知乎 编辑:程序博客网 时间:2024/05/29 07:34
启动dfs和yarn
start-dfs.shstart-yarn.sh
hive操作
hivehive>show databases;defaulthive>create database test;hive>show databases;defaulttesthive>use test;
内部表
drop表时存储在hdfs上的文件也会被删除
示例:学生表student,选修关系表sc,课程表course
hive>create table student(Sno int,Sname string,Sex string,Sage int,Sdept string)row format delimited fields terminated by ','stored as textfile;hive>create table course(Cno int,Cname string) row format delimited fields terminated by ',' stored as textfile;hive>create table sc(Sno int,Cno int,Grade int)row format delimited fields terminated by ',' stored as textfile;
创建的数据库保存在表DBS中
hdfs上的路径:
/user/hive/warehouse/test.db
其中/user/hive/warehouse在hive-site.xml文件中指定的
创建的表在TBLS中
hdfs上的路径:
/user/hive/warehouse/test.db/student
导入数据
创建文件
student.txt
95001,李勇,男,20,CS95002,刘晨,女,19,IS95003,王敏,女,22,MA95004,张立,男,19,IS95005,刘刚,男,18,MA95006,孙庆,男,23,CS95007,易思玲,女,19,MA95008,李娜,女,18,CS95009,梦圆圆,女,18,MA95010,孔小涛,男,19,CS95011,包小柏,男,18,MA95012,孙花,女,20,CS95013,冯伟,男,21,CS95014,王小丽,女,19,CS95015,王君,男,18,MA95016,钱国,男,21,MA95017,王风娟,女,18,IS95018,王一,女,19,IS95019,邢小丽,女,19,IS95020,赵钱,男,21,IS95021,周二,男,17,MA95022,郑明,男,20,MA
course.txt
1,数据库2,数学3,信息系统4,操作系统5,数据结构6,数据处理
sc.txt
95001,1,8195001,2,8595001,3,8895001,4,7095002,2,9095002,3,8095002,4,7195002,5,6095003,1,8295003,3,9095003,5,10095004,1,8095004,2,9295004,4,9195004,5,7095005,1,7095005,2,9295005,3,9995005,6,8795006,1,7295006,2,6295006,3,10095006,4,5995006,5,6095006,6,9895007,3,6895007,4,9195007,5,9495007,6,7895008,1,9895008,3,8995008,6,9195009,2,8195009,4,8995009,6,10095010,2,9895010,5,9095010,6,8095011,1,8195011,2,9195011,3,8195011,4,8695012,1,8195012,3,7895012,4,8595012,6,9895013,1,9895013,2,5895013,4,8895013,5,9395014,1,9195014,2,10095014,4,9895015,1,9195015,3,5995015,4,10095015,6,9595016,1,9295016,2,9995016,4,8295017,4,8295017,5,10095017,6,5895018,1,9595018,2,10095018,3,6795018,4,7895019,1,7795019,2,9095019,3,9195019,4,6795019,5,8795020,1,6695020,2,9995020,5,9395021,2,9395021,5,9195021,6,9995022,3,6995022,4,9395022,5,8295022,6,100
从本地导入数据
注意创建表结构是指定的分割符’,’与student.txt的行分割符’,’对应
hive>load data local inpath '/home/bingo/data/hive/students.txt' into table student;hive>load data local inpath '/home/bingo/data/hive/course.txt' overwrite into table course;hive>load data local inpath '/home/bingo/data/hive/sc.txt' overwrite into table sc;
基本操作:
查询全体学生的学号与姓名
select Sno,Sname from student;
查询选修了课程的学生姓名
select distinct Sname from student inner join sc on student.Sno=Sc.Sno;
—-hive的group by 和集合函数
查询学生的总人数
select count(distinct Sno) count from student;
计算1号课程的学生平均成绩
select avg(distinct Grade) from sc where Cno=1;
查询各科成绩平均分
select Cno,avg(Grade) from sc group by Cno;
查询选修1号课程的学生最高分数
select Grade from sc where Cno=1 sort by Grade desc limit 1;
比较:
select * from sc where Cno=1 sort by Grade
select Grade from sc where Cno=1 order by Grade
求各个课程号及相应的选课人数
select Cno,count(1) from sc group by Cno;
查询选修了3门以上的课程的学生学号
select Sno from (select Sno,count(Cno) CountCno from sc group by Sno)a where a.CountCno>3;或select Sno from sc group by Sno having count(Cno)>3;
—-hive的Order By/Sort By/Distribute By
Order By ,在strict 模式下(hive.mapred.mode=strict),order by 语句必须跟着limit语句,但是在nonstrict下就不是必须的,这样做的理由是必须有一个reduce对最终的结果进行排序,如果最后输出的行数过多,一个reduce需要花费很长的时间。只有一个reduce,如果数据量大,效率很低,甚至程序会崩掉
查询学生信息,结果按学号全局有序
set hive.mapred.mode=strict; <默认nonstrict>select Sno from student order by Sno;FAILED: Error in semantic analysis: 1:33 In strict mode, if ORDER BY is specified, LIMIT must also be specified. Error encountered near token 'Sno'
Sort By,它通常发生在每一个redcue里,“order by” 和“sort by”的区别在于,前者能给保证输出都是有顺序的,而后者如果有多个reduce的时候只是保证了输出的部分有序。set mapred.reduce.tasks=在sort by可以指定,在用sort by的时候,如果没有指定列,它会随机的分配到不同的reduce里去。distribute by 按照指定的字段对数据进行划分到不同的输出reduce中
此方法会根据性别划分到不同的reduce中 ,然后按年龄排序并输出到不同的文件中。
查询学生信息,按性别分区,在分区内按年龄有序
set mapred.reduce.tasks=2;insert overwrite local directory '/home/bingo/out' select * from student distribute by Sex sort by Sage;
—-Join查询,join只支持等值连接
查询每个学生及其选修课程的情况
select student.*,sc.* from student join sc on (student.Sno =sc.Sno);
查询学生的得分情况。
select student.Sname,course.Cname,sc.Grade from student join sc on student.Sno=sc.Sno join course on sc.cno=course.cno;
查询选修2号课程且成绩在90分以上的所有学生。
select student.Sname,sc.Grade from student join sc on student.Sno=sc.Sno where sc.Cno=2 and sc.Grade>90;
—-LEFT,RIGHT 和 FULL OUTER JOIN ,inner join, left semi join
查询所有学生的信息,如果在成绩表中有成绩,则输出成绩表中的课程号
select student.Sname,sc.Cno from student left outer join sc on student.Sno=sc.Sno;
如果student的sno值对应的sc在中没有值,则会输出student.Sname null.如果用right out join会保留右边的值,左边的为null。
Join 发生在WHERE 子句之前。如果你想限制 join 的输出,应该在 WHERE 子句中写过滤条件——或是在join 子句中写。
—-LEFT SEMI JOIN Hive 当前没有实现 IN/EXISTS 子查询,可以用 LEFT SEMI JOIN 重写子查询语句
重写以下子查询为LEFT SEMI JOIN
SELECT a.key, a.value FROM a WHERE a.key exist in (SELECT b.key FROM B);#可以被重写为: SELECT a.key, a.val FROM a LEFT SEMI JOIN b on (a.key = b.key)
查询与“刘晨”在同一个系学习的学生
select s1.Sname from student s1 left semi join student s2 on s1.Sdept=s2.Sdept and s2.Sname='刘晨';
注意比较:
select * from student s1 left join student s2 on s1.Sdept=s2.Sdept and s2.Sname='刘晨';select * from student s1 right join student s2 on s1.Sdept=s2.Sdept and s2.Sname='刘晨';select * from student s1 inner join student s2 on s1.Sdept=s2.Sdept and s2.Sname='刘晨';select * from student s1 left semi join student s2 on s1.Sdept=s2.Sdept and s2.Sname='刘晨';select s1.Sname from student s1 right semi join student s2 on s1.Sdept=s2.Sdept and s2.Sname='刘晨';
外部表
drop表时hdfs上的文件不会被删除
hive>create external table student_ext(Sno int,Sname string,Sex string,Sage int,Sdept string)>row format delimited fields terminated by ','>location '/input/hive/student/';
location是hdfs上的路径
导入数据:
与内部表的导入一样
分区表
可以是内部表,也可以是外部表
hive> create external table student_p(Sno int,Sname string,Sex string,Sage int,Sdept string)>partitioned by(part string) >row format delimited fields terminated by ','>location '/input/hive/partition';#创建分区:alter table student_p add if not exists partition (part ='usa');#删除分区:外部表时hdfs上的文件还是存在的alter table student_p drop if exists partition (part ='japan');#导入数据时决定分区hive> load data local inpath '/home/bingo/data/hive/students.txt' overwrite into table student_p > partition (part='china');hive> load data local inpath '/home/bingo/data/hive/students.txt' overwrite into table student_p > partition (part='japan');#查询全部select * from student_p;#查询分区select * from student_p where part='china';
详情看官网:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual
- hive基础操作
- Hive基础操作
- hive基础操作
- HIVE(1)--基础操作。
- hive基础操作
- Hive SQL基础操作
- hive基础操作
- hive--基础操作
- hive字符串基础操作函数
- Hive编程(二)【基础操作】
- hive编程指南笔记--2基础操作
- Hive--基础操作、数据类型和文本文件格式
- Hadoop Hive基础SQL语法(DDL 操作)
- Hive的数据类型与基础操作
- hive基础
- Hive基础
- hive基础
- Hive基础
- 《ECMAScript 6 入门优化版》
- Java十四种开发工具及其特点
- WPF循环显示图片解决OutofMemoryException方法
- SGISTL源码探究-第一级配置器
- XHTML 相对路径与绝对路径
- hive基础操作
- HDU
- bzoj 3390: [Usaco2004 Dec]Bad Cowtractors牛的报复
- [JavaScript笔记] 数组去重 与 数组随机排序
- ftp服务端本地测试用ftp://localhost或者127.0.0.1都可以测试成功,但是一旦用本机IP就不行
- fiddler 参考网站
- UVA 196
- unity图像处理(下)
- 22:津津的储蓄计划