【hadoop】16、学习hive操作语句
来源:互联网 发布:天津广电网络网上缴费 编辑:程序博客网 时间:2024/06/05 20:47
学习DDL语句
创建对象的语句
Create/Drop/Alter Database
Create Database
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name [COMMENT database_comment] [LOCATION hdfs_path] [WITH DBPROPERTIES (property_name=property_value, ...)];
Drop Database
DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];
Alter Database
ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...); ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role;
Use Database
USE database_name;
USE DEFAULT;
Hive运行的时候,元数据存储在关系系数据库里面。
Hive运行的时候需要有映射关系的数据,需要快速地读取
Linux里面其实有自带的关系数据库,但是十分不稳定,所以我们不用这个数据库
我们自己搭建一个关系数据库
安装一个关系数据库(mysql)
我们在安装Linux的时候已经安装了mysql
启动mysql
查看mysql是否已经进行监听
3306端口,对的
连接mysql
受限我们需要驱动
设置mysql中远程登录的问题
输入use mysql
select * from user;
grant all on . to root@’%’ identified by ‘123456’;
这个是给所有的用户在所有的数据库上的所有的表的所有权限,密码是123456
查看一下是否成功
修改配置文件
配置mysql路径
修改用户名和密码
我们创建一个hive的数据库
进入hive
启动之后推出hive
Quite;
然后在mysql中查看表
退出
学习hive的DDL语句
Create Table
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later) [(col_name data_type [COMMENT col_comment], ...)] [COMMENT table_comment] [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS] [SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)] ON ((col_value, col_value, ...), (col_value, col_value, ...), ...) [STORED AS DIRECTORIES] [ [ROW FORMAT row_format] [STORED AS file_format] | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later) ] [LOCATION hdfs_path] [TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later) [AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name LIKE existing_table_or_view_name [LOCATION hdfs_path];data_type : primitive_type | array_type | map_type | struct_type | union_type -- (Note: Available in Hive 0.7.0 and later)primitive_type : TINYINT | SMALLINT | INT | BIGINT | BOOLEAN | FLOAT | DOUBLE | STRING | BINARY -- (Note: Available in Hive 0.8.0 and later) | TIMESTAMP -- (Note: Available in Hive 0.8.0 and later) | DECIMAL -- (Note: Available in Hive 0.11.0 and later) | DECIMAL(precision, scale) -- (Note: Available in Hive 0.13.0 and later) | DATE -- (Note: Available in Hive 0.12.0 and later) | VARCHAR -- (Note: Available in Hive 0.12.0 and later) | CHAR -- (Note: Available in Hive 0.13.0 and later)array_type : ARRAY < data_type >map_type : MAP < primitive_type, data_type >struct_type : STRUCT < col_name : data_type [COMMENT col_comment], ...>union_type : UNIONTYPE < data_type, data_type, ... > -- (Note: Available in Hive 0.7.0 and later)row_format : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char] [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char] [NULL DEFINED AS char] -- (Note: Available in Hive 0.13 and later) | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]file_format: : SEQUENCEFILE | TEXTFILE -- (Default, depending on hive.default.fileformat configuration) | RCFILE -- (Note: Available in Hive 0.6.0 and later) | ORC -- (Note: Available in Hive 0.11.0 and later) | PARQUET -- (Note: Available in Hive 0.13.0 and later) | AVRO -- (Note: Available in Hive 0.14.0 and later) | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
例子:
id int,date date,name varcharcreate table table_name ( id int, dtDontQuery string, name string)partitioned by (date string)
一个例子
CREATE TABLE page_view(viewTime INT, userid BIGINT,page_url STRING, referrer_url STRING,ip STRING COMMENT 'IP Address of the User')COMMENT 'This is the page view table'PARTITIONED BY(dt STRING, country STRING)ROW FORMAT DELIMITEDFIELDS TERMINATED BY '\001' 这个是分隔符,行的每一列用什么分割STORED AS SEQUENCEFILE;
我们创建一张表
在hive中
create table t_emp(id int,name string,age int,dept_name string)ROW FORMAT DELIMITEDFIELDS TERMINATED BY ',';
我们在Linux中建立一个文本的数据文件
Emp.txt
导入数据
Loading files into tables
Hive does not do any transformation while loading data into tables. Load operations are currently pure copy/move operations that move datafiles into locations corresponding to Hive tables.
LOAD DATA [LOCAL] INPATH ‘filepath’ [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 …)]
Hive通过我们的环境变量找到hadoop在哪,然后连上hadoop,就会创建hive的工作目录在hdfs上,在user下的hive下
我们查询,在hive下面
select count(*) from t_emp;
Hive还可以使用各种集合类型
create table t_person(id int,name string,like array<string>,tedian map<string, string>)row format delimitedfields terminated by ','collection items terminated by '_'map keys terminated by ':';
数据格式
1,,zhangsan,sports_books_TV,sex:男_color:red
加载文件
Load data local inpath ‘root/data.exe’ into table t_person
Hive在运行的时候有一些元数据需要保存。默认保持到DBMS。
学习DML语句
导入数据
Loading files into tables
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
创建分区表
分区实际是一个文件夹,表名就是文件夹名。每个分区,实际上是表名这个文件夹下面的不同文件。分区可以根据时间,地点等等进行分区,比如,每天一个分区,等于每天存每天的数据,或者每个城市,存放每个城市的数据。每次查询数据的时候,只要写下类似where pt=2010_08_23这样的条件即可查询指定时间的数据
Create table sxtstu(id int, sname string, city string)
Partitioned by (ds string) row format delimited fields terminated by ‘,’ stored as textfile;
我们保存数据的时候
Load data local inpath ‘sxtstu.txt’ overwrite into table sxtstu partition(ds=’2013-07-09’);
Copying data from file:/home/Hadoop/sxtstu.txt
Copying file:file:/home/Hadoop/sxtstu.txt
Loading data to table default.sxtstu partition (ds=2013-07-09)
OK
我们尝试创建一张表
create table dept_count( dname string, num int) ;insert into table dept_count select dept_name, count(1) from t_emp group by dept_name;
关于分区:
Create table dept_count(Num int)Partitioned by (dname string);
Insert into table dept_count partition (dname='销售部') select count(1) from t_emp where dept_name='销售部' group by dept_name
一些案例:
CREATE TABLE students (name VARCHAR(64), age INT, gpa DECIMAL(3, 2)) CLUSTERED BY (age) INTO 2 BUCKETS STORED AS ORC;INSERT INTO TABLE students VALUES ('fred flintstone', 35, 1.28), ('barney rubble', 32, 2.32);CREATE TABLE pageviews (userid VARCHAR(64), link STRING, came_from STRING) PARTITIONED BY (datestamp STRING) CLUSTERED BY (userid) INTO 256 BUCKETS STORED AS ORC;INSERT INTO TABLE pageviews PARTITION (datestamp = '2014-09-23') VALUES ('jsmith', 'mail.com', 'sports.com'), ('jdoe', 'mail.com', null);INSERT INTO TABLE pageviews PARTITION (datestamp) VALUES ('tjohnson', 'sports.com', 'finance.com', '2014-09-23'), ('tlee', 'finance.com', null, '2014-09-21');
关于import和export
EXPORT TABLE tablename [PARTITION (part_column="value"[, ...])] TO 'export_target_path'IMPORT [[EXTERNAL] TABLE new_or_original_tablename [PARTITION (part_column="value"[, ...])]] FROM 'source_path' [LOCATION 'import_target_path']
导出语句
学习数据查询语句
类似SQL语句
create table t_stu(userid int,name string,age int,sex int,classid int)row format delimited fields terminated by ','stored as textfile;create table t_class(cid int,name string,teacher string)row format delimited fields terminated by ','stored as textfile;load data inpath '/pub/student.txt' into table t_stu;
1,zs,32,2,2
2,lis,23,1,2
3,ww,21,1,1
select s.*, c.name from t_stu s join t_class c on s.classid=c.cid;
- 【hadoop】16、学习hive操作语句
- hive和hadoop的dos操作语句
- hadoop学习笔记-hive安装及操作
- Hadoop学习笔记之操作hive
- hadoop学习之-hive-数据操作
- hadoop学习笔记--11.hive DDL操作
- hadoop学习笔记--12.hive DML操作
- hadoop命令及hive数据库操作语句的简单使用
- hadoop hive 操作整理
- hadoop hive基本操作
- Hadoop Hive sql操作
- hive相关操作语句
- Hive 操作语句...
- Hive基本操作语句
- Hadoop之hive学习
- Hadoop之hive学习
- Hadoop学习记录-Hive
- hadoop学习--hive
- adb shell命令、logcat、fastboot
- 织梦dedecms|文章列表标签arclist
- git stash 命令
- Git基础教程
- java 面试
- 【hadoop】16、学习hive操作语句
- 设计模式-中介者模式
- 管理理念:星巴克-文化成就品牌传奇
- 每天进步一点点——linux——df
- java 并行框架 并行编程
- myeclispe,http404
- SAT数学:几何专业词汇一览
- log4j的使用--java如何使用日志
- 串口UART