Hive I
Source: Internet · Editor: 程序博客网 · Date: 2024/05/16 08:16
Data warehouse
OLAP //online analytical processing: large data volumes, low concurrency, high latency.
hive //built on hadoop MR.
sql  //SQL-like query statements.
Database
mysql, OLTP //online transaction processing; low latency.
acid //transaction concurrency anomalies: dirty read | unrepeatable read | phantom read
     //isolation levels: read uncommitted | read committed | repeatable read | serializable
hive
Hive (literally, a beehive) is a data warehouse: software for reading, writing, and managing large datasets residing in distributed storage using SQL. It can be used from the command line or through a driver. Originally developed at Facebook to process structured data.
[is not] a relational database; no OLTP, real-time queries, or row-level updates.
[features] stores the schema in a database while the data it processes lives on HDFS; OLAP; provides an SQL dialect called HiveQL / HQL; scalable and extensible.
hive concepts
1. user interface       //shell and web UI.
2. metastore            //metadata store: databases, tables, columns, types, etc., kept in a relational database.
3. HiveQL process engine //query language replacing hand-written MR programs.
4. execution engine     //processes the query and produces the MR results.
5. hdfs | hbase         //where the data itself is stored.
How hive interacts with hadoop
client sends a query to the hive driver (select id,name,age from customers ;) --> the compiler compiles it --> the metastore is consulted --> the execution engine runs it --> hadoop MR
hive installation
1. download.
2. untar:
   $>tar -xzvf apache-hive-2.1.0-bin.tar.gz -C /soft/
   $>ln -s apache-hive-2.1.0-bin hive
3. environment variables:
   $>sudo nano /etc/profile
   ...
   HIVE_HOME=/soft/hive
   PATH=$PATH:$HIVE_HOME/bin
4. $>source /etc/profile
Configure the hive metastore in mysql instead of the default derby database.
1. copy the mysql driver into hive's lib directory:
   $>cp mysql-connector-java-5.1.17.jar /soft/hive/lib
2. create the hive configuration file [/soft/hive/conf/hive-site.xml]:
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>root</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://192.168.231.1:3306/hive6</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>hive.server2.authentication</name>
  <value>NONE</value>
</property>
<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
</property>
Initialize the hive metadata in mysql
$>bin/schematool -dbType mysql -initSchema
hive components
database //an hdfs directory, /user/hive/warehouse/${dbname}.db
table    //an hdfs directory, /user/hive/warehouse/${dbname}.db/${table}
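Since a database is just a directory and a table a subdirectory of it, the resulting HDFS paths can be sketched in a few lines of Python (warehouse root assumed to be the default hive.metastore.warehouse.dir):

```python
# Sketch of hive's on-disk layout, assuming the default warehouse root.
WAREHOUSE = "/user/hive/warehouse"

def database_path(db):
    # every non-default database becomes a "<name>.db" directory
    return f"{WAREHOUSE}/{db}.db"

def table_path(db, table):
    # every table is a subdirectory of its database directory
    return f"{database_path(db)}/{table}"

print(table_path("mydb", "customers"))
```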
Entering the shell
//enter the hive command line
$>hive
//hive operations
$hive>create database mydb ;  //create a database
$hive>use mydb ;
$hive>create table customers(id int, name string, age int) ;  //create a table
$hive>show databases ;
$hive>show tables ;
$hive>desc customers ;
Client/server access to hive via the hiveserver2 service
1. start hiveserver2:
   $>hive/bin/hiveserver2 &
2. check whether hiveserver2 has finished starting:
   $>netstat -anop | grep 10000
3. start the beeline command line:
   $>hive/bin/beeline
   $beeline>!help
   $beeline>!connect jdbc:hive2://localhost:10000/mydb
   $beeline>show databases ;
Accessing hiveserver2 remotely over jdbc to work with the hive data warehouse
1. create a module and add the dependencies:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.it18zhang</groupId>
    <artifactId>my-hive-day01</artifactId>
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.11</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>2.1.0</version>
        </dependency>
    </dependencies>
</project>
2. write the test code:
package com.hive.test;

import org.junit.Test;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

/**
 * Test hive over JDBC.
 */
public class TestHive {
    @Test
    public void getConn() throws Exception {
        String driver = "org.apache.hive.jdbc.HiveDriver";
        Class.forName(driver);
        Connection conn = DriverManager.getConnection("jdbc:hive2://s100:10000/mydb");
        Statement st = conn.createStatement();
        ResultSet rs = st.executeQuery("select id,name,age from customers order by id desc");
        while (rs.next()) {
            int id = rs.getInt(1);
            String name = rs.getString(2);
            int age = rs.getInt(3);
            System.out.println(id + "," + name + "," + age);
        }
        rs.close();
        st.close();
        conn.close();
    }
}
3. run it and watch the MR job in the web UI.
hive data types
1. primitive types
   TINYINT   //byte, 1 byte
   SMALLINT  //short, 2
   INT       //int, 4
   BIGINT    //long, 8
   FLOAT     //float, 4
   DOUBLE    //double, 8
   DECIMAL   //precision and scale, e.g. decimal(10,3)
   BINARY    //binary
   BOOLEAN   //TRUE | FALSE
   STRING    //string
   CHAR      //fixed length, <= 255
   VARCHAR   //variable length, <= 65535
   DATE      //date, '2013-01-01'
   TIMESTAMP //timestamp, '2013-01-01 12:00:01.345'
2. complex types
   ARRAY        //array, ['apple','orange','mango']
   MAP          //map, {1:"apple",2:"orange"}
   STRUCT       //struct, {1,"apple"}
   NAMED STRUCT //named struct, {"apple":"gala","weightkg":1}
   UNION        //union, {2:["apple","orange"]}
3. prepare the data
Michael|Montreal,Toronto|Male,30|DB:80|Product:Developer Lead
Will|Montreal|Male,35|Perl:85|Product:Lead,Test:Lead
Shelley|New York|Female,27|Python:80|Test:Lead,COE:Architect
Lucy|Vancouver|Female,57|Sales:89,HR:94|Sales:Lead
4. create a table with complex types
CREATE TABLE employee
(
  name string,
  arr ARRAY<string>,
  struc STRUCT<sex:string,age:int>,
  map1 MAP<string,int>,
  map2 MAP<string,ARRAY<string>>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
5. load local data into hive
$hive>load data local inpath '/home/centos/employee.txt' into table employee ;
$hive>select * from employee ;
//rename the table
$hive>alter table employee rename to emp ;
$hive>select name,arr from emp ;
//query a specific array element
$hive>select name,arr[0] from emp ;
//query a struct field
$hive>select name,struc.sex from emp ;
//query a map by key
$hive>select name,map1["DB"] from emp ;
//query map2 by key and index
$hive>select map2["Product"][0] from emp ;
6. word count in hive
6.1) create the table
  CREATE TABLE docs
  (
    line string
  )
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ',' ;
6.2) load data into docs
  load data local inpath '/home/centos/1.txt' into table docs ;
6.3) word count
  //word count
  select t.word,count(*) from (select explode(split(line," ")) word from docs) t group by t.word ;
  //top-N query in descending order
  select t.word,count(*) cnt from (select explode(split(line," ")) word from docs) t group by t.word order by cnt desc limit 3 ;
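The explode + group-by word count above can be mirrored in plain Python to check what the HQL is doing (a sketch only; it splits on single spaces, just like split(line, " ")):

```python
from collections import Counter

def word_count(lines):
    # mirrors: select explode(split(line, " ")) word from docs ... group by word
    words = []
    for line in lines:
        words.extend(w for w in line.split(" ") if w)
    return Counter(words)

docs = ["hello world hello", "hive hello"]
counts = word_count(docs)
# mirrors: order by cnt desc limit 3
top3 = counts.most_common(3)
```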
Full create-table syntax
CREATE TABLE employee
(
  name string,
  arr ARRAY<string>,
  struc STRUCT<sex:string,age:int>,
  map1 MAP<string,int>,
  map2 MAP<string,ARRAY<string>>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'            //default \001
COLLECTION ITEMS TERMINATED BY ','  //default \002
MAP KEYS TERMINATED BY ':'          //default \003
LINES TERMINATED BY '\n'            //line terminator
STORED AS TEXTFILE;                 //storage format
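To make the delimiter clauses concrete, here is a Python sketch that parses one line of the employee.txt sample according to the delimiters declared above. The nested MAP<string,ARRAY<string>> case is simplified: real nesting falls back to deeper default delimiters (\004 etc.), so this sketch treats each map value as a one-element array.

```python
FIELD, ITEM, KEY = "|", ",", ":"

def parse_employee(line):
    # split per ROW FORMAT DELIMITED: fields by '|',
    # collection items by ',', map keys by ':'
    name, arr, struc, map1, map2 = line.split(FIELD)
    sex, age = struc.split(ITEM)
    m1 = {}
    for kv in map1.split(ITEM):           # MAP<string,int>
        k, v = kv.split(KEY)
        m1[k] = int(v)
    m2 = {}
    for kv in map2.split(ITEM):           # MAP<string,ARRAY<string>> (simplified)
        k, v = kv.split(KEY, 1)
        m2[k] = [v]
    return {"name": name, "arr": arr.split(ITEM),
            "struc": {"sex": sex, "age": int(age)},
            "map1": m1, "map2": m2}

row = parse_employee("Will|Montreal|Male,35|Perl:85|Product:Lead,Test:Lead")
```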
explode
A UDTF (table-generating function) that produces one output row per element; it can be applied to an array or a map.
Type conversion and string functions
cast('124' as int) ;          //convert to an integer
concat('12','12','23','35')   //string concatenation
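Roughly the same two operations in Python, for comparison (Hive's NULL handling differs):

```python
# cast('124' as int) is plain string-to-integer conversion
as_int = int("124")
# concat('12','12','23','35') joins the strings end to end
joined = "".join(["12", "12", "23", "35"])
```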
DDL
create database if not exists xx ;  //create a database; a location, properties, and a comment can also be defined:
CREATE DATABASE IF NOT EXISTS myhivebook
COMMENT 'hive database demo'
LOCATION '/hdfs/directory'
WITH DBPROPERTIES ('creator'='dayongd','date'='2015-01-01') ;
//show database info
desc database myhivebook ;
//drop the database
drop database myhivebook ;
load data
//1. load from the local filesystem (a copy)
load data local inpath '/x/x/x/x/1.txt' into table docs ;
//2. load from hdfs (a move)
load data inpath '/x/x/x/x/1.txt' into table docs ;
Managed and external tables
1. managed (internal) table //the default; dropping the table deletes both the data and the table definition.
2. external table //dropping the table deletes only the table definition; the data remains.
CTAS: create table as select
//carries the data along
create table emp2 as select name from emp ;
Creating a table with like: copies only the structure, no data
create table emp3 like emp ;
truncate
Quickly empties a table.
truncate table emp2 ;
Partitioned tables: a partition is a directory.
//create a partitioned table
CREATE TABLE custs
(
  id int,
  name string,
  age int
)
PARTITIONED BY (prov string, city string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
//add partitions
alter table custs add PARTITION (prov='hebei', city='baoding') PARTITION (prov='hebei', city='shijiazhuang') ;
//show partitions
SHOW PARTITIONS custs ;
//drop a partition
alter table custs drop partition (prov='hebei', city='shijiazhuang') ;
//load data into a partition
load data local inpath '/home/centos/cust.txt' into table custs partition(prov='hebei', city='baoding') ;
//query by partition
select * from custs where city = 'baoding' ;
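Because each partition is a directory named key=value, a filter on a partition column can skip whole directories instead of scanning rows. A hypothetical sketch of that pruning:

```python
# Partition values become directory names; filtering on a partition
# column selects directories rather than scanning data.
partitions = [
    {"prov": "hebei", "city": "baoding"},
    {"prov": "hebei", "city": "shijiazhuang"},
]

def partition_dir(table_root, part):
    # e.g. <root>/prov=hebei/city=baoding
    return table_root + "/" + "/".join(f"{k}={v}" for k, v in part.items())

def prune(parts, **filters):
    # only matching partition directories would be read
    return [p for p in parts if all(p.get(k) == v for k, v in filters.items())]

dirs = [partition_dir("/user/hive/warehouse/mydb.db/custs", p)
        for p in prune(partitions, city="baoding")]
```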
Bucketed tables: a bucket is a file.
//create a bucketed table
CREATE TABLE buck
(
  id int,
  name string,
  age int
)
CLUSTERED BY (id) INTO 3 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
//choosing the bucket count: avoid buckets whose data volume is too large or too small; roughly twice the data block size per bucket is a common rule of thumb.
//set the number of reduce tasks and enforce bucketing
set mapreduce.job.reduces = 2 ;
set hive.enforce.bucketing = true ;
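CLUSTERED BY (id) INTO 3 BUCKETS routes each row to one of three files by hashing the bucket column; for an int column the hash is simply the value itself. A sketch of the assignment:

```python
def bucket_of(key, num_buckets=3):
    # for an int column hive's hash is the value itself; the bucket
    # index is that hash modulo the bucket count
    return key % num_buckets

buckets = {}
for row_id in [1, 2, 3, 4, 5, 6]:
    buckets.setdefault(bucket_of(row_id), []).append(row_id)
# ids 3 and 6 share bucket 0, ids 1 and 4 bucket 1, ids 2 and 5 bucket 2
```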
Joins
//create the orders table.
CREATE TABLE orders
(
  id int,
  orderno string,
  price float,
  cid int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
//cartesian product
select a.*,b.* from custs a , orders b ;
//inner join
select a.*,b.* from custs a inner join orders b on a.id = b.cid ;
//left outer join
select a.*,b.* from custs a left outer join orders b on a.id = b.cid ;
//left semi join: returns only the left table's columns, for rows that have a match
select a.* from custs a left semi join orders b on a.id = b.cid ;
//right outer join
select a.*,b.* from custs a right outer join orders b on a.id = b.cid ;
//full outer join
select a.*,b.* from custs a full outer join orders b on a.id = b.cid ;
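The join semantics above can be illustrated with tiny in-memory tables (a Python sketch of the results, not of how hive executes them):

```python
def inner_join(custs, orders):
    # a inner join b on a.id = b.cid: only matching pairs
    return [(c, o) for c in custs for o in orders if c["id"] == o["cid"]]

def left_outer_join(custs, orders):
    # a left outer join b on a.id = b.cid: unmatched left rows
    # are kept, padded with NULL (None) on the right
    out = []
    for c in custs:
        matches = [o for o in orders if o["cid"] == c["id"]]
        if matches:
            out.extend((c, o) for o in matches)
        else:
            out.append((c, None))
    return out

custs = [{"id": 1, "name": "tom"}, {"id": 2, "name": "jack"}]
orders = [{"cid": 1, "orderno": "no001"}]
```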
Map-side join
//a join hint /*+ MAPJOIN(alias) */ forces the named table to be loaded into memory
SELECT /*+ MAPJOIN(c) */ c.* FROM custs c CROSS JOIN orders o WHERE c.id <> o.cid ;
//or let hive convert joins to map-side joins automatically
set hive.auto.convert.join = true ;
SELECT c.* FROM custs c CROSS JOIN orders o WHERE c.id <> o.cid ;
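A map-side join works by loading the small table into an in-memory hash map and probing it for every row of the big table, skipping the shuffle/reduce phase entirely. A minimal sketch:

```python
def map_join(small, big, key_small, key_big):
    # build an in-memory hash index over the small table ...
    index = {}
    for row in small:
        index.setdefault(row[key_small], []).append(row)
    # ... then stream the big table through it in the map phase
    for brow in big:
        for srow in index.get(brow[key_big], []):
            yield srow, brow

custs = [{"id": 1, "name": "tom"}]
orders = [{"cid": 1, "orderno": "no001"}, {"cid": 2, "orderno": "no002"}]
joined = list(map_join(custs, orders, "id", "cid"))
```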
union queries
//union removes duplicates; use union all to keep them
select id,name from custs union select id,orderno from orders ;
//deduplicate
select distinct cid from orders ;