Importing and Exporting Data Between MySQL and Hive with Sqoop
Source: Internet · Editor: 程序博客网 · Published: 2024/05/16 01:22
- Importing MySQL table data into Hive
- 1 Create the table in MySQL
- 2 Create the Hive table with Sqoop
- 3 Import the data into Hive with Sqoop
- Exporting Hive table data to a MySQL table
- 1 Create the table in MySQL
- 2 Export the data to MySQL with Sqoop
- Sqoop's eval operation
- Sqoop's job operation
- 1 Create a job
- 2 List jobs
- 3 Show job details
- 4 Execute a job
- 5 Delete a job
- Sqoop's codegen operation
- Importing and merging MySQL incremental data with Sqoop
- 1 Create the table in MySQL
- 2 Import existing MySQL data into HDFS
- 3 Insert incremental data in MySQL
- 4 Import the incremental data into HDFS
- 5 Generate a jar with codegen
- 6 Merge
- 7 Import the merged data into a Hive table
1. Importing and exporting data between MySQL and Hive with Sqoop
2. Sqoop's eval operation
3. Sqoop's job operation
4. Generating Java code with Sqoop's codegen
5. Importing and merging MySQL incremental data with Sqoop, then loading the merged result into Hive
For installing Sqoop 1.4.6 and its command-line usage, see the earlier post on Sqoop 1.4.6 installation and command-line usage.
For installing Hive, see the post on installing Hive 2.1.1.
Note: the Hadoop cluster must be started first.
1. Importing MySQL table data into Hive
1.1 Create the table in MySQL
At the mysql command line, create table `a` in the `mydb` database and insert some rows:
```
mysql> use mydb;
Database changed
mysql> show tables;
+----------------+
| Tables_in_mydb |
+----------------+
| user           |
| user2          |
+----------------+
2 rows in set (0.00 sec)
mysql> create table a(id int,name varchar(50));
Query OK, 0 rows affected (0.14 sec)
mysql> insert into a values(1,'a1');
Query OK, 1 row affected (0.13 sec)
mysql> insert into a values(2,'a2');
Query OK, 1 row affected (0.03 sec)
mysql> select * from a;
+------+------+
| id   | name |
+------+------+
|    1 | a1   |
|    2 | a2   |
+------+------+
2 rows in set (0.00 sec)
```
1.2 Create the Hive table with Sqoop
```
sqoop create-hive-table --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a --hive-table a --fields-terminated-by ',' --hive-overwrite
```
On success, entering the Hive command line shows that table `a` now exists in the `default` database, with no data yet.
1.3 Import the data into Hive with Sqoop
```
sqoop import --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a --hive-import --fields-terminated-by ',' --hive-overwrite -m 1
```
After the import succeeds, view the result on HDFS:

```
hadoop fs -cat /user/hive/warehouse/a/part-m-00000
```

The imported rows can also be queried from the Hive shell.
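Conceptually, the import writes each MySQL row of `a` as one text line in the HDFS part file, with columns joined by the delimiter given in `--fields-terminated-by`. A minimal local sketch (the file name mirrors the part file above; the values come from the walkthrough, and `/tmp` stands in for HDFS):

```shell
# Each row (1,'a1') and (2,'a2') becomes one comma-separated line --
# the same layout that 'hadoop fs -cat' printed above. Local simulation only.
printf '%s,%s\n' 1 a1 2 a2 > /tmp/part-m-00000
cat /tmp/part-m-00000
```

This is why the Hive table must be declared with a matching `fields terminated by ','` to read the file back.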
2. Exporting Hive table data to a MySQL table
2.1 Create the table in MySQL
Create table `b` in the `mydb` database:
```
mysql> create table b(id int,name varchar(50));
Query OK, 0 rows affected (0.07 sec)
mysql> select * from b;
Empty set (0.00 sec)
```
2.2 Export the data to MySQL with Sqoop
Export the data under the Hive warehouse directory for table `a` into MySQL table `b`:
```
sqoop export --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table b --export-dir /user/hive/warehouse/a --input-fields-terminated-by ','
```
Check the result in MySQL:
```
mysql> select * from b;
+------+------+
| id   | name |
+------+------+
|    2 | a2   |
|    1 | a1   |
+------+------+
2 rows in set (0.00 sec)
```
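Going the other way, `sqoop export` parses each line of the HDFS file using `--input-fields-terminated-by` and inserts the resulting fields into MySQL (via batched JDBC inserts, not literal SQL text). A rough local sketch of just the per-line parsing step, with an illustrative input file:

```shell
# Split each comma-delimited line into columns and render the equivalent
# INSERT against table b. Purely illustrative of the parsing; Sqoop itself
# sends the values through JDBC rather than generating SQL strings.
printf '1,a1\n2,a2\n' > /tmp/a-part
awk -F, '{printf "INSERT INTO b VALUES (%s, '\''%s'\'');\n", $1, $2}' /tmp/a-part
```

Note that export runs with multiple mappers by default, which is why the rows can land in `b` out of order, as the `select` above shows.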
3. Sqoop's eval operation
`sqoop eval` connects to MySQL and runs SELECT and DML statements directly:
```
sqoop eval --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --query 'select * from a'
```
`--query` can also be written as `-e`:
```
sqoop eval --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root -e "insert into a values (3,'a3')"
sqoop eval --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --query "insert into a values (4,'a4')"
```
```
[root@cyyun ~]# sqoop eval --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --e 'select * from a'
16/12/23 20:35:50 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/12/23 20:35:50 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/12/23 20:35:50 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
--------------------------------------
| id          | name                 |
--------------------------------------
| 1           | a1                   |
| 2           | a2                   |
| 3           | a3                   |
| 4           | a4                   |
--------------------------------------
```
4. Sqoop's job operation
4.1 Create a job
Create a job named `myjob` that imports table `a` from MySQL into the HDFS directory /test/a_old:
```
sqoop job --create myjob -- import --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a -m 1 --target-dir /test/a_old
```
4.2 List jobs

```
sqoop job --list
```
4.3 Show job details
```
[root@cyyun ~]# sqoop job --show myjob
16/12/23 20:41:50 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Enter password: 
Job: myjob
Tool: import
Options:
----------------------------
verbose = false
db.connect.string = jdbc:mysql://192.168.1.97:3306/mydb
codegen.output.delimiters.escape = 0
codegen.output.delimiters.enclose.required = false
codegen.input.delimiters.field = 0
hbase.create.table = false
db.require.password = true
hdfs.append.dir = false
db.table = a
codegen.input.delimiters.escape = 0
import.fetch.size = null
accumulo.create.table = false
codegen.input.delimiters.enclose.required = false
db.username = root
reset.onemapper = false
codegen.output.delimiters.record = 10
import.max.inline.lob.size = 16777216
hbase.bulk.load.enabled = false
hcatalog.create.table = false
db.clear.staging.table = false
codegen.input.delimiters.record = 0
enable.compression = false
hive.overwrite.table = false
hive.import = false
codegen.input.delimiters.enclose = 0
accumulo.batch.size = 10240000
hive.drop.delims = false
codegen.output.delimiters.enclose = 0
hdfs.delete-target.dir = false
codegen.output.dir = .
codegen.auto.compile.dir = true
relaxed.isolation = false
mapreduce.num.mappers = 1
accumulo.max.latency = 5000
import.direct.split.size = 0
codegen.output.delimiters.field = 44
export.new.update = UpdateOnly
incremental.mode = None
hdfs.file.format = TextFile
codegen.compile.dir = /tmp/sqoop-root/compile/1df5c52bab08ce1b6e9ae7c193b88264
direct.import = false
hdfs.target.dir = /test/a_old
hive.fail.table.exists = false
db.batch = false
```
4.4 Execute a job

```
sqoop job --exec myjob
```

or, supplying an option override after `--`:

```
sqoop job --exec myjob -- --username root
```

After the job succeeds, check the result on HDFS:

```
hadoop fs -ls /test/a_old
hadoop fs -cat /test/a_old/part-m-00000
```
4.5 Delete a job

```
sqoop job --delete myjob
```
5. Sqoop's codegen operation

```
sqoop codegen --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a
```
```
[root@cyyun ~]# ll /tmp/sqoop-root/compile/13bae8fcd124763b61658f7c4ed8937e/
total 28
-rw-r--r--. 1 root root  8715 Dec 23 21:40 a.class
-rw-r--r--. 1 root root  3616 Dec 23 21:40 a.jar
-rw-r--r--. 1 root root 10346 Dec 23 21:40 a.java
```
6. Importing and merging MySQL incremental data with Sqoop
As described in the official Sqoop User Guide:

Table 5. Incremental import arguments:

| Argument | Description |
|---|---|
| `--check-column (col)` | Specifies the column to be examined when determining which rows to import. (the column should not be of type CHAR/NCHAR/VARCHAR/VARNCHAR/LONGVARCHAR/LONGNVARCHAR) |
| `--incremental (mode)` | Specifies how Sqoop determines which rows are new. Legal values for `mode` include `append` and `lastmodified`. |
| `--last-value (value)` | Specifies the maximum value of the check column from the previous import. |

Sqoop supports two types of incremental imports: `append` and `lastmodified`. You can use the `--incremental` argument to specify the type of incremental import to perform.

You should specify `append` mode when importing a table where new rows are continually being added with increasing row id values. You specify the column containing the row's id with `--check-column`. Sqoop imports rows where the check column has a value greater than the one specified with `--last-value`.

An alternate table update strategy supported by Sqoop is called `lastmodified` mode. You should use this when rows of the source table may be updated, and each such update will set the value of a last-modified column to the current timestamp. Rows where the check column holds a timestamp more recent than the timestamp specified with `--last-value` are imported.

At the end of an incremental import, the value which should be specified as `--last-value` for a subsequent import is printed to the screen. When running a subsequent import, you should specify `--last-value` in this way to ensure you import only the new or updated data. This is handled automatically by creating an incremental import as a saved job, which is the preferred mechanism for performing a recurring incremental import. See the section on saved jobs later in this document for more information.
6.1 Create the table in MySQL
Drop the old table, recreate it with a timestamp column, and insert data:
```
mysql> drop table a;
Query OK, 0 rows affected (0.81 sec)
mysql> create table a(id int,name varchar(50),crt_date timestamp);
Query OK, 0 rows affected (0.24 sec)
mysql> insert into a values(1,'a1',sysdate());
Query OK, 1 row affected (0.20 sec)
mysql> insert into a values(2,'a2',sysdate());
Query OK, 1 row affected (0.06 sec)
mysql> insert into a values(3,'a3',sysdate());
Query OK, 1 row affected (0.07 sec)
mysql> select * from a;
+------+------+---------------------+
| id   | name | crt_date            |
+------+------+---------------------+
|    1 | a1   | 2016-12-26 17:03:50 |
|    2 | a2   | 2016-12-26 17:04:00 |
|    3 | a3   | 2016-12-26 17:04:19 |
+------+------+---------------------+
3 rows in set (0.02 sec)
```
6.2 Import existing MySQL data into HDFS

```
sqoop import --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a -m 1 --target-dir /test/a_1
```
```
[root@cyyun ~]# hadoop fs -cat /test/a_1/*
16/12/24 01:20:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1,a1,2016-12-26 17:03:50.0
2,a2,2016-12-26 17:04:00.0
3,a3,2016-12-26 17:04:19.0
```
6.3 Insert incremental data in MySQL
```
mysql> insert into a values(4,'a4',sysdate());
Query OK, 1 row affected (0.04 sec)
mysql> insert into a values(5,'a5',sysdate());
Query OK, 1 row affected (0.09 sec)
mysql> select * from a;
+------+------+---------------------+
| id   | name | crt_date            |
+------+------+---------------------+
|    1 | a1   | 2016-12-26 17:03:50 |
|    2 | a2   | 2016-12-26 17:04:00 |
|    3 | a3   | 2016-12-26 17:04:19 |
|    4 | a4   | 2016-12-26 17:09:23 |
|    5 | a5   | 2016-12-26 17:09:36 |
+------+------+---------------------+
5 rows in set (0.00 sec)
```
6.4 Import the incremental data into HDFS

```
sqoop import --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a -m 1 --target-dir /test/a_2 --incremental lastmodified --check-column crt_date --last-value "2016-12-26 17:04:20"
```
This `--last-value` behaves like `where crt_date >= "2016-12-26 17:04:20"`. The timestamp must be slightly later than the one on the id=3 row (17:04:19), otherwise row 3 would be imported again.
```
[root@cyyun ~]# hadoop fs -cat /test/a_2/*
16/12/24 01:27:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
4,a4,2016-12-26 17:09:23.0
5,a5,2016-12-26 17:09:36.0
```
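The row selection that `--incremental lastmodified` performs can be sketched as a plain timestamp filter over the table: keep rows whose check column is at or past `--last-value` (the note above suggests the boundary is inclusive, which is why 17:04:20 was chosen rather than 17:04:19). A local simulation over the text layout Sqoop writes, with the rows from the walkthrough:

```shell
# All rows of table a in the id,name,crt_date text layout.
printf '1,a1,2016-12-26 17:03:50\n2,a2,2016-12-26 17:04:00\n3,a3,2016-12-26 17:04:19\n4,a4,2016-12-26 17:09:23\n5,a5,2016-12-26 17:09:36\n' > /tmp/a-all
# Fixed-width 'YYYY-MM-DD HH:MM:SS' timestamps compare correctly as
# strings, so a plain >= reproduces the cutoff; only ids 4 and 5 pass.
awk -F, -v last='2016-12-26 17:04:20' '$3 >= last' /tmp/a-all
```

This is only a mental model of the filter; Sqoop evaluates the condition in SQL on the MySQL side when it builds the import query.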
6.5 Generate a jar with codegen
```
[root@cyyun ~]# sqoop codegen --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a
16/12/24 01:30:11 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/12/24 01:30:11 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/12/24 01:30:11 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/12/24 01:30:11 INFO tool.CodeGenTool: Beginning code generation
16/12/24 01:30:11 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `a` AS t LIMIT 1
16/12/24 01:30:12 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `a` AS t LIMIT 1
16/12/24 01:30:12 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop-2.6.4
Note: /tmp/sqoop-root/compile/57c71c4b6aa32458bb7a2d2636ba673f/a.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/12/24 01:30:15 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/57c71c4b6aa32458bb7a2d2636ba673f/a.jar
```
This compiles and writes /tmp/sqoop-root/compile/57c71c4b6aa32458bb7a2d2636ba673f/a.jar.
6.6 Merge

```
sqoop merge --new-data /test/a_2 --onto /test/a_1 --target-dir /test/a_merge --jar-file /tmp/sqoop-root/compile/57c71c4b6aa32458bb7a2d2636ba673f/a.jar --class-name a --merge-key id
```

Note: `--class-name a` is the table name here (the class that codegen generated).
```
[root@cyyun ~]# hadoop fs -cat /test/a_merge/*
16/12/24 01:36:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1,a1,2016-12-26 17:03:50.0
2,a2,2016-12-26 17:04:00.0
3,a3,2016-12-26 17:04:19.0
4,a4,2016-12-26 17:09:23.0
5,a5,2016-12-26 17:09:36.0
```
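The key semantics of `sqoop merge`: for each value of `--merge-key`, a record in the `--new-data` set replaces the record in the `--onto` set, and records that appear in only one set are kept. A local sketch with awk (the files mimic /test/a_1 and /test/a_2; id 3 is updated in the illustrative new set to show the override, whereas in the walkthrough above the new set only adds ids 4 and 5):

```shell
# Old snapshot (ids 1-3) and a new increment where id 3 was updated.
printf '1,a1\n2,a2\n3,a3\n' > /tmp/a_old
printf '3,a3-upd\n4,a4\n5,a5\n' > /tmp/a_new
# Read the new file first (NR==FNR) so its records win on the key in
# column 1, then keep old records whose key was not seen; sort by id.
awk -F, 'NR==FNR {rec[$1]=$0; next} !($1 in rec) {rec[$1]=$0} END {for (k in rec) print rec[k]}' \
    /tmp/a_new /tmp/a_old | sort -t, -k1,1n
```

In the real tool this reconciliation runs as a MapReduce job, which is why it needs the codegen jar to deserialize the records.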
6.7 Import the merged data into a Hive table
```
hive> drop table a;
OK
Time taken: 17.45 seconds
hive> create table a(id int,name string,crt_date string) row format delimited fields terminated by ',';
OK
Time taken: 1.586 seconds
hive> load data inpath '/test/a_merge/part*' into table a;
Loading data to table default.a
OK
Time taken: 2.073 seconds
hive> select * from a;
OK
1    a1    2016-12-26 17:03:50.0
2    a2    2016-12-26 17:04:00.0
3    a3    2016-12-26 17:04:19.0
4    a4    2016-12-26 17:09:23.0
5    a5    2016-12-26 17:09:36.0
Time taken: 0.191 seconds, Fetched: 5 row(s)
```
Checking HDFS confirms that `load data inpath` moved the file out of /test/a_merge into the Hive warehouse:
```
[root@cyyun ~]# hadoop fs -ls /test/a_merge
16/12/24 01:44:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   1 root supergroup          0 2016-12-24 01:35 /test/a_merge/_SUCCESS
[root@cyyun ~]# hadoop fs -ls /user/hive/warehouse/a
16/12/24 01:45:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rwxr-xr-x   1 root supergroup        135 2016-12-24 01:35 /user/hive/warehouse/a/part-r-00000
```
Reference:
http://blog.csdn.net/gdmzlhj1/article/details/50483171