Importing and Exporting Data Between MySQL and Hive with Sqoop


  • Importing MySQL table data into Hive
    • 1 Create the MySQL table
    • 2 Create the Hive table with Sqoop
    • 3 Import data into Hive with Sqoop
  • Exporting Hive table data to MySQL
    • 1 Create the MySQL table
    • 2 Export data to MySQL with Sqoop
  • Sqoop's eval operation
  • Sqoop's job operation
    • 1 Create a job
    • 2 List jobs
    • 3 Show job details
    • 4 Execute a job
    • 5 Delete a job
  • Sqoop's codegen operation
  • Importing and merging MySQL incremental data with Sqoop
    • 1 Create the MySQL table
    • 2 Import existing MySQL data into HDFS
    • 3 Insert incremental data in MySQL
    • 4 Import the MySQL incremental data into HDFS
    • 5 Generate a jar with codegen
    • 6 Merge
    • 7 Import the merged data into a Hive table

This post covers:
1. Importing and exporting data between MySQL and Hive with Sqoop
2. Sqoop's eval operation
3. Sqoop's job operation
4. Generating Java code with Sqoop's codegen
5. Importing MySQL incremental data with Sqoop, merging it, and loading the merged result into Hive

For installing Sqoop 1.4.6 and its command-line usage, see the companion post Sqoop1.4.6的安装及命令行使用; for installing Hive, see Hive2.1.1的安装.
Note: the Hadoop cluster must be running before any of the steps below.
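If it is not running yet, a minimal sketch for a stock Hadoop 2.x install (assumes $HADOOP_HOME/sbin is on the PATH; adjust to your environment):

start-dfs.sh     # start HDFS: NameNode, DataNodes, SecondaryNameNode
start-yarn.sh    # start YARN: ResourceManager, NodeManagers
jps              # confirm the daemons are up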

1. Importing MySQL Table Data into Hive

1.1 Create the MySQL table

In the mysql command line, create a table 'a' in the mydb database and insert a couple of rows:

mysql> use mydb;
Database changed
mysql> show tables;
+----------------+
| Tables_in_mydb |
+----------------+
| user           |
| user2          |
+----------------+
2 rows in set (0.00 sec)
mysql> create table a(id int,name varchar(50));
Query OK, 0 rows affected (0.14 sec)
mysql> insert into a values(1,'a1');
Query OK, 1 row affected (0.13 sec)
mysql> insert into a values(2,'a2');
Query OK, 1 row affected (0.03 sec)
mysql> select * from a;
+------+------+
| id   | name |
+------+------+
|    1 | a1   |
|    2 | a2   |
+------+------+
2 rows in set (0.00 sec)

1.2 Create the Hive table with Sqoop

sqoop create-hive-table --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a --hive-table a --fields-terminated-by ',' --hive-overwrite

On success, the console output shows the Hive table being created. Open the Hive CLI: table a now exists in the default database, with no data yet.
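A quick way to confirm this from the shell (a sketch; assumes the hive CLI is on the PATH):

hive -e 'use default; show tables; describe a;'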

1.3 Import data into Hive with Sqoop

sqoop import --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a --hive-import --fields-terminated-by ',' --hive-overwrite -m 1

On success, the table data is imported into Hive. View the result file on HDFS:

hadoop fs -cat /user/hive/warehouse/a/part-m-00000

The warehouse file shows the two rows. Querying table a in the Hive shell confirms the data was imported successfully.
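The same check can be scripted (same assumption as above, hive on the PATH):

hive -e 'select * from a;'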

2. Exporting Hive Table Data to MySQL

2.1 Create the MySQL table

Create table 'b' in the mydb database:

mysql> create table b(id int,name varchar(50));
Query OK, 0 rows affected (0.07 sec)
mysql> select * from b;
Empty set (0.00 sec)

2.2 Export data to MySQL with Sqoop

Export the data under Hive warehouse directory a to MySQL table b:

sqoop export --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table b --export-dir /user/hive/warehouse/a --input-fields-terminated-by ','

Check the result in the mysql client:

mysql> select * from b;
+------+------+
| id   | name |
+------+------+
|    2 | a2   |
|    1 | a1   |
+------+------+
2 rows in set (0.00 sec)
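The export above only ever inserts rows; re-running it duplicates them. A hedged sketch using sqoop export's standard --update-key and --update-mode arguments to upsert instead — this assumes b has a primary or unique key on id, which the table created above does not yet have:

sqoop export --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table b --export-dir /user/hive/warehouse/a --input-fields-terminated-by ',' --update-key id --update-mode allowinsert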

3. Sqoop's eval Operation

sqoop eval connects to MySQL and runs SELECT and DML statements directly, printing the results to the console:

sqoop eval --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --query 'select * from a'

--query can also be written as -e; either way, the rows of table a are printed as a formatted table.

sqoop eval --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root -e "insert into a values (3,'a3')"
sqoop eval --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --query "insert into a values (4,'a4')"

Both inserts succeed. Running the select again shows all four rows:

[root@cyyun ~]# sqoop eval --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --e 'select * from a'
16/12/23 20:35:50 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/12/23 20:35:50 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/12/23 20:35:50 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
--------------------------------------
| id          | name                 | 
--------------------------------------
| 1           | a1                   | 
| 2           | a2                   | 
| 3           | a3                   | 
| 4           | a4                   | 
--------------------------------------

4. Sqoop's job Operation

4.1 Create a job

Create a job named myjob that imports table a from the MySQL database into the HDFS directory /test/a_old:

sqoop job --create myjob -- import --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a -m 1 --target-dir /test/a_old

Note the space around the first --: everything after it is the tool (import) and its arguments, which are stored with the job.
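Because the password is not stored with the job, every --exec prompts for it. One common workaround (a sketch; --password-file is a standard Sqoop option, the paths and job name here are illustrative, and the file must contain only the password with no trailing newline):

echo -n 'root' > .mysql.pwd
hadoop fs -put .mysql.pwd /user/root/.mysql.pwd
hadoop fs -chmod 400 /user/root/.mysql.pwd
sqoop job --create myjob2 -- import --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password-file /user/root/.mysql.pwd --table a -m 1 --target-dir /test/a_old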

4.2 List jobs

sqoop job --list

The output lists the saved job myjob.

4.3 Show job details

[root@cyyun ~]# sqoop job --show myjob
16/12/23 20:41:50 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Enter password: 
Job: myjob
Tool: import
Options:
----------------------------
verbose = false
db.connect.string = jdbc:mysql://192.168.1.97:3306/mydb
codegen.output.delimiters.escape = 0
codegen.output.delimiters.enclose.required = false
codegen.input.delimiters.field = 0
hbase.create.table = false
db.require.password = true
hdfs.append.dir = false
db.table = a
codegen.input.delimiters.escape = 0
import.fetch.size = null
accumulo.create.table = false
codegen.input.delimiters.enclose.required = false
db.username = root
reset.onemapper = false
codegen.output.delimiters.record = 10
import.max.inline.lob.size = 16777216
hbase.bulk.load.enabled = false
hcatalog.create.table = false
db.clear.staging.table = false
codegen.input.delimiters.record = 0
enable.compression = false
hive.overwrite.table = false
hive.import = false
codegen.input.delimiters.enclose = 0
accumulo.batch.size = 10240000
hive.drop.delims = false
codegen.output.delimiters.enclose = 0
hdfs.delete-target.dir = false
codegen.output.dir = .
codegen.auto.compile.dir = true
relaxed.isolation = false
mapreduce.num.mappers = 1
accumulo.max.latency = 5000
import.direct.split.size = 0
codegen.output.delimiters.field = 44
export.new.update = UpdateOnly
incremental.mode = None
hdfs.file.format = TextFile
codegen.compile.dir = /tmp/sqoop-root/compile/1df5c52bab08ce1b6e9ae7c193b88264
direct.import = false
hdfs.target.dir = /test/a_old
hive.fail.table.exists = false
db.batch = false

4.4 Execute a job

sqoop job --exec myjob

Arguments placed after an extra -- override the saved job's options, e.g. to pass the username explicitly:

sqoop job --exec myjob -- --username root

Executing the job prompts for the database password. Once it succeeds, check the result on HDFS:

hadoop fs -ls /test/a_old
hadoop fs -cat /test/a_old/part-m-00000

The output shows the imported rows under /test/a_old.

4.5 Delete a job

sqoop job --delete myjob

After deletion, myjob no longer appears in sqoop job --list.

5. Sqoop's codegen Operation

sqoop codegen --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a

The generated .java source, compiled .class, and .jar land in a temporary compile directory:

[root@cyyun ~]# ll /tmp/sqoop-root/compile/13bae8fcd124763b61658f7c4ed8937e/
total 28
-rw-r--r--. 1 root root  8715 Dec 23 21:40 a.class
-rw-r--r--. 1 root root  3616 Dec 23 21:40 a.jar
-rw-r--r--. 1 root root 10346 Dec 23 21:40 a.java
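The compile directory name is random, which makes the jar awkward to reference later (see the merge in 6.6). A sketch using codegen's standard --outdir and --bindir options to fix the locations (the paths are illustrative):

sqoop codegen --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a --outdir /root/sqoop-gen/src --bindir /root/sqoop-gen/bin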

6. Importing and Merging MySQL Incremental Data with Sqoop

The Sqoop User Guide describes incremental imports as follows:

Table 5. Incremental import arguments:

--check-column (col)    Specifies the column to be examined when determining which rows to import. (The column should not be of type CHAR/NCHAR/VARCHAR/VARNCHAR/LONGVARCHAR/LONGNVARCHAR.)
--incremental (mode)    Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified.
--last-value (value)    Specifies the maximum value of the check column from the previous import.

Sqoop supports two types of incremental imports: append and lastmodified. You can use the --incremental argument to specify the type of incremental import to perform.

You should specify append mode when importing a table where new rows are continually being added with increasing row id values. You specify the column containing the row's id with --check-column. Sqoop imports rows where the check column has a value greater than the one specified with --last-value.

An alternate table update strategy supported by Sqoop is called lastmodified mode. You should use this when rows of the source table may be updated, and each such update will set the value of a last-modified column to the current timestamp. Rows where the check column holds a timestamp more recent than the timestamp specified with --last-value are imported.

At the end of an incremental import, the value which should be specified as --last-value for a subsequent import is printed to the screen. When running a subsequent import, you should specify --last-value in this way to ensure you import only the new or updated data. This is handled automatically by creating an incremental import as a saved job, which is the preferred mechanism for performing a recurring incremental import. See the section on saved jobs later in this document for more information.
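The walkthrough below uses lastmodified mode. For comparison, a sketch of append mode against the same table, keyed on the integer id column (the target directory and --last-value are illustrative; only rows with id > 3 would be imported):

sqoop import --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a -m 1 --target-dir /test/a_append --incremental append --check-column id --last-value 3

Wrapped in a saved job, Sqoop records the new --last-value after each run, so repeated --exec calls pick up only the new rows.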

6.1 Create the MySQL table

Drop the old table, recreate it with a timestamp column, and insert data:

mysql> drop table a;
Query OK, 0 rows affected (0.81 sec)
mysql> create table a(id int,name varchar(50),crt_date timestamp);
Query OK, 0 rows affected (0.24 sec)
mysql> insert into a values(1,'a1',sysdate());
Query OK, 1 row affected (0.20 sec)
mysql> insert into a values(2,'a2',sysdate());
Query OK, 1 row affected (0.06 sec)
mysql> insert into a values(3,'a3',sysdate());
Query OK, 1 row affected (0.07 sec)
mysql> select * from a;
+------+------+---------------------+
| id   | name | crt_date            |
+------+------+---------------------+
|    1 | a1   | 2016-12-26 17:03:50 |
|    2 | a2   | 2016-12-26 17:04:00 |
|    3 | a3   | 2016-12-26 17:04:19 |
+------+------+---------------------+
3 rows in set (0.02 sec)

6.2 Import existing MySQL data into HDFS

sqoop import --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a -m 1 --target-dir /test/a_1
[root@cyyun ~]# hadoop fs -cat /test/a_1/*
16/12/24 01:20:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1,a1,2016-12-26 17:03:50.0
2,a2,2016-12-26 17:04:00.0
3,a3,2016-12-26 17:04:19.0

6.3 Insert incremental data in MySQL

mysql> insert into a values(4,'a4',sysdate());
Query OK, 1 row affected (0.04 sec)
mysql> insert into a values(5,'a5',sysdate());
Query OK, 1 row affected (0.09 sec)
mysql> select * from a;
+------+------+---------------------+
| id   | name | crt_date            |
+------+------+---------------------+
|    1 | a1   | 2016-12-26 17:03:50 |
|    2 | a2   | 2016-12-26 17:04:00 |
|    3 | a3   | 2016-12-26 17:04:19 |
|    4 | a4   | 2016-12-26 17:09:23 |
|    5 | a5   | 2016-12-26 17:09:36 |
+------+------+---------------------+
5 rows in set (0.00 sec)

6.4 Import the MySQL incremental data into HDFS

sqoop import --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a  -m 1 --target-dir /test/a_2 --incremental lastmodified --check-column crt_date --last-value "2016-12-26 17:04:20"

The --last-value here acts like --where crt_date >= "2016-12-26 17:04:20". Pick a timestamp slightly later than row 3's (17:04:19); otherwise row 3 would be imported again.

[root@cyyun ~]# hadoop fs -cat /test/a_2/*
16/12/24 01:27:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
4,a4,2016-12-26 17:09:23.0
5,a5,2016-12-26 17:09:36.0
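Sections 6.5 and 6.6 below do the merge by hand via codegen and sqoop merge. As a hedged alternative, a lastmodified import can merge on the fly when --merge-key is supplied and the target directory already exists (a sketch reusing the directory from 6.2):

sqoop import --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a -m 1 --target-dir /test/a_1 --incremental lastmodified --check-column crt_date --last-value "2016-12-26 17:04:20" --merge-key id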

6.5 Generate a jar with codegen

[root@cyyun ~]# sqoop codegen --connect jdbc:mysql://192.168.1.97:3306/mydb --username root --password root --table a
16/12/24 01:30:11 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/12/24 01:30:11 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/12/24 01:30:11 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/12/24 01:30:11 INFO tool.CodeGenTool: Beginning code generation
16/12/24 01:30:11 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `a` AS t LIMIT 1
16/12/24 01:30:12 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `a` AS t LIMIT 1
16/12/24 01:30:12 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop-2.6.4
Note: /tmp/sqoop-root/compile/57c71c4b6aa32458bb7a2d2636ba673f/a.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/12/24 01:30:15 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/57c71c4b6aa32458bb7a2d2636ba673f/a.jar

The compile step wrote /tmp/sqoop-root/compile/57c71c4b6aa32458bb7a2d2636ba673f/a.jar, which the merge below uses.

6.6 Merge

sqoop merge --new-data /test/a_2 --onto /test/a_1 --target-dir /test/a_merge --jar-file /tmp/sqoop-root/compile/57c71c4b6aa32458bb7a2d2636ba673f/a.jar --class-name a --merge-key id

Note: --class-name a is the generated record class, named after the table. For rows that share the same --merge-key value, the copy from --new-data wins.

[root@cyyun ~]# hadoop fs -cat /test/a_merge/*
16/12/24 01:36:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1,a1,2016-12-26 17:03:50.0
2,a2,2016-12-26 17:04:00.0
3,a3,2016-12-26 17:04:19.0
4,a4,2016-12-26 17:09:23.0
5,a5,2016-12-26 17:09:36.0

6.7 Import the merged data into a Hive table

hive> drop table a;
OK
Time taken: 17.45 seconds
hive> create table a(id int,name string,crt_date string) row format delimited fields terminated by ',';
OK
Time taken: 1.586 seconds
hive> load data inpath '/test/a_merge/part*' into table a;
Loading data to table default.a
OK
Time taken: 2.073 seconds
hive> select * from a;
OK
1   a1  2016-12-26 17:03:50.0
2   a2  2016-12-26 17:04:00.0
3   a3  2016-12-26 17:04:19.0
4   a4  2016-12-26 17:09:23.0
5   a5  2016-12-26 17:09:36.0
Time taken: 0.191 seconds, Fetched: 5 row(s)

Check HDFS: load data inpath moves the files out of /test/a_merge and into the Hive warehouse.

[root@cyyun ~]# hadoop fs -ls /test/a_merge
16/12/24 01:44:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   1 root supergroup          0 2016-12-24 01:35 /test/a_merge/_SUCCESS
[root@cyyun ~]# hadoop fs -ls /user/hive/warehouse/a
16/12/24 01:45:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rwxr-xr-x   1 root supergroup        135 2016-12-24 01:35 /user/hive/warehouse/a/part-r-00000

Reference:
http://blog.csdn.net/gdmzlhj1/article/details/50483171
