sqoop 1.4.5 增量导入hive 0.12.0

来源:互联网 发布:文登政府网通知公告栏 编辑:程序博客网 时间:2024/05/29 17:14

1:导入命令

1) bin/sqoop import --connect jdbc:mysql://10.x.x.x:3306/lir --username dss --password xxx123 --table bb_month --hive-import 
2) bin/sqoop import --connect jdbc:mysql://10.x.x.x:3306/lir --table bb_month --username dss -P --hive-import -- --default-character-set=utf-8

第二个命令需要输入密码

上门的命令都需要先在hive中创建表bb_month,否则hive无此表,创建语句:

CREATE TABLE bb_month (  ID INT,  Fcom STRING,  dept1 STRING,  dept2 STRING,  Inputdate STRING,  year INT,  month INT,  quarter INT,  Requestid FLOAT,  StandPrice FLOAT,  FactPrice FLOAT,  HospPrice FLOAT,  RatePrice FLOAT) ;

[jifeng@jifeng02 sqoop]$ bin/sqoop import --connect jdbc:mysql://10.X.X.X:3306/lir --username dss --password XXX123 --table bb_month --hive-import Warning: /home/jifeng/sqoop/../hcatalog does not exist! HCatalog jobs will fail.Please set $HCAT_HOME to the root of your HCatalog installation.Warning: /home/jifeng/sqoop/../accumulo does not exist! Accumulo imports will fail.Please set $ACCUMULO_HOME to the root of your Accumulo installation.Warning: $HADOOP_HOME is deprecated.14/09/08 18:02:52 INFO sqoop.Sqoop: Running Sqoop version: 1.4.514/09/08 18:02:52 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.14/09/08 18:02:52 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override14/09/08 18:02:52 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.14/09/08 18:02:52 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.14/09/08 18:02:52 INFO tool.CodeGenTool: Beginning code generation14/09/08 18:02:52 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `bb_month` AS t LIMIT 114/09/08 18:02:52 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `bb_month` AS t LIMIT 114/09/08 18:02:52 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/jifeng/hadoop/hadoop-1.2.1注: /tmp/sqoop-jifeng/compile/0e2ce2c74924b539ae5ff77b453d708e/bb_month.java使用或覆盖了已过时的 API。注: 有关详细信息, 请使用 -Xlint:deprecation 重新编译。14/09/08 18:02:53 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-jifeng/compile/0e2ce2c74924b539ae5ff77b453d708e/bb_month.jar14/09/08 18:02:53 WARN manager.MySQLManager: It looks like you are importing from mysql.14/09/08 18:02:53 WARN manager.MySQLManager: This transfer can be faster! Use the --direct14/09/08 18:02:53 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.14/09/08 18:02:53 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)14/09/08 18:02:53 INFO mapreduce.ImportJobBase: Beginning import of bb_month14/09/08 18:02:53 INFO db.DBInputFormat: Using read commited transaction isolation14/09/08 18:02:53 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`ID`), MAX(`ID`) FROM `bb_month`14/09/08 18:02:54 INFO mapred.JobClient: Running job: job_201409072150_000914/09/08 18:02:55 INFO mapred.JobClient:  map 0% reduce 0%14/09/08 18:03:02 INFO mapred.JobClient:  map 1% reduce 0%14/09/08 18:03:03 INFO mapred.JobClient:  map 3% reduce 0%14/09/08 18:03:05 INFO mapred.JobClient:  map 6% reduce 0%14/09/08 18:03:06 INFO mapred.JobClient:  map 7% reduce 0%14/09/08 18:03:07 INFO mapred.JobClient:  map 9% reduce 0%14/09/08 18:03:08 INFO mapred.JobClient:  map 10% reduce 0%14/09/08 18:03:09 INFO mapred.JobClient:  map 11% reduce 0%14/09/08 18:03:11 INFO mapred.JobClient:  map 14% reduce 0%14/09/08 18:03:12 INFO mapred.JobClient:  map 16% reduce 0%14/09/08 18:03:14 INFO mapred.JobClient:  map 19% reduce 0%14/09/08 18:03:15 INFO mapred.JobClient:  map 21% reduce 0%14/09/08 18:03:16 INFO mapred.JobClient:  map 22% reduce 0%14/09/08 18:03:18 INFO mapred.JobClient:  map 24% reduce 0%14/09/08 18:03:19 INFO mapred.JobClient:  map 26% reduce 0%14/09/08 18:03:22 INFO mapred.JobClient:  map 30% reduce 0%14/09/08 18:03:23 INFO mapred.JobClient:  map 31% reduce 0%14/09/08 18:03:26 INFO mapred.JobClient:  map 34% reduce 0%14/09/08 18:03:27 INFO mapred.JobClient:  map 35% reduce 0%14/09/08 18:03:30 INFO mapred.JobClient:  map 37% reduce 0%14/09/08 18:03:31 INFO mapred.JobClient:  map 39% reduce 0%14/09/08 18:03:32 INFO mapred.JobClient:  map 41% reduce 0%14/09/08 18:03:34 INFO mapred.JobClient:  map 43% reduce 0%14/09/08 18:03:36 INFO mapred.JobClient:  map 45% reduce 0%14/09/08 18:03:39 INFO mapred.JobClient:  map 47% reduce 0%14/09/08 18:03:40 INFO mapred.JobClient:  map 51% reduce 0%14/09/08 18:03:43 INFO mapred.JobClient:  map 53% reduce 0%14/09/08 18:03:44 INFO mapred.JobClient:  map 55% reduce 0%14/09/08 18:03:46 INFO mapred.JobClient:  map 57% reduce 0%14/09/08 18:03:47 INFO mapred.JobClient:  map 61% reduce 0%14/09/08 18:03:50 INFO mapred.JobClient:  map 63% reduce 0%14/09/08 18:03:51 INFO mapred.JobClient:  map 65% reduce 0%14/09/08 18:03:54 INFO mapred.JobClient:  map 68% reduce 0%14/09/08 18:03:55 INFO mapred.JobClient:  map 71% reduce 0%14/09/08 18:03:58 INFO mapred.JobClient:  map 73% reduce 0%14/09/08 18:03:59 INFO mapred.JobClient:  map 75% reduce 0%14/09/08 18:04:02 INFO mapred.JobClient:  map 79% reduce 0%14/09/08 18:04:03 INFO mapred.JobClient:  map 81% reduce 0%14/09/08 18:04:05 INFO mapred.JobClient:  map 83% reduce 0%14/09/08 18:04:06 INFO mapred.JobClient:  map 85% reduce 0%14/09/08 18:04:09 INFO mapred.JobClient:  map 88% reduce 0%14/09/08 18:04:10 INFO mapred.JobClient:  map 91% reduce 0%14/09/08 18:04:13 INFO mapred.JobClient:  map 93% reduce 0%14/09/08 18:04:14 INFO mapred.JobClient:  map 95% reduce 0%14/09/08 18:04:17 INFO mapred.JobClient:  map 100% reduce 0%14/09/08 18:04:20 INFO mapred.JobClient: Job complete: job_201409072150_000914/09/08 18:04:20 INFO mapred.JobClient: Counters: 1814/09/08 18:04:20 INFO mapred.JobClient:   Job Counters 14/09/08 18:04:20 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=46088814/09/08 18:04:20 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=014/09/08 18:04:20 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=014/09/08 18:04:20 INFO mapred.JobClient:     Launched map tasks=10314/09/08 18:04:20 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=341814/09/08 18:04:20 INFO mapred.JobClient:   File Output Format Counters 14/09/08 18:04:20 INFO mapred.JobClient:     Bytes Written=423814/09/08 18:04:20 INFO mapred.JobClient:   FileSystemCounters14/09/08 18:04:20 INFO mapred.JobClient:     HDFS_BYTES_READ=1020114/09/08 18:04:20 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=688495014/09/08 18:04:20 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=423814/09/08 18:04:20 INFO mapred.JobClient:   File Input Format Counters 14/09/08 18:04:20 INFO mapred.JobClient:     Bytes Read=014/09/08 18:04:20 INFO mapred.JobClient:   Map-Reduce Framework14/09/08 18:04:20 INFO mapred.JobClient:     Map input records=4214/09/08 18:04:20 INFO mapred.JobClient:     Physical memory (bytes) snapshot=480308019214/09/08 18:04:20 INFO mapred.JobClient:     Spilled Records=014/09/08 18:04:20 INFO mapred.JobClient:     CPU time spent (ms)=3691014/09/08 18:04:20 INFO mapred.JobClient:     Total committed heap usage (bytes)=193855488014/09/08 18:04:20 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3816812134414/09/08 18:04:20 INFO mapred.JobClient:     Map output records=4214/09/08 18:04:20 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1020114/09/08 18:04:20 INFO mapreduce.ImportJobBase: Transferred 4.1387 KB in 86.9275 seconds (48.7533 bytes/sec)14/09/08 18:04:20 INFO mapreduce.ImportJobBase: Retrieved 42 records.14/09/08 18:04:20 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `bb_month` AS t LIMIT 114/09/08 18:04:20 WARN hive.TableDefWriter: Column Requestid had to be cast to a less precise type in Hive14/09/08 18:04:20 WARN hive.TableDefWriter: Column StandPrice had to be cast to a less precise type in Hive14/09/08 18:04:20 WARN hive.TableDefWriter: Column FactPrice had to be cast to a less precise type in Hive14/09/08 18:04:20 WARN hive.TableDefWriter: Column HospPrice had to be cast to a less precise type in Hive14/09/08 18:04:20 WARN hive.TableDefWriter: Column RatePrice had to be cast to a less precise type in Hive14/09/08 18:04:20 INFO hive.HiveImport: Removing temporary files from import process: hdfs://jifeng01:9000/user/jifeng/bb_month/_logs14/09/08 18:04:20 INFO hive.HiveImport: Loading uploaded data into Hive14/09/08 18:04:21 INFO hive.HiveImport: 14/09/08 18:04:21 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/home/jifeng/hadoop/hive-0.12.0-bin/lib/hive-common-0.12.0.jar!/hive-log4j.properties14/09/08 18:04:25 INFO hive.HiveImport: OK14/09/08 18:04:25 INFO hive.HiveImport: Time taken: 3.491 seconds14/09/08 18:04:25 INFO hive.HiveImport: Loading data to table default.bb_month14/09/08 18:04:25 INFO hive.HiveImport: Table default.bb_month stats: [num_partitions: 0, num_files: 202, num_rows: 0, total_size: 8476, raw_data_size: 0]14/09/08 18:04:25 INFO hive.HiveImport: OK14/09/08 18:04:25 INFO hive.HiveImport: Time taken: 0.59 seconds14/09/08 18:04:25 INFO hive.HiveImport: Hive import complete.14/09/08 18:04:25 INFO hive.HiveImport: Export directory is empty, removing it.

2: 增量导入命令

1) bin/sqoop import --connect jdbc:mysql://10.X.X.X:3306/lir --username dss --password xxx123 --table bb_month --hive-import --incremental append --check-column ID --last-value 714
2) bin/sqoop import --connect jdbc:mysql://10.X.X.X:3306/lir --table bb_month --username dss -P --hive-import --incremental append --check-column ID --last-value 714 -- --default-character-set=utf-8



[jifeng@jifeng02 sqoop]$ bin/sqoop import --connect jdbc:mysql://10.X.X.X:3306/lir --username dss --password xxx123 --table bb_month --hive-import --incremental append --check-column ID --last-value 714Warning: /home/jifeng/sqoop/../hcatalog does not exist! HCatalog jobs will fail.Please set $HCAT_HOME to the root of your HCatalog installation.Warning: /home/jifeng/sqoop/../accumulo does not exist! Accumulo imports will fail.Please set $ACCUMULO_HOME to the root of your Accumulo installation.Warning: $HADOOP_HOME is deprecated.14/09/08 20:58:45 INFO sqoop.Sqoop: Running Sqoop version: 1.4.514/09/08 20:58:45 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.14/09/08 20:58:45 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override14/09/08 20:58:45 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.14/09/08 20:58:45 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.14/09/08 20:58:45 INFO tool.CodeGenTool: Beginning code generation14/09/08 20:58:46 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `bb_month` AS t LIMIT 114/09/08 20:58:46 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `bb_month` AS t LIMIT 114/09/08 20:58:46 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/jifeng/hadoop/hadoop-1.2.1注: /tmp/sqoop-jifeng/compile/5ce12899d2dc88a968a58234a928a423/bb_month.java使用或覆盖了已过时的 API。注: 有关详细信息, 请使用 -Xlint:deprecation 重新编译。14/09/08 20:58:46 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-jifeng/compile/5ce12899d2dc88a968a58234a928a423/bb_month.jar14/09/08 20:58:47 INFO tool.ImportTool: Maximal id query for free form incremental import: SELECT MAX(`ID`) FROM bb_month14/09/08 20:58:47 INFO tool.ImportTool: Incremental import based on column `ID`14/09/08 20:58:47 INFO tool.ImportTool: Lower bound value: 71414/09/08 20:58:47 INFO tool.ImportTool: Upper bound value: 71514/09/08 20:58:47 WARN manager.MySQLManager: It looks like you are importing from mysql.14/09/08 20:58:47 WARN manager.MySQLManager: This transfer can be faster! Use the --direct14/09/08 20:58:47 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.14/09/08 20:58:47 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)14/09/08 20:58:47 INFO mapreduce.ImportJobBase: Beginning import of bb_month14/09/08 20:58:47 INFO db.DBInputFormat: Using read commited transaction isolation14/09/08 20:58:47 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`ID`), MAX(`ID`) FROM `bb_month` WHERE ( `ID` > 714 AND `ID` <= 715 )14/09/08 20:58:47 INFO mapred.JobClient: Running job: job_201409072150_001614/09/08 20:58:48 INFO mapred.JobClient:  map 0% reduce 0%14/09/08 20:58:56 INFO mapred.JobClient:  map 100% reduce 0%14/09/08 20:58:57 INFO mapred.JobClient: Job complete: job_201409072150_001614/09/08 20:58:57 INFO mapred.JobClient: Counters: 1814/09/08 20:58:57 INFO mapred.JobClient:   Job Counters 14/09/08 20:58:57 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=847814/09/08 20:58:57 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=014/09/08 20:58:57 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=014/09/08 20:58:57 INFO mapred.JobClient:     Launched map tasks=114/09/08 20:58:57 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=014/09/08 20:58:57 INFO mapred.JobClient:   File Output Format Counters 14/09/08 20:58:57 INFO mapred.JobClient:     Bytes Written=7814/09/08 20:58:57 INFO mapred.JobClient:   FileSystemCounters14/09/08 20:58:57 INFO mapred.JobClient:     HDFS_BYTES_READ=10314/09/08 20:58:57 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=6911314/09/08 20:58:57 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=7814/09/08 20:58:57 INFO mapred.JobClient:   File Input Format Counters 14/09/08 20:58:57 INFO mapred.JobClient:     Bytes Read=014/09/08 20:58:57 INFO mapred.JobClient:   Map-Reduce Framework14/09/08 20:58:57 INFO mapred.JobClient:     Map input records=114/09/08 20:58:57 INFO mapred.JobClient:     Physical memory (bytes) snapshot=7753728014/09/08 20:58:57 INFO mapred.JobClient:     Spilled Records=014/09/08 20:58:57 INFO mapred.JobClient:     CPU time spent (ms)=79014/09/08 20:58:57 INFO mapred.JobClient:     Total committed heap usage (bytes)=3191603214/09/08 20:58:57 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=50682265614/09/08 20:58:57 INFO mapred.JobClient:     Map output records=114/09/08 20:58:57 INFO mapred.JobClient:     SPLIT_RAW_BYTES=10314/09/08 20:58:57 INFO mapreduce.ImportJobBase: Transferred 78 bytes in 10.5935 seconds (7.363 bytes/sec)14/09/08 20:58:57 INFO mapreduce.ImportJobBase: Retrieved 1 records.14/09/08 20:58:57 INFO util.AppendUtils: Creating missing output directory - bb_month14/09/08 20:58:57 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `bb_month` AS t LIMIT 114/09/08 20:58:57 WARN hive.TableDefWriter: Column Requestid had to be cast to a less precise type in Hive14/09/08 20:58:57 WARN hive.TableDefWriter: Column StandPrice had to be cast to a less precise type in Hive14/09/08 20:58:57 WARN hive.TableDefWriter: Column FactPrice had to be cast to a less precise type in Hive14/09/08 20:58:57 WARN hive.TableDefWriter: Column HospPrice had to be cast to a less precise type in Hive14/09/08 20:58:57 WARN hive.TableDefWriter: Column RatePrice had to be cast to a less precise type in Hive14/09/08 20:58:57 INFO hive.HiveImport: Removing temporary files from import process: hdfs://jifeng01:9000/user/jifeng/bb_month/_logs14/09/08 20:58:57 INFO hive.HiveImport: Loading uploaded data into Hive14/09/08 20:58:58 INFO hive.HiveImport: 14/09/08 20:58:58 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/home/jifeng/hadoop/hive-0.12.0-bin/lib/hive-common-0.12.0.jar!/hive-log4j.properties14/09/08 20:59:01 INFO hive.HiveImport: OK14/09/08 20:59:01 INFO hive.HiveImport: Time taken: 3.076 seconds14/09/08 20:59:01 INFO hive.HiveImport: Loading data to table default.bb_month14/09/08 20:59:02 INFO hive.HiveImport: Table default.bb_month stats: [num_partitions: 0, num_files: 12, num_rows: 0, total_size: 4316, raw_data_size: 0]14/09/08 20:59:02 INFO hive.HiveImport: OK14/09/08 20:59:02 INFO hive.HiveImport: Time taken: 0.399 seconds14/09/08 20:59:02 INFO hive.HiveImport: Hive import complete.14/09/08 20:59:02 INFO hive.HiveImport: Export directory is empty, removing it.14/09/08 20:59:02 INFO tool.ImportTool: Incremental import complete! To run another incremental import of all data following this import, supply the following arguments:14/09/08 20:59:02 INFO tool.ImportTool:  --incremental append14/09/08 20:59:02 INFO tool.ImportTool:   --check-column ID14/09/08 20:59:02 INFO tool.ImportTool:   --last-value 71514/09/08 20:59:02 INFO tool.ImportTool: (Consider saving this with 'sqoop job --create')[jifeng@jifeng02 sqoop]$ 

hive中查看(显示被删除了数据)

hive> select * from bb_month;OK296     上海    null    null    2014-02 2014    2       1       13365.0 2205348.5       876260.7        1727203.5       880151.5714     长沙    null    null    2014-03 2014    3       1       37582.0 5116978.5       3691360.2       4794782.0       3698047.8Time taken: 0.057 seconds, Fetched: 42 row(s)hive> select * from bb_month;OK296     上海    null    null    2014-02 2014    2       1       13365.0 2205348.5       876260.7        1727203.5       880151.5715     广州    null    null    2014-04 2014    3       1       22.0    22.0    22.0    22.0    22.0714     长沙    null    null    2014-03 2014    3       1       37582.0 5116978.5       3691360.2       4794782.0       3698047.8Time taken: 0.068 seconds, Fetched: 43 row(s)hive> 

3:免创建表的导入方式:

bin/sqoop import --connect jdbc:mysql://10.6.x.x:3306/lir --username dss --password xxx123 --table DIM_DEPARTMENT --hive-import --create-hive-table -m 1

参数m:mapreduce的任务数



hive导出到mysql
bin/sqoop export --connect jdbc:mysql://10.6.x.x:3306/lir --username dss --password xx123 --table dep_info --export-dir /user/hive/warehouse/dim_department

http://sqoop.apache.org/docs/1.4.5/SqoopUserGuide.html#_importing_data_into_hive

0 0
原创粉丝点击