使用Sqoop进行数据的导入

来源:互联网 发布:2017手机必备软件 编辑:程序博客网 时间:2024/05/19 17:24
一 下载地址
http://archive.apache.org/dist/sqoop/1.4.4/
[root@localhost bin]# wget http://archive.apache.org/dist/sqoop/1.4.4/sqoop-1.4.4.bin__hadoop-0.20.tar.gz

二 解压sqoop
[root@localhost bin]# tar -zxvf sqoop-1.4.4.bin__hadoop-0.20.tar.gz

三 导入两个环境变量和导入ojdbc14.jar库
[root@localhost ~]# export HADOOP_COMMON_HOME=/opt/hadoop-1.2.1/
[root@localhost ~]# export HADOOP_MAPRED_HOME=/opt/hadoop-1.2.1/

四 sqoop程序结构认识
[root@localhost bin]# ls
configure-sqoop sqoop-codegen sqoop-import sqoop-merge
configure-sqoop.cmd sqoop-create-hive-table sqoop-import-all-tables sqoop-metastore
emp.java sqoop-eval sqoop-job sqoop-version
sqoop sqoop-export sqoop-list-databases start-metastore.sh
sqoop.cmd sqoop-help sqoop-list-tables stop-metastore.sh

五 准备oracle数据库中的表
SQL> select * from emp;

EMPNO ENAME EMPJOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- -------------- ---------- ---------- ----------
1001 Tome 5000 10
1002 Tome 5000 10
7369 SMITH CLERK 7902 17-12月-80 4800 20
7499 ALLEN SALESMAN 7698 20-2月 -81 5600 300 30
7521 WARD SALESMAN 7698 22-2月 -81 5250 500 30
7566 JONES MANAGER 7839 02-4月 -81 9075 20
7654 MARTIN SALESMAN 7698 28-9月 -81 5250 1400 30
7698 BLAKE MANAGER 7839 01-5月 -81 8850 30
7782 CLARK MANAGER 7839 09-6月 -81 8450 10
7788 SCOTT ANALYST 7566 19-4月 -87 7000 20
7839 KING PRESIDENT 17-11月-81 12111 10

EMPNO ENAME EMPJOB MGR HIREDATE SAL COMM DEPTNO
---------- ---------- --------- ---------- -------------- ---------- ---------- ----------
7844 TURNER SALESMAN 7698 08-9月 -81 5500 0 30
7876 ADAMS CLERK 7788 23-5月 -87 5100 20
7900 JAMES CLERK 7698 03-12月-81 4950 30
7902 FORD ANALYST 7566 03-12月-81 7000 20
7934 MILLER CLERK 7782 23-1月 -82 5300 10

已选择16行。

六 实战
1、使用Sqoop导入oracle数据到HDFS中
./sqoop import --connect jdbc:oracle:thin:@192.168.0.107:1521:orcl --username scott --password password --table emp --columns 'empno,ename,empjob,sal,deptno' -m 1 --target-dir 'sqoop/emp'
Warning: /usr/lib/hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: $HADOOP_HOME is deprecated.

17/08/27 17:27:17 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/08/27 17:27:17 INFO manager.SqlManager: Using default fetchSize of 1000
17/08/27 17:27:17 INFO tool.CodeGenTool: Beginning code generation
17/08/27 17:27:18 INFO manager.OracleManager: Time zone has been set to GMT
17/08/27 17:27:18 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM emp t WHERE 1=0
17/08/27 17:27:18 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/hadoop-1.2.1
Note: /tmp/sqoop-root/compile/01b577f440858d215f88c58ab15210d6/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/08/27 17:27:20 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/01b577f440858d215f88c58ab15210d6/emp.jar
17/08/27 17:27:20 INFO manager.OracleManager: Time zone has been set to GMT
17/08/27 17:27:20 INFO mapreduce.ImportJobBase: Beginning import of emp
17/08/27 17:27:24 INFO mapred.JobClient: Running job: job_201708271710_0002
17/08/27 17:27:25 INFO mapred.JobClient: map 0% reduce 0%
17/08/27 17:28:46 INFO mapred.JobClient: map 100% reduce 0%
17/08/27 17:28:58 INFO mapred.JobClient: Job complete: job_201708271710_0002
17/08/27 17:28:58 INFO mapred.JobClient: Counters: 18
17/08/27 17:28:58 INFO mapred.JobClient: Job Counters
17/08/27 17:28:58 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=88849
17/08/27 17:28:58 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
17/08/27 17:28:58 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
17/08/27 17:28:58 INFO mapred.JobClient: Launched map tasks=1
17/08/27 17:28:58 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
17/08/27 17:28:58 INFO mapred.JobClient: File Output Format Counters
17/08/27 17:28:58 INFO mapred.JobClient: Bytes Written=423
17/08/27 17:28:58 INFO mapred.JobClient: FileSystemCounters
17/08/27 17:28:58 INFO mapred.JobClient: HDFS_BYTES_READ=87
17/08/27 17:28:58 INFO mapred.JobClient: FILE_BYTES_WRITTEN=61082
17/08/27 17:28:58 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=423
17/08/27 17:28:58 INFO mapred.JobClient: File Input Format Counters
17/08/27 17:28:58 INFO mapred.JobClient: Bytes Read=0
17/08/27 17:28:58 INFO mapred.JobClient: Map-Reduce Framework
17/08/27 17:28:58 INFO mapred.JobClient: Map input records=16
17/08/27 17:28:58 INFO mapred.JobClient: Physical memory (bytes) snapshot=66248704
17/08/27 17:28:58 INFO mapred.JobClient: Spilled Records=0
17/08/27 17:28:58 INFO mapred.JobClient: CPU time spent (ms)=960
17/08/27 17:28:58 INFO mapred.JobClient: Total committed heap usage (bytes)=15794176
17/08/27 17:28:58 INFO mapred.JobClient: Virtual memory (bytes) snapshot=855474176
17/08/27 17:28:58 INFO mapred.JobClient: Map output records=16
17/08/27 17:28:58 INFO mapred.JobClient: SPLIT_RAW_BYTES=87
17/08/27 17:28:58 INFO mapreduce.ImportJobBase: Transferred 423 bytes in 97.9276 seconds (4.3195 bytes/sec)
17/08/27 17:28:58 INFO mapreduce.ImportJobBase: Retrieved 16 records.
校验结果

2、使用Sqoop导入oracle数据到Hive中
[root@localhost bin]# ./sqoop import --hive-import --connect jdbc:oracle:thin:@192.168.0.107:1521:orcl --username scott --password password --table emp -m 1 --columns 'EMPNO,ENAME,EMPJOB,SAL,DEPTNO'
Warning: /usr/lib/hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: $HADOOP_HOME is deprecated.

17/08/27 21:12:20 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/08/27 21:12:20 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
17/08/27 21:12:20 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
17/08/27 21:12:20 INFO manager.SqlManager: Using default fetchSize of 1000
17/08/27 21:12:20 INFO tool.CodeGenTool: Beginning code generation
17/08/27 21:12:21 INFO manager.OracleManager: Time zone has been set to GMT
17/08/27 21:12:21 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM emp t WHERE 1=0
17/08/27 21:12:21 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/hadoop-1.2.1
Note: /tmp/sqoop-root/compile/d7f0b3b769d53fd348f2348999db2fea/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/08/27 21:12:22 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/d7f0b3b769d53fd348f2348999db2fea/emp.jar
17/08/27 21:12:22 INFO manager.OracleManager: Time zone has been set to GMT
17/08/27 21:12:22 INFO mapreduce.ImportJobBase: Beginning import of emp
17/08/27 21:12:27 INFO mapred.JobClient: Running job: job_201708272044_0005
17/08/27 21:12:28 INFO mapred.JobClient: map 0% reduce 0%
17/08/27 21:12:50 INFO mapred.JobClient: map 100% reduce 0%
17/08/27 21:12:52 INFO mapred.JobClient: Job complete: job_201708272044_0005
17/08/27 21:12:52 INFO mapred.JobClient: Counters: 18
17/08/27 21:12:52 INFO mapred.JobClient: Job Counters
17/08/27 21:12:52 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=21768
17/08/27 21:12:52 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
17/08/27 21:12:52 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
17/08/27 21:12:52 INFO mapred.JobClient: Launched map tasks=1
17/08/27 21:12:52 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
17/08/27 21:12:52 INFO mapred.JobClient: File Output Format Counters
17/08/27 21:12:52 INFO mapred.JobClient: Bytes Written=423
17/08/27 21:12:52 INFO mapred.JobClient: FileSystemCounters
17/08/27 21:12:52 INFO mapred.JobClient: HDFS_BYTES_READ=87
17/08/27 21:12:52 INFO mapred.JobClient: FILE_BYTES_WRITTEN=61072
17/08/27 21:12:52 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=423
17/08/27 21:12:52 INFO mapred.JobClient: File Input Format Counters
17/08/27 21:12:52 INFO mapred.JobClient: Bytes Read=0
17/08/27 21:12:52 INFO mapred.JobClient: Map-Reduce Framework
17/08/27 21:12:52 INFO mapred.JobClient: Map input records=16
17/08/27 21:12:52 INFO mapred.JobClient: Physical memory (bytes) snapshot=73703424
17/08/27 21:12:52 INFO mapred.JobClient: Spilled Records=0
17/08/27 21:12:52 INFO mapred.JobClient: CPU time spent (ms)=1240
17/08/27 21:12:52 INFO mapred.JobClient: Total committed heap usage (bytes)=15794176
17/08/27 21:12:52 INFO mapred.JobClient: Virtual memory (bytes) snapshot=855609344
17/08/27 21:12:52 INFO mapred.JobClient: Map output records=16
17/08/27 21:12:52 INFO mapred.JobClient: SPLIT_RAW_BYTES=87
17/08/27 21:12:52 INFO mapreduce.ImportJobBase: Transferred 423 bytes in 29.1572 seconds (14.5075 bytes/sec)
17/08/27 21:12:52 INFO mapreduce.ImportJobBase: Retrieved 16 records.
17/08/27 21:12:52 INFO manager.OracleManager: Time zone has been set to GMT
17/08/27 21:12:52 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM emp t WHERE 1=0
17/08/27 21:12:52 WARN hive.TableDefWriter: Column EMPNO had to be cast to a less precise type in Hive
17/08/27 21:12:52 WARN hive.TableDefWriter: Column SAL had to be cast to a less precise type in Hive
17/08/27 21:12:52 WARN hive.TableDefWriter: Column DEPTNO had to be cast to a less precise type in Hive
17/08/27 21:12:52 INFO hive.HiveImport: Removing temporary files from import process: hdfs://localhost:9000/user/root/emp/_logs
17/08/27 21:12:52 INFO hive.HiveImport: Loading uploaded data into Hive
17/08/27 21:13:00 INFO hive.HiveImport:
17/08/27 21:13:00 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/root/apache-hive-0.13.0-bin/lib/hive-common-0.13.0.jar!/hive-log4j.properties
17/08/27 21:13:19 INFO hive.HiveImport: OK
17/08/27 21:13:19 INFO hive.HiveImport: Time taken: 3.502 seconds
17/08/27 21:13:19 INFO hive.HiveImport: Loading data to table default.emp
17/08/27 21:13:20 INFO hive.HiveImport: Table default.emp stats: [numFiles=2, numRows=0, totalSize=423, rawDataSize=0]
17/08/27 21:13:20 INFO hive.HiveImport: OK
17/08/27 21:13:20 INFO hive.HiveImport: Time taken: 1.502 seconds
17/08/27 21:13:21 INFO hive.HiveImport: Hive import complete.
17/08/27 21:13:21 INFO hive.HiveImport: Export directory is empty, removing it.
验证
hive> show tables;
OK
test1
Time taken: 0.022 seconds, Fetched: 1 row(s)
hive> show tables;
OK
emp
test1
Time taken: 0.066 seconds, Fetched: 2 row(s)
hive> select * from emp
> ;
OK
1001.0 Tome null 5000.0 10.0
1002.0 Tome null 5000.0 10.0
7369.0 SMITH CLERK 4800.0 20.0
7499.0 ALLEN SALESMAN 5600.0 30.0
7521.0 WARD SALESMAN 5250.0 30.0
7566.0 JONES MANAGER 9075.0 20.0
7654.0 MARTIN SALESMAN 5250.0 30.0
7698.0 BLAKE MANAGER 8850.0 30.0
7782.0 CLARK MANAGER 8450.0 10.0
7788.0 SCOTT ANALYST 7000.0 20.0
7839.0 KING PRESIDENT 12111.0 10.0
7844.0 TURNER SALESMAN 5500.0 30.0
7876.0 ADAMS CLERK 5100.0 20.0
7900.0 JAMES CLERK 4950.0 30.0
7902.0 FORD ANALYST 7000.0 20.0
7934.0 MILLER CLERK 5300.0 10.0
Time taken: 2.56 seconds, Fetched: 16 row(s)
3、使用Sqoop导入oracle数据到Hive中,并且指定表名
[root@localhost bin]# ./sqoop import --hive-import --connect jdbc:oracle:thin:@192.168.0.107:1521:orcl --username scott --password password --table emp -m 1 --columns 'EMPNO,ENAME,EMPJOB,SAL,DEPTNO' --hive-table emp1
Warning: /usr/lib/hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: $HADOOP_HOME is deprecated.

17/08/27 21:18:06 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/08/27 21:18:06 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
17/08/27 21:18:06 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
17/08/27 21:18:06 WARN tool.BaseSqoopTool: It seems that you've specified at least one of following:
17/08/27 21:18:06 WARN tool.BaseSqoopTool: --hive-home
17/08/27 21:18:06 WARN tool.BaseSqoopTool: --hive-overwrite
17/08/27 21:18:06 WARN tool.BaseSqoopTool: --create-hive-table
17/08/27 21:18:06 WARN tool.BaseSqoopTool: --hive-table
17/08/27 21:18:06 WARN tool.BaseSqoopTool: --hive-partition-key
17/08/27 21:18:06 WARN tool.BaseSqoopTool: --hive-partition-value
17/08/27 21:18:06 WARN tool.BaseSqoopTool: --map-column-hive
17/08/27 21:18:06 WARN tool.BaseSqoopTool: Without specifying parameter --hive-import. Please note that
17/08/27 21:18:06 WARN tool.BaseSqoopTool: those arguments will not be used in this session. Either
17/08/27 21:18:06 WARN tool.BaseSqoopTool: specify --hive-import to apply them correctly or remove them
17/08/27 21:18:06 WARN tool.BaseSqoopTool: from command line to remove this warning.
17/08/27 21:18:06 INFO tool.BaseSqoopTool: Please note that --hive-home, --hive-partition-key,
17/08/27 21:18:06 INFO tool.BaseSqoopTool: hive-partition-value and --map-column-hive options are
17/08/27 21:18:06 INFO tool.BaseSqoopTool: are also valid for HCatalog imports and exports
17/08/27 21:18:06 INFO manager.SqlManager: Using default fetchSize of 1000
17/08/27 21:18:06 INFO tool.CodeGenTool: Beginning code generation
17/08/27 21:18:07 INFO manager.OracleManager: Time zone has been set to GMT
17/08/27 21:18:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM emp t WHERE 1=0
17/08/27 21:18:07 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/hadoop-1.2.1
Note: /tmp/sqoop-root/compile/18baa238421b7e89277246124779f3e0/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/08/27 21:18:10 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/18baa238421b7e89277246124779f3e0/emp.jar
17/08/27 21:18:10 INFO manager.OracleManager: Time zone has been set to GMT
17/08/27 21:18:10 INFO mapreduce.ImportJobBase: Beginning import of emp
17/08/27 21:18:16 INFO mapred.JobClient: Running job: job_201708272044_0006
17/08/27 21:18:17 INFO mapred.JobClient: map 0% reduce 0%
17/08/27 21:18:37 INFO mapred.JobClient: map 100% reduce 0%
17/08/27 21:18:40 INFO mapred.JobClient: Job complete: job_201708272044_0006
17/08/27 21:18:40 INFO mapred.JobClient: Counters: 18
17/08/27 21:18:40 INFO mapred.JobClient: Job Counters
17/08/27 21:18:40 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=19905
17/08/27 21:18:40 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
17/08/27 21:18:40 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
17/08/27 21:18:40 INFO mapred.JobClient: Launched map tasks=1
17/08/27 21:18:40 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
17/08/27 21:18:40 INFO mapred.JobClient: File Output Format Counters
17/08/27 21:18:40 INFO mapred.JobClient: Bytes Written=423
17/08/27 21:18:40 INFO mapred.JobClient: FileSystemCounters
17/08/27 21:18:40 INFO mapred.JobClient: HDFS_BYTES_READ=87
17/08/27 21:18:40 INFO mapred.JobClient: FILE_BYTES_WRITTEN=61072
17/08/27 21:18:40 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=423
17/08/27 21:18:40 INFO mapred.JobClient: File Input Format Counters
17/08/27 21:18:40 INFO mapred.JobClient: Bytes Read=0
17/08/27 21:18:40 INFO mapred.JobClient: Map-Reduce Framework
17/08/27 21:18:40 INFO mapred.JobClient: Map input records=16
17/08/27 21:18:40 INFO mapred.JobClient: Physical memory (bytes) snapshot=62005248
17/08/27 21:18:40 INFO mapred.JobClient: Spilled Records=0
17/08/27 21:18:40 INFO mapred.JobClient: CPU time spent (ms)=1170
17/08/27 21:18:40 INFO mapred.JobClient: Total committed heap usage (bytes)=15794176
17/08/27 21:18:40 INFO mapred.JobClient: Virtual memory (bytes) snapshot=855609344
17/08/27 21:18:40 INFO mapred.JobClient: Map output records=16
17/08/27 21:18:40 INFO mapred.JobClient: SPLIT_RAW_BYTES=87
17/08/27 21:18:40 INFO mapreduce.ImportJobBase: Transferred 423 bytes in 28.5223 seconds (14.8305 bytes/sec)
17/08/27 21:18:40 INFO mapreduce.ImportJobBase: Retrieved 16 records.
17/08/27 21:18:40 INFO manager.OracleManager: Time zone has been set to GMT
17/08/27 21:18:40 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM emp t WHERE 1=0
17/08/27 21:18:40 WARN hive.TableDefWriter: Column EMPNO had to be cast to a less precise type in Hive
17/08/27 21:18:40 WARN hive.TableDefWriter: Column SAL had to be cast to a less precise type in Hive
17/08/27 21:18:40 WARN hive.TableDefWriter: Column DEPTNO had to be cast to a less precise type in Hive
17/08/27 21:18:40 INFO hive.HiveImport: Removing temporary files from import process: hdfs://localhost:9000/user/root/emp/_logs
17/08/27 21:18:40 INFO hive.HiveImport: Loading uploaded data into Hive
17/08/27 21:18:46 INFO hive.HiveImport:
17/08/27 21:18:46 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/root/apache-hive-0.13.0-bin/lib/hive-common-0.13.0.jar!/hive-log4j.properties
17/08/27 21:19:08 INFO hive.HiveImport: OK
17/08/27 21:19:08 INFO hive.HiveImport: Time taken: 8.723 seconds
17/08/27 21:19:09 INFO hive.HiveImport: Loading data to table default.emp1
17/08/27 21:19:10 INFO hive.HiveImport: Table default.emp1 stats: [numFiles=2, numRows=0, totalSize=423, rawDataSize=0]
17/08/27 21:19:10 INFO hive.HiveImport: OK
17/08/27 21:19:10 INFO hive.HiveImport: Time taken: 1.509 seconds
17/08/27 21:19:11 INFO hive.HiveImport: Hive import complete.
17/08/27 21:19:11 INFO hive.HiveImport: Export directory is empty, removing it.
检验:
> show tables;
OK
emp
emp1
test1
Time taken: 0.158 seconds, Fetched: 3 row(s)
hive> select * from emp1;
OK
1001.0 Tome null 5000.0 10.0
1002.0 Tome null 5000.0 10.0
7369.0 SMITH CLERK 4800.0 20.0
7499.0 ALLEN SALESMAN 5600.0 30.0
7521.0 WARD SALESMAN 5250.0 30.0
7566.0 JONES MANAGER 9075.0 20.0
7654.0 MARTIN SALESMAN 5250.0 30.0
7698.0 BLAKE MANAGER 8850.0 30.0
7782.0 CLARK MANAGER 8450.0 10.0
7788.0 SCOTT ANALYST 7000.0 20.0
7839.0 KING PRESIDENT 12111.0 10.0
7844.0 TURNER SALESMAN 5500.0 30.0
7876.0 ADAMS CLERK 5100.0 20.0
7900.0 JAMES CLERK 4950.0 30.0
7902.0 FORD ANALYST 7000.0 20.0
7934.0 MILLER CLERK 5300.0 10.0
Time taken: 0.213 seconds, Fetched: 16 row(s)
4、使用Sqoop导入oracle数据到Hive中,并使用where条件
[root@localhost bin]# ./sqoop import --hive-import --connect jdbc:oracle:thin:@192.168.0.107:1521:orcl --username scott --password password --table emp -m 1 --columns 'EMPNO,ENAME,EMPJOB,SAL,DEPTNO' --hive-table emp2 --where 'DEPTNO=10'
Warning: /usr/lib/hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: $HADOOP_HOME is deprecated.

17/08/27 21:27:06 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/08/27 21:27:06 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
17/08/27 21:27:06 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
17/08/27 21:27:06 WARN tool.BaseSqoopTool: It seems that you've specified at least one of following:
17/08/27 21:27:06 WARN tool.BaseSqoopTool: --hive-home
17/08/27 21:27:06 WARN tool.BaseSqoopTool: --hive-overwrite
17/08/27 21:27:06 WARN tool.BaseSqoopTool: --create-hive-table
17/08/27 21:27:06 WARN tool.BaseSqoopTool: --hive-table
17/08/27 21:27:06 WARN tool.BaseSqoopTool: --hive-partition-key
17/08/27 21:27:06 WARN tool.BaseSqoopTool: --hive-partition-value
17/08/27 21:27:06 WARN tool.BaseSqoopTool: --map-column-hive
17/08/27 21:27:06 WARN tool.BaseSqoopTool: Without specifying parameter --hive-import. Please note that
17/08/27 21:27:06 WARN tool.BaseSqoopTool: those arguments will not be used in this session. Either
17/08/27 21:27:06 WARN tool.BaseSqoopTool: specify --hive-import to apply them correctly or remove them
17/08/27 21:27:06 WARN tool.BaseSqoopTool: from command line to remove this warning.
17/08/27 21:27:06 INFO tool.BaseSqoopTool: Please note that --hive-home, --hive-partition-key,
17/08/27 21:27:06 INFO tool.BaseSqoopTool: hive-partition-value and --map-column-hive options are
17/08/27 21:27:06 INFO tool.BaseSqoopTool: are also valid for HCatalog imports and exports
17/08/27 21:27:07 INFO manager.SqlManager: Using default fetchSize of 1000
17/08/27 21:27:07 INFO tool.CodeGenTool: Beginning code generation
17/08/27 21:27:08 INFO manager.OracleManager: Time zone has been set to GMT
17/08/27 21:27:08 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM emp t WHERE 1=0
17/08/27 21:27:08 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/hadoop-1.2.1
Note: /tmp/sqoop-root/compile/33b1d1e75d2dbb6f680f0f8949aa6ed5/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/08/27 21:27:11 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/33b1d1e75d2dbb6f680f0f8949aa6ed5/emp.jar
17/08/27 21:27:11 INFO manager.OracleManager: Time zone has been set to GMT
17/08/27 21:27:11 INFO mapreduce.ImportJobBase: Beginning import of emp
17/08/27 21:27:17 INFO mapred.JobClient: Running job: job_201708272044_0007
17/08/27 21:27:18 INFO mapred.JobClient: map 0% reduce 0%
17/08/27 21:27:45 INFO mapred.JobClient: map 100% reduce 0%
17/08/27 21:27:49 INFO mapred.JobClient: Job complete: job_201708272044_0007
17/08/27 21:27:49 INFO mapred.JobClient: Counters: 18
17/08/27 21:27:49 INFO mapred.JobClient: Job Counters
17/08/27 21:27:49 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=29043
17/08/27 21:27:49 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
17/08/27 21:27:49 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
17/08/27 21:27:49 INFO mapred.JobClient: Launched map tasks=1
17/08/27 21:27:49 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
17/08/27 21:27:49 INFO mapred.JobClient: File Output Format Counters
17/08/27 21:27:49 INFO mapred.JobClient: Bytes Written=128
17/08/27 21:27:49 INFO mapred.JobClient: FileSystemCounters
17/08/27 21:27:49 INFO mapred.JobClient: HDFS_BYTES_READ=87
17/08/27 21:27:49 INFO mapred.JobClient: FILE_BYTES_WRITTEN=61257
17/08/27 21:27:49 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=128
17/08/27 21:27:49 INFO mapred.JobClient: File Input Format Counters
17/08/27 21:27:49 INFO mapred.JobClient: Bytes Read=0
17/08/27 21:27:49 INFO mapred.JobClient: Map-Reduce Framework
17/08/27 21:27:49 INFO mapred.JobClient: Map input records=5
17/08/27 21:27:49 INFO mapred.JobClient: Physical memory (bytes) snapshot=74575872
17/08/27 21:27:49 INFO mapred.JobClient: Spilled Records=0
17/08/27 21:27:49 INFO mapred.JobClient: CPU time spent (ms)=1850
17/08/27 21:27:49 INFO mapred.JobClient: Total committed heap usage (bytes)=15794176
17/08/27 21:27:49 INFO mapred.JobClient: Virtual memory (bytes) snapshot=855609344
17/08/27 21:27:49 INFO mapred.JobClient: Map output records=5
17/08/27 21:27:49 INFO mapred.JobClient: SPLIT_RAW_BYTES=87
17/08/27 21:27:49 INFO mapreduce.ImportJobBase: Transferred 128 bytes in 37.0889 seconds (3.4512 bytes/sec)
17/08/27 21:27:49 INFO mapreduce.ImportJobBase: Retrieved 5 records.
17/08/27 21:27:49 INFO manager.OracleManager: Time zone has been set to GMT
17/08/27 21:27:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM emp t WHERE 1=0
17/08/27 21:27:49 WARN hive.TableDefWriter: Column EMPNO had to be cast to a less precise type in Hive
17/08/27 21:27:49 WARN hive.TableDefWriter: Column SAL had to be cast to a less precise type in Hive
17/08/27 21:27:49 WARN hive.TableDefWriter: Column DEPTNO had to be cast to a less precise type in Hive
17/08/27 21:27:49 INFO hive.HiveImport: Removing temporary files from import process: hdfs://localhost:9000/user/root/emp/_logs
17/08/27 21:27:49 INFO hive.HiveImport: Loading uploaded data into Hive
17/08/27 21:27:55 INFO hive.HiveImport:
17/08/27 21:27:55 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/root/apache-hive-0.13.0-bin/lib/hive-common-0.13.0.jar!/hive-log4j.properties
17/08/27 21:28:16 INFO hive.HiveImport: OK
17/08/27 21:28:16 INFO hive.HiveImport: Time taken: 4.348 seconds
17/08/27 21:28:17 INFO hive.HiveImport: Loading data to table default.emp2
17/08/27 21:28:17 INFO hive.HiveImport: Table default.emp2 stats: [numFiles=2, numRows=0, totalSize=128, rawDataSize=0]
17/08/27 21:28:17 INFO hive.HiveImport: OK
17/08/27 21:28:17 INFO hive.HiveImport: Time taken: 1.162 seconds
17/08/27 21:28:18 INFO hive.HiveImport: Hive import complete.
17/08/27 21:28:18 INFO hive.HiveImport: Export directory is empty, removing it.
检验:
hive> show tables;
OK
emp
emp1
emp2
test1
Time taken: 0.136 seconds, Fetched: 4 row(s)
hive> select * from emp2;
OK
1001.0 Tome null 5000.0 10.0
1002.0 Tome null 5000.0 10.0
7782.0 CLARK MANAGER 8450.0 10.0
7839.0 KING PRESIDENT 12111.0 10.0
7934.0 MILLER CLERK 5300.0 10.0
Time taken: 0.133 seconds, Fetched: 5 row(s)
5、使用Sqoop导入oracle数据到Hive中,并使用查询语句
[root@localhost bin]# ./sqoop import --hive-import --connect jdbc:oracle:thin:@192.168.0.107:1521:orcl --username scott --password password -m 1 --query 'select * from EMP where SAL<2000 and $CONDITIONS' --target-dir '/sqoop/emp5' --hive-table emp5
Warning: /usr/lib/hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: $HADOOP_HOME is deprecated.

17/08/27 21:39:36 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/08/27 21:39:36 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
17/08/27 21:39:36 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
17/08/27 21:39:36 WARN tool.BaseSqoopTool: It seems that you've specified at least one of following:
17/08/27 21:39:36 WARN tool.BaseSqoopTool: --hive-home
17/08/27 21:39:36 WARN tool.BaseSqoopTool: --hive-overwrite
17/08/27 21:39:36 WARN tool.BaseSqoopTool: --create-hive-table
17/08/27 21:39:36 WARN tool.BaseSqoopTool: --hive-table
17/08/27 21:39:36 WARN tool.BaseSqoopTool: --hive-partition-key
17/08/27 21:39:36 WARN tool.BaseSqoopTool: --hive-partition-value
17/08/27 21:39:36 WARN tool.BaseSqoopTool: --map-column-hive
17/08/27 21:39:36 WARN tool.BaseSqoopTool: Without specifying parameter --hive-import. Please note that
17/08/27 21:39:36 WARN tool.BaseSqoopTool: those arguments will not be used in this session. Either
17/08/27 21:39:36 WARN tool.BaseSqoopTool: specify --hive-import to apply them correctly or remove them
17/08/27 21:39:36 WARN tool.BaseSqoopTool: from command line to remove this warning.
17/08/27 21:39:36 INFO tool.BaseSqoopTool: Please note that --hive-home, --hive-partition-key,
17/08/27 21:39:36 INFO tool.BaseSqoopTool: hive-partition-value and --map-column-hive options are
17/08/27 21:39:36 INFO tool.BaseSqoopTool: are also valid for HCatalog imports and exports
17/08/27 21:39:37 INFO manager.SqlManager: Using default fetchSize of 1000
17/08/27 21:39:37 INFO tool.CodeGenTool: Beginning code generation
17/08/27 21:39:38 INFO manager.OracleManager: Time zone has been set to GMT
17/08/27 21:39:38 INFO manager.SqlManager: Executing SQL statement: select * from EMP where SAL<2000 and (1 = 0)
17/08/27 21:39:38 INFO manager.SqlManager: Executing SQL statement: select * from EMP where SAL<2000 and (1 = 0)
17/08/27 21:39:38 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/hadoop-1.2.1
Note: /tmp/sqoop-root/compile/290db16b83f4d3d7cb898ebba4119f32/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/08/27 21:39:41 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/290db16b83f4d3d7cb898ebba4119f32/QueryResult.jar
17/08/27 21:39:42 INFO mapreduce.ImportJobBase: Beginning query import.
17/08/27 21:39:46 INFO mapred.JobClient: Running job: job_201708272044_0008
17/08/27 21:39:47 INFO mapred.JobClient: map 0% reduce 0%
17/08/27 21:40:06 INFO mapred.JobClient: map 100% reduce 0%
17/08/27 21:40:08 INFO mapred.JobClient: Job complete: job_201708272044_0008
17/08/27 21:40:08 INFO mapred.JobClient: Counters: 17
17/08/27 21:40:08 INFO mapred.JobClient: Job Counters
17/08/27 21:40:08 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=19060
17/08/27 21:40:08 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
17/08/27 21:40:08 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
17/08/27 21:40:08 INFO mapred.JobClient: Launched map tasks=1
17/08/27 21:40:08 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
17/08/27 21:40:08 INFO mapred.JobClient: File Output Format Counters
17/08/27 21:40:08 INFO mapred.JobClient: Bytes Written=0
17/08/27 21:40:08 INFO mapred.JobClient: FileSystemCounters
17/08/27 21:40:08 INFO mapred.JobClient: HDFS_BYTES_READ=87
17/08/27 21:40:08 INFO mapred.JobClient: FILE_BYTES_WRITTEN=60927
17/08/27 21:40:08 INFO mapred.JobClient: File Input Format Counters
17/08/27 21:40:08 INFO mapred.JobClient: Bytes Read=0
17/08/27 21:40:08 INFO mapred.JobClient: Map-Reduce Framework
17/08/27 21:40:08 INFO mapred.JobClient: Map input records=0
17/08/27 21:40:08 INFO mapred.JobClient: Physical memory (bytes) snapshot=66326528
17/08/27 21:40:08 INFO mapred.JobClient: Spilled Records=0
17/08/27 21:40:08 INFO mapred.JobClient: CPU time spent (ms)=1100
17/08/27 21:40:08 INFO mapred.JobClient: Total committed heap usage (bytes)=15794176
17/08/27 21:40:08 INFO mapred.JobClient: Virtual memory (bytes) snapshot=855474176
17/08/27 21:40:08 INFO mapred.JobClient: Map output records=0
17/08/27 21:40:08 INFO mapred.JobClient: SPLIT_RAW_BYTES=87
17/08/27 21:40:08 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 26.3103 seconds (0 bytes/sec)
17/08/27 21:40:08 INFO mapreduce.ImportJobBase: Retrieved 0 records.
17/08/27 21:40:09 INFO manager.OracleManager: Time zone has been set to GMT
17/08/27 21:40:09 INFO manager.SqlManager: Executing SQL statement: select * from EMP where SAL<2000 and (1 = 0)
17/08/27 21:40:09 INFO manager.SqlManager: Executing SQL statement: select * from EMP where SAL<2000 and (1 = 0)
17/08/27 21:40:09 WARN hive.TableDefWriter: Column EMPNO had to be cast to a less precise type in Hive
17/08/27 21:40:09 WARN hive.TableDefWriter: Column MGR had to be cast to a less precise type in Hive
17/08/27 21:40:09 WARN hive.TableDefWriter: Column HIREDATE had to be cast to a less precise type in Hive
17/08/27 21:40:09 WARN hive.TableDefWriter: Column SAL had to be cast to a less precise type in Hive
17/08/27 21:40:09 WARN hive.TableDefWriter: Column COMM had to be cast to a less precise type in Hive
17/08/27 21:40:09 WARN hive.TableDefWriter: Column DEPTNO had to be cast to a less precise type in Hive
17/08/27 21:40:09 INFO hive.HiveImport: Removing temporary files from import process: hdfs://localhost:9000/sqoop/emp5/_logs
17/08/27 21:40:09 INFO hive.HiveImport: Loading uploaded data into Hive
17/08/27 21:40:14 INFO hive.HiveImport:
17/08/27 21:40:14 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/root/apache-hive-0.13.0-bin/lib/hive-common-0.13.0.jar!/hive-log4j.properties
17/08/27 21:40:30 INFO hive.HiveImport: OK
17/08/27 21:40:30 INFO hive.HiveImport: Time taken: 3.13 seconds
17/08/27 21:40:31 INFO hive.HiveImport: Loading data to table default.emp5
17/08/27 21:40:32 INFO hive.HiveImport: Table default.emp5 stats: [numFiles=2, numRows=0, totalSize=0, rawDataSize=0]
17/08/27 21:40:32 INFO hive.HiveImport: OK
17/08/27 21:40:32 INFO hive.HiveImport: Time taken: 1.326 seconds
17/08/27 21:40:32 INFO hive.HiveImport: Hive import complete.
17/08/27 21:40:32 INFO hive.HiveImport: Export directory is empty, removing it.
检验:
hive> show tables;
OK
emp
emp1
emp2
emp5
test1
Time taken: 0.058 seconds, Fetched: 5 row(s)
hive> select * from emp5;
OK
6、使用Sqoop将Hive中的数据导出到oracle中
[root@localhost bin]# ./sqoop export --connect jdbc:oracle:thin:@192.168.0.107:1521:orcl --username scott --password password -m 1 --table MYEMP --export-dir XXXX

七 使用sqoop导入关系型数据库中的数据到hive中

八 业务系统的大数据分析模型



原创粉丝点击