Migrating Data from Oracle to Hive with Sqoop1


This post describes how to use Sqoop1, as shipped with the CDH distribution of Hadoop, to migrate data from Oracle to Hive. The example is fairly simple and is provided for reference only.

1 List all Sqoop-related commands under the CDH bin directory

[root@n12 bin]# pwd
/opt/cloudera/parcels/CDH/bin
[root@n12 bin]# ll sqoop*
-rwxr-xr-x 1 root root  918 Jul 23  2016 sqoop
-rwxr-xr-x 1 root root 1486 Jul 23  2016 sqoop2
-rwxr-xr-x 1 root root 1795 Jul 23  2016 sqoop2-server
-rwxr-xr-x 1 root root 1842 Jul 23  2016 sqoop2-tool
-rwxr-xr-x 1 root root  926 Jul 23  2016 sqoop-codegen
-rwxr-xr-x 1 root root  936 Jul 23  2016 sqoop-create-hive-table
-rwxr-xr-x 1 root root  923 Jul 23  2016 sqoop-eval
-rwxr-xr-x 1 root root  925 Jul 23  2016 sqoop-export
-rwxr-xr-x 1 root root  923 Jul 23  2016 sqoop-help
-rwxr-xr-x 1 root root  925 Jul 23  2016 sqoop-import
-rwxr-xr-x 1 root root  936 Jul 23  2016 sqoop-import-all-tables
-rwxr-xr-x 1 root root  922 Jul 23  2016 sqoop-job
-rwxr-xr-x 1 root root  933 Jul 23  2016 sqoop-list-databases
-rwxr-xr-x 1 root root  930 Jul 23  2016 sqoop-list-tables
-rwxr-xr-x 1 root root  924 Jul 23  2016 sqoop-merge
-rwxr-xr-x 1 root root  928 Jul 23  2016 sqoop-metastore
-rwxr-xr-x 1 root root  926 Jul 23  2016 sqoop-version
[root@n12 bin]# sqoop help
Warning: /opt/cloudera/parcels/CDH-5.7.2-1.cdh5.7.2.p0.18/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/02/24 14:29:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.2
usage: sqoop COMMAND [ARGS]
Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information
See 'sqoop help COMMAND' for information on a specific command.

2 As the output above shows, Sqoop can be invoked as "sqoop tool-name [tool-arguments]", or through the corresponding alias: for example, "sqoop import ..." can also be run directly as the "sqoop-import" command.
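For example, the following two invocations are equivalent (a minimal sketch; the host, service name, user and table here are placeholders, not values from the environment above):

# Long form: sqoop TOOL [ARGS]
sqoop import --connect jdbc:oracle:thin:@db-host:1521/orcl --username scott -P --table EMP
# Alias form: sqoop-TOOL [ARGS]
sqoop-import --connect jdbc:oracle:thin:@db-host:1521/orcl --username scott -P --table EMP

The -P flag prompts for the password interactively instead of passing it on the command line.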

3 Sqoop supports importing and exporting data between Hadoop and a variety of RDBMSs, typically Oracle, MySQL and PostgreSQL; see the official documentation for details: http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_supported_databases
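What changes from one database to another is mainly the JDBC driver jar and the connection string passed to --connect. A rough sketch (host names, ports and database names below are placeholders):

# Oracle, thin driver with a service name
--connect jdbc:oracle:thin:@oracle-host:1521/service_name
# MySQL
--connect jdbc:mysql://mysql-host:3306/dbname
# PostgreSQL
--connect jdbc:postgresql://pg-host:5432/dbname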


4 Here we take extracting data from Oracle into Hive as an example. First, prepare the Oracle JDBC driver ojdbc6.jar (download it yourself; not covered here), then place the downloaded ojdbc6.jar under $SQOOP_HOME/lib, as shown below:

/opt/cloudera/parcels/CDH/lib/sqoop/lib
[root@n12 lib]# ll ojdbc6.jar
-rwxrwxrwx 1 root root 2739670 Feb 24 14:50 ojdbc6.jar
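Copying the driver into place is a single command; a sketch, assuming ojdbc6.jar sits in the current directory and CDH is installed as a parcel under the default path:

# Put the Oracle JDBC driver on Sqoop's classpath
cp ojdbc6.jar /opt/cloudera/parcels/CDH/lib/sqoop/lib/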

5 Run "sqoop list-tables" to list the tables in Oracle:

[root@n12 bin]# sqoop-list-tables --driver oracle.jdbc.OracleDriver --connect jdbc:oracle:thin:@10.10.11.16:1521/esgyn --username system --password 12345
Warning: /opt/cloudera/parcels/CDH-5.7.2-1.cdh5.7.2.p0.18/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/02/24 14:58:53 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.2
17/02/24 14:58:53 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/02/24 14:58:53 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
17/02/24 14:58:53 INFO manager.SqlManager: Using default fetchSize of 1000
SYS_IOT_OVER_70800
WWV_COLUMN_EXCEPTIONS
WWV_FLOWS
WWV_FLOWS_RESERVED
WWV_FLOW_ACTIVITY_LOG1$
WWV_FLOW_ACTIVITY_LOG2$
WWV_FLOW_ACTIVITY_LOG_NUMBER$
WWV_FLOW_ALTERNATE_CONFIG
WWV_FLOW_ALT_CONFIG_DETAIL
WWV_FLOW_ALT_CONFIG_PICK
WWV_FLOW_APPLICATION_GROUPS
WWV_FLOW_APP_BUILD_PREF
WWV_FLOW_APP_COMMENTS
WWV_FLOW_BANNER
WWV_FLOW_BUILDER_AUDIT_TRAIL
WWV_FLOW_BUTTON_TEMPLATES
WWV_FLOW_CALS
...
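Before running a full import, "sqoop eval" can also be used to fire an ad-hoc query at Oracle and confirm that the connection and credentials work. A sketch reusing the connection above (the query itself is only an example):

sqoop-eval --connect jdbc:oracle:thin:@10.10.11.16:1521/esgyn \
  --username system -P \
  --query "SELECT COUNT(*) FROM TEST_GROUPBY"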

6 Run "sqoop import" to extract one Oracle table into Hive, as shown below:

[root@n12 bin]# sqoop-import --connect jdbc:oracle:thin:@10.10.11.16:1521/esgyn --username system --password 12345 --table TEST_GROUPBY --split-by A --hive-import --create-hive-table --hive-table test_groupby --hive-overwrite --null-string '\\N' --null-non-string '\\N' --hive-drop-import-delims --verbose
Warning: /opt/cloudera/parcels/CDH-5.7.2-1.cdh5.7.2.p0.18/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/02/24 15:08:46 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.2
17/02/24 15:08:46 DEBUG tool.BaseSqoopTool: Enabled debug logging.
...
Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.7.2-1.cdh5.7.2.p0.18/jars/hive-common-1.1.0-cdh5.7.2.jar!/hive-log4j.properties
OK
Time taken: 2.409 seconds
Loading data to table default.test_groupby
chgrp: changing ownership of 'hdfs://n11.trafodion.local:8020/user/hive/warehouse/test_groupby/part-m-00000': User does not belong to hive
chgrp: changing ownership of 'hdfs://n11.trafodion.local:8020/user/hive/warehouse/test_groupby/part-m-00001': User does not belong to hive
chgrp: changing ownership of 'hdfs://n11.trafodion.local:8020/user/hive/warehouse/test_groupby/part-m-00002': User does not belong to hive
chgrp: changing ownership of 'hdfs://n11.trafodion.local:8020/user/hive/warehouse/test_groupby/part-m-00003': User does not belong to hive
Table default.test_groupby stats: [numFiles=4, numRows=0, totalSize=60, rawDataSize=0]
OK
Time taken: 0.886 seconds
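The same tool can also land the data as plain files on HDFS instead of a Hive table, and the degree of parallelism can be tuned. A sketch of a common variation (the mapper count and target directory are illustrative, not taken from the run above):

# Plain HDFS import with 2 mappers, no --hive-import
sqoop-import --connect jdbc:oracle:thin:@10.10.11.16:1521/esgyn \
  --username system -P \
  --table TEST_GROUPBY --split-by A \
  --num-mappers 2 \
  --target-dir /user/example/test_groupby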

7 Check in the Hive CLI that the import succeeded:

hive> show tables;
test_groupby
Time taken: 1.726 seconds, Fetched: 11 row(s)
hive> select * from test_groupby;
OK
1.0     A               B               NULL    NULL
2.0     A               C               NULL    NULL
Time taken: 0.755 seconds, Fetched: 2 row(s)
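Besides selecting the rows, a quick count and a look at the generated table definition are handy sanity checks against the source table; for example:

hive> select count(*) from test_groupby;
hive> describe test_groupby;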

8 Besides running a Sqoop command directly with "sqoop-import" and the like, you can also use "sqoop-job" to save a Sqoop command as a job, so it can be invoked whenever needed and executed repeatedly. Use --create to create a job, --list to list jobs, --delete to delete a job, and --exec to execute a job (see the sketch after the output below).

[root@n12 bin]# sqoop-job --create job1 -- import --connect jdbc:oracle:thin:@10.10.11.16:1521/esgyn --username system --password 12345 --table TEST_GROUPBY --split-by A --hive-import --create-hive-table --hive-table test_groupby --hive-overwrite --null-string '\\N' --null-non-string '\\N' --hive-drop-import-delims --verbose
Warning: /opt/cloudera/parcels/CDH-5.7.2-1.cdh5.7.2.p0.18/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/02/24 17:00:43 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.2
17/02/24 17:00:44 DEBUG tool.BaseSqoopTool: Enabled debug logging.
...
[root@n12 bin]# sqoop-job --list
Warning: /opt/cloudera/parcels/CDH-5.7.2-1.cdh5.7.2.p0.18/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/02/24 17:02:42 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.2
Available jobs:
  job1
[root@n12 bin]# sqoop-job --delete job1
Warning: /opt/cloudera/parcels/CDH-5.7.2-1.cdh5.7.2.p0.18/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/02/24 17:03:12 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.2
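A saved job can then be executed or inspected at any time; a sketch (these two commands were not part of the session above):

# Re-run the import that the job wraps
sqoop-job --exec job1
# Show the job's saved definition and parameters
sqoop-job --show job1

Note that when a job is executed, Sqoop may prompt for the database password again unless the metastore is configured to store it (sqoop.metastore.client.record.password).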