Sqoop 1.99.7 Client Operations in Practice


Starting with 1.99.7, some of the client's command keywords and options have changed. Much of what is online still covers earlier versions, so commands or options copied from those guides may fail. Here I record my local hands-on steps and caveats for future reference.

In the transcripts below, commands are shown as typed at the prompt; text after // is a note or explanation, and points needing special attention are called out in those notes.


1. Start the client: bin/sqoop.sh client

hadoop@master:~/bigdata/sqoop$ bin/sqoop.sh client
Setting conf dir: bin/../conf
Sqoop home directory: /home/hadoop/bigdata/sqoop
Sqoop Shell: Type 'help' or '\h' for help.

sqoop:000>
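Aside from the interactive prompt, the Sqoop 2 shell can also execute a script of commands, which is handy once the steps below are settled. A sketch, assuming a hypothetical script /home/hadoop/setup.sqoop (the sqoop2-shell launcher from the docs accepts a script path):

hadoop@master:~/bigdata/sqoop$ cat /home/hadoop/setup.sqoop
set server --host master --port 12000 --webapp sqoop
show version --all
hadoop@master:~/bigdata/sqoop$ bin/sqoop2-shell /home/hadoop/setup.sqoop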

sqoop:000> show version
client version:
  Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
  Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
sqoop:000>

sqoop:000> set server --host master --port 12000 --webapp sqoop            // host is the machine running the Sqoop server (here the Hadoop NameNode host, master)
Server is set successfully
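To avoid retyping this every session: per the Sqoop 2 shell documentation, the client executes ~/.sqoop2rc at startup if it exists, so the server setting can live there. A minimal sketch:

hadoop@master:~$ cat ~/.sqoop2rc            // commands here run automatically when the shell starts
set server --host master --port 12000 --webapp sqoop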

sqoop:000> show connector
0    [main] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable   // as long as `hadoop checknative -a` reports most libraries as true, this warning appears harmless for now. It usually means the native .so is missing or has the wrong format (e.g. 32-bit vs. 64-bit); refer to http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html
+------------------------+---------+------------------------------------------------------------+----------------------+
|          Name          | Version |                           Class                            | Supported Directions |
+------------------------+---------+------------------------------------------------------------+----------------------+
| generic-jdbc-connector | 1.99.7  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector       | FROM/TO              |
| kite-connector         | 1.99.7  | org.apache.sqoop.connector.kite.KiteConnector              | FROM/TO              |
| oracle-jdbc-connector  | 1.99.7  | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO              |
| ftp-connector          | 1.99.7  | org.apache.sqoop.connector.ftp.FtpConnector                | TO                   |
| hdfs-connector         | 1.99.7  | org.apache.sqoop.connector.hdfs.HdfsConnector              | FROM/TO              |
| kafka-connector        | 1.99.7  | org.apache.sqoop.connector.kafka.KafkaConnector            | TO                   |
| sftp-connector         | 1.99.7  | org.apache.sqoop.connector.sftp.SftpConnector              | TO                   |
+------------------------+---------+------------------------------------------------------------+----------------------+

For licensing (intellectual property) reasons, Sqoop ships with only the seven connectors above by default. The last column, "Supported Directions", indicates whether a connector can act as a data source (FROM), a destination (TO), or both.
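This is also why no MySQL driver is bundled: before the JDBC link below can connect, the Connector/J jar has to be made visible to the Sqoop server. One way in 1.99.7 is the SQOOP_SERVER_EXTRA_LIB variable from the installation docs (paths and driver version here are examples):

hadoop@master:~/bigdata/sqoop$ mkdir -p extra-lib
hadoop@master:~/bigdata/sqoop$ cp ~/mysql-connector-java-5.1.42-bin.jar extra-lib/
hadoop@master:~/bigdata/sqoop$ export SQOOP_SERVER_EXTRA_LIB=/home/hadoop/bigdata/sqoop/extra-lib
hadoop@master:~/bigdata/sqoop$ bin/sqoop.sh server stop && bin/sqoop.sh server start      // restart so the server picks up the jar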


2. Create links
sqoop:000> create link -c link1_generic_jdbc        // the --cid option from earlier versions is no longer supported
Exception has occurred during processing command
Exception: org.apache.sqoop.common.SqoopException Message: CLIENT_0003:Connector do not exists - link1_generic_jdbc
sqoop:000> create link -c generic-jdbc-connector       // the argument to -c must be one of the connector names listed above
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: link1_generic_jdbc            // any name that identifies this link

Database connection

Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://master/mydb
Username: root
Password: ******
Fetch Size:
Connection Properties:
There are currently 0 values in the map:
entry#

SQL Dialect

Identifier enclose:      // Do NOT just press Enter here; type a space instead, otherwise start job later fails with a "GENERIC_JDBC_CONNECTOR_0016:Can't fetch schema -" error. Explanation: "Identifier enclose" sets the delimiter wrapped around SQL identifiers, as in: select * from "table_name". That delimiter is an error in MySQL, and since this property defaults to a double quote, pressing Enter keeps the bad default; it must be overridden with something harmless, such as a space.
New link was successfully created with validation status WARNING and name link1_generic_jdbc
sqoop:000>
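The MySQL half of that explanation is easy to verify: with the default sql_mode, MySQL treats double quotes as string delimiters rather than identifier quotes, so the double-quoted form the connector would generate fails:

mysql> SELECT COUNT(*) FROM "user";       // syntax error under the default sql_mode
mysql> SELECT COUNT(*) FROM `user`;       // works: backtick is MySQL's identifier quote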

sqoop:000> show link
+--------------------+------------------------+---------+
|        Name        |     Connector Name     | Enabled |
+--------------------+------------------------+---------+
| link1_generic_jdbc | generic-jdbc-connector | true    |
+--------------------+------------------------+---------+

sqoop:000> create link -c hdfs-connector
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: link2_hdfs

HDFS cluster

URI: hdfs://master:9000     // must match the fs.defaultFS configured for HDFS
Conf directory:
Additional configs::
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and name link2_hdfs
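If in doubt about what to enter for the URI, it can be read straight from the Hadoop configuration:

hadoop@master:~$ hdfs getconf -confKey fs.defaultFS
hdfs://master:9000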

sqoop:000> show link
+--------------------+------------------------+---------+
|        Name        |     Connector Name     | Enabled |
+--------------------+------------------------+---------+
| link1_generic_jdbc | generic-jdbc-connector | true    |
| link2_hdfs         | hdfs-connector         | true    |
+--------------------+------------------------+---------+

3. Create a job
sqoop:000> create job -f "link1_generic_jdbc" -t "link2_hdfs"      // from link1 to link2
Creating job for links with from name link1_generic_jdbc and to name link2_hdfs
Please fill following values to create new job object
Name: jdbcToHdfs

Database source

Schema name: mysql         // must match the database name in MySQL, i.e. the DB name
Table name: user                // mysql table name
SQL statement:
Column names:
There are currently 0 values in the list:
element#
Partition column:
Partition column nullable:
Boundary query:

Incremental read

Check column:
Last value:

Target configuration

Override null value:
Null value:
File format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
  2 : PARQUET_FILE
Choose: 0
Compression codec:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0
Custom codec:              // typing an arbitrary value here can make the final step fail, and it may keep failing after edits; abort this create job command and start it over
Output directory: /sqoop       // destination directory in HDFS for the imported data
Append mode:

Throttling resources

Extractors: 1              // roughly, Extractors is the number of tasks reading from the source and Loaders the number writing to the destination; 1 each is fine for a first run
Loaders: 1

Classpath configuration

Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK  and name jdbcToHdfs
sqoop:000>


Running create job again with the same name fails:
Exception has occurred during processing command
Exception: org.apache.sqoop.common.SqoopException Message: COMMON_0023:Unable to create new job data -  // a job with this name already exists; choose a different name and create it again


sqoop:000> show job
+----+---------------+---------------------------------------------+-----------------------------+---------+
| Id |     Name      |               From Connector                |        To Connector         | Enabled |
+----+---------------+---------------------------------------------+-----------------------------+---------+
| 1  | jdbcToHdfs    | link1_generic_jdbc (generic-jdbc-connector) | link2_hdfs (hdfs-connector) | true    |
+----+---------------+---------------------------------------------+-----------------------------+---------+

4. Run the job
sqoop:000> start job -n jdbcToHdfs
Exception has occurred during processing command
Exception: org.apache.sqoop.common.SqoopException Message: GENERIC_JDBC_CONNECTOR_0001:Unable to get a connection -    // the connection failed and the log here is sparse; turn on verbose logging to see the actual cause

sqoop:000> set option --name verbose --value true            // enable verbose mode to see more of the log
Verbose option was changed to true
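There is a matching show command to confirm the current value:

sqoop:000> show option --name verbose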


Re-running start job -n jdbcToHdfs with verbose enabled reveals the root cause:

Caused by: Exception: java.lang.Throwable Message: Unknown database 'mydb'          // the database name entered when creating link1 does not exist; change it to a real database and run the job again
Stack trace:
     at  org.apache.sqoop.connector.jdbc.GenericJdbcExecutor (GenericJdbcExecutor.java:121)  
     at  org.apache.sqoop.connector.jdbc.GenericJdbcFromInitializer (GenericJdbcFromInitializer.java:58)  
     at  org.apache.sqoop.connector.jdbc.GenericJdbcFromInitializer (GenericJdbcFromInitializer.java:47)  
     at  org.apache.sqoop.driver.JobManager$1 (JobManager.java:520)  
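
The wrong database name can be corrected in place rather than deleting and recreating the link; in 1.99.7 the update link command walks through the same prompts as create link:

sqoop:000> update link -n link1_generic_jdbc      // re-enter the Connection String, pointing at a database that exists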



sqoop:000> status job -n jdbcToHdfs
Exception has occurred during processing command
Exception: org.apache.sqoop.common.SqoopException Message: MAPREDUCE_0003:Can't get RunningJob instance -
Caused by: Exception: java.io.IOException Message: java.net.ConnectException: Call From master/10.76.9.230 to 0.0.0.0:10020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
Stack trace:
     at  org.apache.sqoop.submission.mapreduce.MapreduceSubmissionEngine (MapreduceSubmissionEngine.java:484)  
     at  org.apache.sqoop.driver.JobManager (JobManager.java:714)  


Hadoop's JobHistoryServer was not running. Its address is configured in etc/hadoop/mapred-site.xml:
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
hadoop@master:~/bigdata/hadoop/sbin$ ./mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /home/hadoop/bigdata/hadoop/logs/mapred-hadoop-historyserver-master.out

hadoop@master:~/bigdata/hadoop/sbin$ jps | grep JobHistoryServe
1717 JobHistoryServer
hadoop@master:~/bigdata/hadoop/sbin$
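A quick sanity check that the history server is now listening on the configured port (assuming nc is installed):

hadoop@master:~$ nc -z master 10020 && echo "jobhistory reachable"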

hadoop@master: hdfs dfs -ls /         // check in HDFS whether the data was imported
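Once a run succeeds, the imported records land under the Output directory chosen when creating the job (/sqoop here). A quick way to inspect them:

hadoop@master:~$ hdfs dfs -ls /sqoop
hadoop@master:~$ hdfs dfs -cat '/sqoop/*' | head        // peek at the first imported rows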


After fixing the link, the job (recreated here under the name jdbc2hdfs_2) was started again; polling its status:

sqoop:000> status job -n jdbc2hdfs_2
Submission details
Job Name: jdbc2hdfs_2
Server URL: http://master:12000/sqoop/
Created by: hadoop
Creation date: 2017-06-15 10:41:25 CST
Lastly updated by: hadoop
External ID: job_1497255721792_0002
    http://master:8088/proxy/application_1497255721792_0002/
2017-06-15 10:41:38 CST: RUNNING  - 0.00 %
sqoop:000> status job -n jdbc2hdfs_2
Submission details
Job Name: jdbc2hdfs_2
Server URL: http://master:12000/sqoop/
Created by: hadoop
Creation date: 2017-06-15 10:41:25 CST
Lastly updated by: hadoop
External ID: job_1497255721792_0002
    http://master:8088/proxy/application_1497255721792_0002/
2017-06-15 10:42:07 CST: FAILED
Exception: Job Failed with status:3

Opening the job's tracking URL in a browser only shows that it failed, without a specific reason; the root cause still needs further investigation.
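One way to dig deeper is to pull the container logs for the failed run, using the application ID from the External ID above (requires YARN log aggregation to be enabled):

hadoop@master:~$ yarn logs -applicationId application_1497255721792_0002 | less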