Importing MySQL data into HDFS with Sqoop 1.99.6


Environment: sqoop-1.99.6, hadoop-2.6.0

1. Install sqoop-1.99.6

1) Download sqoop-1.99.6 from http://archive.apache.org/dist/sqoop/

2) Extract the archive: tar -zxvf sqoop*.tar.gz

3) Configure environment variables: edit /etc/profile (vi /etc/profile) and add

export SQOOP_HOME=<your Sqoop installation path>

export CATALINA_BASE=$SQOOP_HOME/server

export LOGDIR=$SQOOP_HOME/logs

Save the file and exit, then run source /etc/profile to make the settings take effect.
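For example, with Sqoop unpacked to /usr/soft/sqoop-1.99.6 (the path used later in this article), the entries would be:

export SQOOP_HOME=/usr/soft/sqoop-1.99.6
export CATALINA_BASE=$SQOOP_HOME/server
export LOGDIR=$SQOOP_HOME/logs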

4) Configure ${SQOOP_HOME}/server/conf/catalina.properties: find the common.loader line and replace it with the Hadoop (and Hive) jar paths of your installation. The original value is kept commented out above the new one:

#common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/usr/lib/hadoop/*.jar,/usr/lib/hadoop/lib/*.jar,/usr/lib/hadoop-hdfs/*.jar,/usr/lib/hadoop-hdfs/lib/*.jar,/usr/lib/hadoop-mapreduce/*.jar,/usr/lib/hadoop-mapreduce/lib/*.jar,/usr/lib/hadoop-yarn/*.jar,/usr/lib/hadoop-yarn/lib/*.jar,/usr/lib/hive/lib/*.jar
#
common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/common/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/common/lib/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/hdfs/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/hdfs/lib/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/mapreduce/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/mapreduce/lib/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/tools/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/tools/lib/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/yarn/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/yarn/lib/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/httpfs/tomcat/lib/*.jar,/usr/soft/hive-1.2.1/lib/*.jar
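A quick, optional sanity check is to confirm that the jar directories referenced by common.loader actually exist on your machine, for example:

ls /usr/soft/hadoop-2.6.0/share/hadoop/common/*.jar | head -n 3
ls /usr/soft/hadoop-2.6.0/share/hadoop/hdfs/*.jar | head -n 3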

5) Configure the ${SQOOP_HOME}/server/conf/sqoop.properties file: find the following setting and point it at your Hadoop configuration directory:

# Hadoop configuration directory

org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/usr/soft/hadoop-2.6.0/etc/hadoop/
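To confirm the change was saved, you can grep for the property (optional):

grep "mapreduce.configuration.directory" ${SQOOP_HOME}/server/conf/sqoop.properties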

6) Put the MySQL JDBC driver mysql-connector-java-5.1.16-bin.jar into ${SQOOP_HOME}/server/lib/.
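For example, assuming the driver jar was downloaded to /usr/soft (adjust the source path to wherever you saved it):

cp /usr/soft/mysql-connector-java-5.1.16-bin.jar ${SQOOP_HOME}/server/lib/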

7) Test that the installation works: start Hadoop first, then start Sqoop, as sketched below.
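A minimal sketch of this step, assuming the standard Hadoop 2.6.0 start scripts; the sqoop2-tool verification step is optional and assumes the tool bundled with 1.99.x is present:

sbin/start-dfs.sh        # run from ${HADOOP_HOME}
sbin/start-yarn.sh       # run from ${HADOOP_HOME}
bin/sqoop2-tool verify   # run from ${SQOOP_HOME}; checks the server configuration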

2. From the Sqoop installation directory, run bin/sqoop.sh server start to start the server, then bin/sqoop.sh client to start the client and create the required links.

[root@Master sqoop-1.99.6]# bin/sqoop.sh client
Sqoop home directory: /usr/soft/sqoop-1.99.6
Sqoop Shell: Type 'help' or '\h' for help.
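By default the client connects to a Sqoop server on localhost. If the server runs on a different host, point the client at it first (a sketch, assuming the default server port 12000 and webapp name sqoop):

sqoop:000> set server --host Master --port 12000 --webapp sqoop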
1) Run show connector to list the available connectors:

sqoop:000> show connector
0    [main] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
+----+------------------------+---------+------------------------------------------------------+----------------------+
| Id |          Name          | Version |                        Class                         | Supported Directions |
+----+------------------------+---------+------------------------------------------------------+----------------------+
| 1  | kite-connector         | 1.99.6  | org.apache.sqoop.connector.kite.KiteConnector        | FROM/TO              |
| 2  | kafka-connector        | 1.99.6  | org.apache.sqoop.connector.kafka.KafkaConnector      | TO                   |
| 3  | hdfs-connector         | 1.99.6  | org.apache.sqoop.connector.hdfs.HdfsConnector        | FROM/TO              |
| 4  | generic-jdbc-connector | 1.99.6  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector | FROM/TO              |
+----+------------------------+---------+------------------------------------------------------+----------------------+
The output shows four connectors. Since we are importing data from MySQL into HDFS, we need the connectors with Id 3 (hdfs-connector) and Id 4 (generic-jdbc-connector).
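Before creating a link, you can optionally print the full connector metadata, including the configuration inputs each connector expects, with the more verbose form of the command:

sqoop:000> show connector --all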

2) First create a link for the generic-jdbc-connector; explanations are given as // comments in the transcript below.

sqoop:000> create link -c 4
Creating link for connector with id 4
Please fill following values to create new link object
Name: jdbclink                                           // a name identifying this link
Link configuration
JDBC Driver Class: com.mysql.jdbc.Driver                 // the JDBC driver class to load
JDBC Connection String: jdbc:mysql://Master:3306/test    // Master is your hostname, 3306 is the MySQL port, test is the database name
Username: root                                           // database user
Password: ******                                         // database password
JDBC Connection Properties: 
There are currently 0 values in the map:
entry# protocol=tcp
There are currently 1 values in the map:
protocol = tcp
entry# 
New link was successfully created with validation status OK and persistent id 8
Check that the link was created:

sqoop:000> show link
+----+----------+--------------+------------------------+---------+
| Id |   Name   | Connector Id |     Connector Name     | Enabled |
+----+----------+--------------+------------------------+---------+
| 8  | jdbclink | 4            | generic-jdbc-connector | true    |
+----+----------+--------------+------------------------+---------+
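If any value was mistyped, the link can be edited in place rather than recreated (a sketch, using the link id 8 created above):

sqoop:000> update link --lid 8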

3) Next, create the HDFS link:

sqoop:000> create link -c 3
Creating link for connector with id 3
Please fill following values to create new link object
Name: hdfslink
Link configuration
HDFS URI: hdfs://Master:8020          // the value of fs.defaultFS configured in Hadoop's core-site.xml
Hadoop conf directory: 
New link was successfully created with validation status OK and persistent id 9
sqoop:000> show link
+----+----------+--------------+------------------------+---------+
| Id |   Name   | Connector Id |     Connector Name     | Enabled |
+----+----------+--------------+------------------------+---------+
| 8  | jdbclink | 4            | generic-jdbc-connector | true    |
| 9  | hdfslink | 3            | hdfs-connector         | true    |
+----+----------+--------------+------------------------+---------+
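The HDFS URI entered here must match the fs.defaultFS value in Hadoop's core-site.xml; a quick way to check it (using the Hadoop path from this setup):

grep -A 1 "fs.defaultFS" /usr/soft/hadoop-2.6.0/etc/hadoop/core-site.xml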

3. Create the transfer job with create job -f 8 -t 9, where -f (from) takes the id of the source link and -t (to) takes the id of the destination link.

sqoop:000> create job -f 8 -t 9
Creating job for links with from id 8 and to id 9
Please fill following values to create new job object
Name: job1                            // a name identifying this job
From database configuration
Schema name: test                     // database name
Table name: my_user                   // table name
Table SQL statement:                  // optional SQL query
Table column names: 
Partition column name: 
Null value allowed for the partition column: 
Boundary query: 
Incremental read
Check column: 
Last value: 
To HDFS configuration
Override null value: 
Null value: 
Output format: 
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
Choose: 0                             // storage format on HDFS
Compression format: 
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0                             // compression codec
Custom compression format: 
Output directory: hdfs://Master:8020/sqoop2/my_user_import   // destination path on the HDFS file system
Append mode: 
Throttling resources
Extractors: 
Loaders: 
New job was successfully created with validation status OK and persistent id 3
View the job:

sqoop:000> show job
+----+------+----------------+--------------+---------+
| Id | Name | From Connector | To Connector | Enabled |
+----+------+----------------+--------------+---------+
| 3  | job1 | 4              | 3            | true    |
+----+------+----------------+--------------+---------+
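As with links, a job can be edited after creation if any of its values need to change (a sketch, using job id 3 from above):

sqoop:000> update job --jid 3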
Start the job:

sqoop:000> start job --jid 3
Submission details
Job ID: 3
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2017-08-28 14:27:26 CST
Lastly updated by: root
External ID: job_1502944428192_0003
    http://10.226.118.24:8888/proxy/application_1502944428192_0003/
2017-08-28 14:27:26 CST: BOOTING  - Progress is not available
If anything goes wrong, enable verbose output to see the detailed error messages:

set option --name verbose --value true

Then check the job status:

sqoop:000> status job --jid 3
Submission details
Job ID: 3
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2017-08-28 14:27:26 CST
Lastly updated by: root
External ID: job_1502944428192_0003
    http://10.226.118.24:8888/proxy/application_1502944428192_0003/
2017-08-28 14:27:56 CST: SUCCEEDED
Counters:
  org.apache.hadoop.mapreduce.JobCounter
    SLOTS_MILLIS_MAPS: 2241
    TOTAL_LAUNCHED_UBERTASKS: 6
    NUM_UBER_SUBMAPS: 6
    MB_MILLIS_MAPS: 2294784
    TOTAL_LAUNCHED_MAPS: 6
    MILLIS_MAPS: 2241
    VCORES_MILLIS_MAPS: 2241
    OTHER_LOCAL_MAPS: 6
  org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
    BYTES_READ: 0
  org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
    BYTES_WRITTEN: 0
  org.apache.hadoop.mapreduce.TaskCounter
    MAP_INPUT_RECORDS: 0
    MERGED_MAP_OUTPUTS: 0
    PHYSICAL_MEMORY_BYTES: 868519936
    SPILLED_RECORDS: 0
    FAILED_SHUFFLE: 0
    CPU_MILLISECONDS: 1640
    COMMITTED_HEAP_BYTES: 159473664
    VIRTUAL_MEMORY_BYTES: 10597224448
    MAP_OUTPUT_RECORDS: 7
    SPLIT_RAW_BYTES: 655
    GC_TIME_MILLIS: 726
  org.apache.hadoop.mapreduce.FileSystemCounter
    FILE_READ_OPS: 0
    FILE_WRITE_OPS: 0
    FILE_BYTES_READ: 0
    FILE_LARGE_READ_OPS: 0
    HDFS_BYTES_READ: 2500
    FILE_BYTES_WRITTEN: 0
    HDFS_LARGE_READ_OPS: 0
    HDFS_BYTES_WRITTEN: 1629714
    HDFS_READ_OPS: 267
    HDFS_WRITE_OPS: 33
  org.apache.sqoop.submission.counter.SqoopCounters
    ROWS_WRITTEN: 7
    ROWS_READ: 7
Job executed successfully
The SUCCEEDED state and the SqoopCounters (ROWS_READ: 7, ROWS_WRITTEN: 7) show that all rows were transferred and the job completed successfully.

4. Verify on HDFS that the data was imported

Open Master:50070 in a browser (the HDFS NameNode web UI) and browse the file system to the /sqoop2/ directory.

Then open the my_user_import directory; the output files written by the job should be listed there, which confirms that the data was imported successfully.

To view the imported data itself, run the following command from ${HADOOP_HOME}; it prints the contents of one of the output files.

bin/hdfs dfs -text /sqoop2/my_user_import/01dde47f-e009-4cf1-9170-e0a3f76d0466.txt
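The output file names are generated by the job, so it is easiest to list the directory first and then view the files it contains:

bin/hdfs dfs -ls /sqoop2/my_user_import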

