Importing MySQL Data into HDFS with Sqoop 1.99.6
Environment: sqoop-1.99.6, hadoop-2.6.0
1. Install sqoop-1.99.6
1) Download sqoop-1.99.6 from http://archive.apache.org/dist/sqoop/
2) Unpack the archive: tar -zxvf sqoop*.tar.gz
3) Configure environment variables: run vi /etc/profile and add
export SQOOP_HOME=<your Sqoop install path>
export CATALINA_BASE=$SQOOP_HOME/server
export LOGDIR=$SQOOP_HOME/logs
Exit the editor and run source /etc/profile to apply the changes.
4) Edit ${SQOOP_HOME}/server/conf/catalina.properties: find the common.loader entry and change it as follows (the original value stays commented out):
#common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/usr/lib/hadoop/*.jar,/usr/lib/hadoop/lib/*.jar,/usr/lib/hadoop-hdfs/*.jar,/usr/lib/hadoop-hdfs/lib/*.jar,/usr/lib/hadoop-mapreduce/*.jar,/usr/lib/hadoop-mapreduce/lib/*.jar,/usr/lib/hadoop-yarn/*.jar,/usr/lib/hadoop-yarn/lib/*.jar,/usr/lib/hive/lib/*.jar
#
common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/common/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/common/lib/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/hdfs/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/hdfs/lib/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/mapreduce/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/mapreduce/lib/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/tools/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/tools/lib/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/yarn/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/yarn/lib/*.jar,/usr/soft/hadoop-2.6.0/share/hadoop/httpfs/tomcat/lib/*.jar,/usr/soft/hive-1.2.1/lib/*.jar
5) Edit ${SQOOP_HOME}/server/conf/sqoop.properties: find the following entry and set it to:
# Hadoop configuration directory
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/usr/soft/hadoop-2.6.0/etc/hadoop/
6) Copy the MySQL JDBC driver mysql-connector-java-5.1.16-bin.jar into ${SQOOP_HOME}/server/lib/.
7) Test the installation: start Hadoop first, then start Sqoop.
2. From the Sqoop install directory, run bin/sqoop.sh server start to start the server, then bin/sqoop.sh client to start the client and create the required links.
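Once the server is started, a quick way to confirm it is listening is to probe its REST endpoint before opening the client. This is an optional sketch, assuming the server runs on the default port 12000 on localhost:

```shell
# Optional smoke test (assumes the default host/port localhost:12000).
# The Sqoop 2 server exposes a REST API under the /sqoop/ context path.
STATUS_MSG="sqoop server is not reachable"
if curl -s -o /dev/null --max-time 2 "http://localhost:12000/sqoop/version"; then
  STATUS_MSG="sqoop server is reachable"
fi
echo "$STATUS_MSG"
```

If the server is not reachable, check the logs under $SQOOP_HOME/logs before continuing.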
[root@Master sqoop-1.99.6]# bin/sqoop.sh client
Sqoop home directory: /usr/soft/sqoop-1.99.6
Sqoop Shell: Type 'help' or '\h' for help.
1) Run show connector to list the available connectors:
sqoop:000> show connector
0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
+----+------------------------+---------+------------------------------------------------------+----------------------+
| Id | Name                   | Version | Class                                                | Supported Directions |
+----+------------------------+---------+------------------------------------------------------+----------------------+
| 1  | kite-connector         | 1.99.6  | org.apache.sqoop.connector.kite.KiteConnector        | FROM/TO              |
| 2  | kafka-connector        | 1.99.6  | org.apache.sqoop.connector.kafka.KafkaConnector      | TO                   |
| 3  | hdfs-connector         | 1.99.6  | org.apache.sqoop.connector.hdfs.HdfsConnector        | FROM/TO              |
| 4  | generic-jdbc-connector | 1.99.6  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector | FROM/TO              |
+----+------------------------+---------+------------------------------------------------------+----------------------+
The output lists four connectors. Since we are importing from MySQL into HDFS, we need the connectors with Id 3 and 4.
2) First, create a link of type generic-jdbc-connector (explanations are in the inline comments):
sqoop:000> create link -c 4
Creating link for connector with id 4
Please fill following values to create new link object
Name: jdbclink                                        // a name that identifies the link
Link configuration
JDBC Driver Class: com.mysql.jdbc.Driver              // the driver class the JDBC connection loads
JDBC Connection String: jdbc:mysql://Master:3306/test // Master is the MySQL host name (localhost for a local server), 3306 is MySQL's default port, test is the database name
Username: root                                        // database user
Password: ******                                      // the user's password
JDBC Connection Properties:
There are currently 0 values in the map:
entry# protocol=tcp
There are currently 1 values in the map:
protocol = tcp
entry#
New link was successfully created with validation status OK and persistent id 8
Check that the link was created:
sqoop:000> show link
+----+----------+--------------+------------------------+---------+
| Id | Name     | Connector Id | Connector Name         | Enabled |
+----+----------+--------------+------------------------+---------+
| 8  | jdbclink | 4            | generic-jdbc-connector | true    |
+----+----------+--------------+------------------------+---------+
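The connection string follows the pattern jdbc:mysql://<host>:<port>/<database>. A minimal sketch of assembling it from its parts (the variable names are illustrative, not part of Sqoop):

```shell
# Illustrative variables; substitute your own host, port and database.
DB_HOST=Master     # MySQL host name (use localhost for a local server)
DB_PORT=3306       # MySQL's default port
DB_NAME=test       # the database that holds the source table
JDBC_URL="jdbc:mysql://${DB_HOST}:${DB_PORT}/${DB_NAME}"
echo "$JDBC_URL"   # jdbc:mysql://Master:3306/test
```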
3) Next, create the HDFS link:
sqoop:000> create link -c 3
Creating link for connector with id 3
Please fill following values to create new link object
Name: hdfslink
Link configuration
HDFS URI: hdfs://Master:8020    // the value of fs.defaultFS configured in Hadoop's core-site.xml
Hadoop conf directory:
New link was successfully created with validation status OK and persistent id 9
sqoop:000> show link
+----+----------+--------------+------------------------+---------+
| Id | Name     | Connector Id | Connector Name         | Enabled |
+----+----------+--------------+------------------------+---------+
| 8  | jdbclink | 4            | generic-jdbc-connector | true    |
| 9  | hdfslink | 3            | hdfs-connector         | true    |
+----+----------+--------------+------------------------+---------+
3. Create the transfer job with create job -f 8 -t 9, where -f (from) is the id of the link to read the data from and -t (to) is the id of the link to write the data to.
sqoop:000> create job -f 8 -t 9
Creating job for links with from id 8 and to id 9
Please fill following values to create new job object
Name: job1                      // a name that identifies the job
From database configuration
Schema name: test               // database name
Table name: my_user             // table name
Table SQL statement:            // optional SQL query
Table column names:
Partition column name:
Null value allowed for the partition column:
Boundary query:
Incremental read
Check column:
Last value:
To HDFS configuration
Override null value:
Null value:
Output format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
Choose: 0                       // storage format on HDFS
Compression format:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0                       // compression codec
Custom compression format:
Output directory: hdfs://Master:8020/sqoop2/my_user_import    // target path on the HDFS file system
Append mode:
Throttling resources
Extractors:
Loaders:
New job was successfully created with validation status OK and persistent id 3
Check the job:
sqoop:000> show job
+----+------+----------------+--------------+---------+
| Id | Name | From Connector | To Connector | Enabled |
+----+------+----------------+--------------+---------+
| 3  | job1 | 4              | 3            | true    |
+----+------+----------------+--------------+---------+
Start the job:
sqoop:000> start job --jid 3
Submission details
Job ID: 3
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2017-08-28 14:27:26 CST
Lastly updated by: root
External ID: job_1502944428192_0003
        http://10.226.118.24:8888/proxy/application_1502944428192_0003/
2017-08-28 14:27:26 CST: BOOTING  - Progress is not available
If the job fails, detailed error messages can be enabled with the following setting:
set option --name verbose --value true
Then check the job status again:
sqoop:000> status job --jid 3
Submission details
Job ID: 3
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2017-08-28 14:27:26 CST
Lastly updated by: root
External ID: job_1502944428192_0003
        http://10.226.118.24:8888/proxy/application_1502944428192_0003/
2017-08-28 14:27:56 CST: SUCCEEDED
Counters:
        org.apache.hadoop.mapreduce.JobCounter
                SLOTS_MILLIS_MAPS: 2241
                TOTAL_LAUNCHED_UBERTASKS: 6
                NUM_UBER_SUBMAPS: 6
                MB_MILLIS_MAPS: 2294784
                TOTAL_LAUNCHED_MAPS: 6
                MILLIS_MAPS: 2241
                VCORES_MILLIS_MAPS: 2241
                OTHER_LOCAL_MAPS: 6
        org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
                BYTES_READ: 0
        org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
                BYTES_WRITTEN: 0
        org.apache.hadoop.mapreduce.TaskCounter
                MAP_INPUT_RECORDS: 0
                MERGED_MAP_OUTPUTS: 0
                PHYSICAL_MEMORY_BYTES: 868519936
                SPILLED_RECORDS: 0
                FAILED_SHUFFLE: 0
                CPU_MILLISECONDS: 1640
                COMMITTED_HEAP_BYTES: 159473664
                VIRTUAL_MEMORY_BYTES: 10597224448
                MAP_OUTPUT_RECORDS: 7
                SPLIT_RAW_BYTES: 655
                GC_TIME_MILLIS: 726
        org.apache.hadoop.mapreduce.FileSystemCounter
                FILE_READ_OPS: 0
                FILE_WRITE_OPS: 0
                FILE_BYTES_READ: 0
                FILE_LARGE_READ_OPS: 0
                HDFS_BYTES_READ: 2500
                FILE_BYTES_WRITTEN: 0
                HDFS_LARGE_READ_OPS: 0
                HDFS_BYTES_WRITTEN: 1629714
                HDFS_READ_OPS: 267
                HDFS_WRITE_OPS: 33
        org.apache.sqoop.submission.counter.SqoopCounters
                ROWS_WRITTEN: 7
                ROWS_READ: 7
Job executed successfully
The SUCCEEDED status and the matching ROWS_READ/ROWS_WRITTEN counters indicate the job completed.
4. Verify on HDFS that the data was imported
Open Master:50070 (the NameNode web UI) in a browser and browse to the /sqoop2/my_user_import directory. If the output files are listed there, the import succeeded.
To view the imported data, run the following command from ${HADOOP_HOME}:
bin/hdfs dfs -text /sqoop2/my_user_import/01dde47f-e009-4cf1-9170-e0a3f76d0466.txt
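With TEXT_FILE output, each row is written as one comma-separated line, so the file can be inspected with standard text tools. A rough sketch (the sample rows below are made up for illustration; real rows depend on the columns of my_user):

```shell
# Made-up sample rows in the comma-separated TEXT_FILE layout.
cat > /tmp/my_user_sample.txt <<'EOF'
1,'admin','admin2017'
2,'pu','12345'
EOF
# Print the second field (the user name) of each row.
awk -F',' '{print $2}' /tmp/my_user_sample.txt
```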