RDBMS和HDFS, HIVE, HBASE的迁移工具Sqoop
来源:互联网 发布:淘宝微商哪个赚钱 编辑:程序博客网 时间:2024/05/29 09:38
Sqoop 简介
Apache Sqoop 项目旨在协助 RDBMS 与 Hadoop 之间进行高效的大数据交流。用户可以在 Sqoop 的帮助下,轻松地把关系型数据库的数据导入到 Hadoop 与其相关的系统 ( 如 HBase 和 Hive) 中;同时也可以把数据从 Hadoop 系统里抽取并导出到关系型数据库里。除了这些主要的功能外,Sqoop 也提供了一些诸如查看数据库表等实用的小工具。
Sqoop 支持的数据库
理论上,Sqoop 支持任何一款支持 JDBC 规范的数据库,如 DB2、MySQL 等。在使用 Sqoop 连接关系型数据库前,首先需要把相关的 JDBC 驱动拷贝到 $SQOOP_HOME/lib 文件夹下,然后在“connect”参数后指定好数据库连接的 url,如“--connect jdbc:db2://localhost:50000/SAMPLE”。 对于 DB2 数据库来说,Sqoop 目前支持 DB2 的绝大多数数据类型,而且 Sqoop 的大多数工具也能在 DB2 上较好地运行。
Sqoop 支持的文件类型
Available commands: codegen Generate code to interact with database records create-hive-table Import a table definition into Hive eval Evaluate a SQL statement and display the results export Export an HDFS directory to a database table help List available commands import Import a table from a database to HDFS import-all-tables Import tables from a database to HDFS list-databases List available databases on a server list-tables List available tables in a database version Display version informationSee 'sqoop help COMMAND' for information on a specific command.
List RDBMS Tables;
./sqoop list-tables --connect jdbc:mysql://127.0.0.1:3306/feixu --username root --password root123;
hadoop@feixu-master1:/usr/local/sqoop/bin$ ./sqoop import --connect jdbc:mysql://192.168.56.102:3306/feixu --username root --password root123 --table RATES_BK;
Warning: /usr/lib/hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
13/12/20 05:03:51 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
13/12/20 05:03:52 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
13/12/20 05:03:52 INFO tool.CodeGenTool: Beginning code generation
13/12/20 05:03:52 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `RATES_BK` AS t LIMIT 1
13/12/20 05:03:52 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `RATES_BK` AS t LIMIT 1
13/12/20 05:03:52 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-hadoop/compile/e8fc42fcc6b9c207b7aaea879c43be95/RATES_BK.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/12/20 05:03:53 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/e8fc42fcc6b9c207b7aaea879c43be95/RATES_BK.jar
13/12/20 05:03:53 WARN manager.MySQLManager: It looks like you are importing from mysql.
13/12/20 05:03:53 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
13/12/20 05:03:53 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
13/12/20 05:03:53 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
13/12/20 05:03:53 WARN manager.CatalogQueryManager: The table RATES_BK contains a multi-column primary key. Sqoop will default to the column USER_ID only for this job.
13/12/20 05:03:53 WARN manager.CatalogQueryManager: The table RATES_BK contains a multi-column primary key. Sqoop will default to the column USER_ID only for this job.
13/12/20 05:03:53 INFO mapreduce.ImportJobBase: Beginning import of RATES_BK
13/12/20 05:03:56 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`USER_ID`), MAX(`USER_ID`) FROM `RATES_BK`
13/12/20 05:03:57 INFO mapred.JobClient: Running job: job_201312200120_0010
13/12/20 05:03:58 INFO mapred.JobClient: map 0% reduce 0%
13/12/20 05:04:13 INFO mapred.JobClient: map 25% reduce 0%
13/12/20 05:04:14 INFO mapred.JobClient: map 50% reduce 0%
13/12/20 05:04:15 INFO mapred.JobClient: map 100% reduce 0%
13/12/20 05:04:17 INFO mapred.JobClient: Job complete: job_201312200120_0010
13/12/20 05:04:17 INFO mapred.JobClient: Counters: 18
13/12/20 05:04:17 INFO mapred.JobClient: Job Counters
13/12/20 05:04:17 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=36108
13/12/20 05:04:17 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/12/20 05:04:17 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/12/20 05:04:17 INFO mapred.JobClient: Launched map tasks=4
13/12/20 05:04:17 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/12/20 05:04:17 INFO mapred.JobClient: File Output Format Counters
13/12/20 05:04:17 INFO mapred.JobClient: Bytes Written=216525
13/12/20 05:04:17 INFO mapred.JobClient: FileSystemCounters
13/12/20 05:04:17 INFO mapred.JobClient: HDFS_BYTES_READ=440
13/12/20 05:04:17 INFO mapred.JobClient: FILE_BYTES_WRITTEN=268158
13/12/20 05:04:17 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=216525
13/12/20 05:04:17 INFO mapred.JobClient: File Input Format Counters
13/12/20 05:04:17 INFO mapred.JobClient: Bytes Read=0
13/12/20 05:04:17 INFO mapred.JobClient: Map-Reduce Framework
13/12/20 05:04:17 INFO mapred.JobClient: Map input records=10000
13/12/20 05:04:17 INFO mapred.JobClient: Physical memory (bytes) snapshot=152514560
13/12/20 05:04:17 INFO mapred.JobClient: Spilled Records=0
13/12/20 05:04:17 INFO mapred.JobClient: CPU time spent (ms)=3350
13/12/20 05:04:17 INFO mapred.JobClient: Total committed heap usage (bytes)=37519360
13/12/20 05:04:17 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1391800320
13/12/20 05:04:17 INFO mapred.JobClient: Map output records=10000
13/12/20 05:04:17 INFO mapred.JobClient: SPLIT_RAW_BYTES=440
13/12/20 05:04:17 INFO mapreduce.ImportJobBase: Transferred 211.4502 KB in 23.3888 seconds (9.0407 KB/sec)
13/12/20 05:04:17 INFO mapreduce.ImportJobBase: Retrieved 10000 records.
- RDBMS和HDFS, HIVE, HBASE的迁移工具Sqoop
- sqoop:mysql和Hbase/Hive/Hdfs之间相互导入数据
- sqoop向hdfs,hive,hbase导入数据
- mysql 导入hdfs、hive、hbase sqoop使用方法
- MySQL向Hive/HBase的迁移工具
- MySQL向Hive/HBase的迁移工具
- MySQL向Hive/HBase的迁移工具
- 大数据基础(二)hadoop, mave, hbase, hive, sqoop在ubuntu 14.04.04下的安装和sqoop与hdfs,hive,mysql导入导出
- Hadoop数据工具sqoop,导入HDFS,HIVE,HBASE,导出到oracle
- Sqoop工具实现,hive、Hbase、HDFS与mysql、oracle数据导入
- 使用Sqoop在HDFS和RDBMS之间导数据
- hbase和RDBMS的区别
- CDH4安装实践HDFS、HBase、Zookeeper、Hive、Oozie、Sqoop
- Sqoop迁移工具的使用
- HBase与RDBMS&HDFS对比
- sqoop export工具把HDFS中的数据导入到rdbms系统中,实现方式有三种
- Sqoop数据迁移,工作机制,sqoop安装(配置),Sqoop的数据导入,导入表数据到HDFS,导入关系表到HIVE,导入到HDFS指定目录,导入表数据子集,按需导入,增量导入,sqoop数据导出
- Mysql到HBase数据迁移-Sqoop工具方法
- 大数据云计算的利器hadoop介绍
- Hadoop的脚本语言Pig应用
- Gnuplot图形展示hadoop处理结果
- 一、ReactiveCocoa(RAC)配置
- Hadoop数据仓库hive的应用
- RDBMS和HDFS, HIVE, HBASE的迁移工具Sqoop
- hadoop基础总结
- virtualbox桥接网络配置--CentOS
- 企业级hadoop集群选型配置
- hadoop1.2.1+zookeeper3.4.6+hbase0.94集群环境搭建
- BZOJ 1189: [HNOI2007]紧急疏散evacuate
- Hadoop性能参数调优
- (精)字符集GBK和UTF8的区别说明
- webView混淆的时候的注意