Importing and Exporting HBase Data


1. Export:
hbase org.apache.hadoop.hbase.mapreduce.Driver export <table name> <output path>

The output path may be a local directory or a path on HDFS.
For a local directory, specify it directly, optionally with the file:/// prefix.
For HDFS, give an explicit HDFS URI, e.g. hdfs://192.168.1.200:9000/path
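The export entry point also accepts optional trailing arguments: `export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]`, with timestamps in epoch milliseconds. A minimal sketch, echoed rather than executed so the argument order is visible; the table and path follow this article, while the version count of 3 is illustrative:

```shell
# Export up to 3 versions of each cell from waln_log into /usr/local/waln_log.
# Drop the leading `echo` to actually submit the MapReduce job.
TABLE=waln_log
OUTDIR=/usr/local/waln_log
VERSIONS=3
echo bin/hbase org.apache.hadoop.hbase.mapreduce.Driver export "$TABLE" "$OUTDIR" "$VERSIONS"
```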


2. Import:
hbase org.apache.hadoop.hbase.mapreduce.Driver import <table name> <input path>
As above, the input path may be a local directory or a path on HDFS.
The Driver class also exposes other programs, such as copying data between tables (copytable) and importing TSV files (importtsv); run it without arguments to see the full list.
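As an example of one of those other programs, importtsv takes a column mapping that assigns TSV columns to the row key and to cf:qualifier pairs. A hedged sketch, echoed rather than executed; the table and family names mirror this article, but the TSV path and column list are hypothetical:

```shell
# importtsv maps TSV columns to HBASE_ROW_KEY plus cf:qualifier pairs, in order.
# /user/root/input.tsv is an illustrative path, not from the article.
echo bin/hbase org.apache.hadoop.hbase.mapreduce.Driver importtsv \
  "-Dimporttsv.columns=HBASE_ROW_KEY,cf:name,cf:age" \
  waln_log1 /user/root/input.tsv
```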



The Import job only accepts data produced by Export; anything else fails with "not a SequenceFile".
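One way to sanity-check a file before importing: a SequenceFile begins with the 3-byte magic "SEQ". A minimal sketch, assuming you have copied the export output locally; the helper name is ours:

```shell
# is_sequencefile FILE — true if FILE starts with the SequenceFile magic "SEQ"
is_sequencefile() {
  [ "$(head -c 3 "$1")" = "SEQ" ]
}

# demo on two throwaway files standing in for real export output
printf 'SEQ\006rest-of-header' > /tmp/seq_demo
printf 'plain text'            > /tmp/txt_demo
is_sequencefile /tmp/seq_demo && echo "importable"
is_sequencefile /tmp/txt_demo || echo "would fail: not a SequenceFile"
```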




3. Commands used in this article:
bin/hbase org.apache.hadoop.hbase.mapreduce.Driver export waln_log /usr/local/waln_log
bin/hbase org.apache.hadoop.hbase.mapreduce.Driver import waln_log1 hdfs://192.168.1.200:9000/usr/local/waln_log/part-m-00000



3.1. First, scan the waln_log table; this is the data we will export:
hbase(main):009:0> scan 'waln_log'
ROW                                COLUMN+CELL
 row1                              column=cf:age, timestamp=1432740300560, value=22
 row1                              column=cf:city, timestamp=1432740308281, value=shanghai
 row1                              column=cf:name, timestamp=1432740263412, value=zhangsan
 row2                              column=cf:name, timestamp=1432740296373, value=lisi
2 row(s) in 12.0670 seconds


hbase(main):010:0> desc 'waln_log'
Table waln_log is ENABLED
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE',
MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'
}
1 row(s) in 1.1220 seconds


hbase(main):011:0>




3.2. Run the export:
bin/hbase org.apache.hadoop.hbase.mapreduce.Driver export waln_log /usr/local/waln_log




After it finishes, the generated HDFS output directory contains:
[root@baozi hbase-0.99.2]# hdfs dfs -ls /usr/local/waln_log
Found 2 items
-rw-r--r--   1 root supergroup          0 2015-05-27 23:34 /usr/local/waln_log/_SUCCESS
-rw-r--r--   1 root supergroup        289 2015-05-27 23:33 /usr/local/waln_log/part-m-00000





The export runs as a MapReduce job; partial output:
2015-05-27 23:27:56,261 INFO  [main] zookeeper.ZooKeeper: Client environment:java.library.path=/usr/local/hadoop/lib/native
2015-05-27 23:27:56,261 INFO  [main] zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
2015-05-27 23:27:56,261 INFO  [main] zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
2015-05-27 23:27:56,262 INFO  [main] zookeeper.ZooKeeper: Client environment:os.name=Linux
2015-05-27 23:27:56,262 INFO  [main] zookeeper.ZooKeeper: Client environment:os.arch=amd64
2015-05-27 23:27:56,262 INFO  [main] zookeeper.ZooKeeper: Client environment:os.version=2.6.32-431.el6.x86_64
2015-05-27 23:27:56,262 INFO  [main] zookeeper.ZooKeeper: Client environment:user.name=root
2015-05-27 23:27:56,262 INFO  [main] zookeeper.ZooKeeper: Client environment:user.home=/root
2015-05-27 23:27:56,262 INFO  [main] zookeeper.ZooKeeper: Client environment:user.dir=/usr/local/hbase-0.99.2
2015-05-27 23:27:56,264 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=192.168.1.200:2181 sessionTimeout=90000 watcher=hconnection-0x5629409d, quorum=192.168.1.200:2181, baseZNode=/hbase
2015-05-27 23:27:56,542 INFO  [main-SendThread(192.168.1.200:2181)] zookeeper.ClientCnxn: Opening socket connection to server 192.168.1.200/192.168.1.200:2181. Will not attempt to authenticate using SASL (unknown error)
2015-05-27 23:27:56,544 INFO  [main-SendThread(192.168.1.200:2181)] zookeeper.ClientCnxn: Socket connection established to 192.168.1.200/192.168.1.200:2181, initiating session
2015-05-27 23:27:56,989 INFO  [main-SendThread(192.168.1.200:2181)] zookeeper.ClientCnxn: Session establishment complete on server 192.168.1.200/192.168.1.200:2181, sessionid = 0x14d95f3ad100012, negotiated timeout = 90000
2015-05-27 23:27:57,415 INFO  [main] util.RegionSizeCalculator: Calculating region sizes for table "waln_log".
2015-05-27 23:28:13,737 INFO  [main] mapreduce.JobSubmitter: number of splits:1
2015-05-27 23:28:13,920 INFO  [main] Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2015-05-27 23:28:15,913 INFO  [main] mapreduce.JobSubmitter: Submitting tokens for job: job_1432735450462_0004
2015-05-27 23:28:24,928 INFO  [main] impl.YarnClientImpl: Submitted application application_1432735450462_0004
2015-05-27 23:28:25,448 INFO  [main] mapreduce.Job: The url to track the job: http://baozi:8088/proxy/application_1432735450462_0004/
2015-05-27 23:28:25,449 INFO  [main] mapreduce.Job: Running job: job_1432735450462_0004
2015-05-27 23:31:26,790 INFO  [main] mapreduce.Job: Job job_1432735450462_0004 running in uber mode : false
2015-05-27 23:31:32,444 INFO  [main] mapreduce.Job:  map 0% reduce 0%
2015-05-27 23:34:02,361 INFO  [main] mapreduce.Job:  map 100% reduce 0%
2015-05-27 23:34:53,248 INFO  [main] mapreduce.Job: Job job_1432735450462_0004 completed successfully
2015-05-27 23:35:22,330 INFO  [main] mapreduce.Job: Counters: 41
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=133771
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=65
                HDFS: Number of bytes written=289
                HDFS: Number of read operations=4
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=169690
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=169690
                Total vcore-seconds taken by all map tasks=169690
                Total megabyte-seconds taken by all map tasks=173762560
        Map-Reduce Framework
                Map input records=2
                Map output records=2
                Input split bytes=65
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=2879
                CPU time spent (ms)=3350
                Physical memory (bytes) snapshot=85295104
                Virtual memory (bytes) snapshot=848232448
                Total committed heap usage (bytes)=15859712
        HBase Counters
                BYTES_IN_REMOTE_RESULTS=0
                BYTES_IN_RESULTS=157
                MILLIS_BETWEEN_NEXTS=13240
                NOT_SERVING_REGION_EXCEPTION=0
                NUM_SCANNER_RESTARTS=0
                NUM_SCAN_RESULTS_STALE=0
                REGIONS_SCANNED=1
                REMOTE_RPC_CALLS=0
                REMOTE_RPC_RETRIES=0
                RPC_CALLS=3
                RPC_RETRIES=0
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=289
2015-05-27 23:35:25,049 INFO  [main] ipc.Client: Retrying connect to server: baozi/192.168.1.200:36055. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2015-05-27 23:35:26,181 INFO  [main] ipc.Client: Retrying connect to server: baozi/192.168.1.200:36055. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2015-05-27 23:35:27,182 INFO  [main] ipc.Client: Retrying connect to server: baozi/192.168.1.200:36055. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
2015-05-27 23:35:30,592 INFO  [main] mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server


3.3. Create a table to import the data into:
hbase(main):011:0> create 'waln_log1','cf'
0 row(s) in 30.2070 seconds


=> Hbase::Table - waln_log1
hbase(main):012:0> scan 'waln_log'
ROW                                COLUMN+CELL
 row1                              column=cf:age, timestamp=1432740300560, value=22
 row1                              column=cf:city, timestamp=1432740308281, value=shanghai
 row1                              column=cf:name, timestamp=1432740263412, value=zhangsan
 row2                              column=cf:name, timestamp=1432740296373, value=lisi
2 row(s) in 1.3960 seconds


hbase(main):013:0> scan 'waln_log1'
ROW                                COLUMN+CELL
0 row(s) in 0.2750 seconds


hbase(main):014:0> desc 'waln_log1'
Table waln_log1 is ENABLED
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE',
MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'
}
1 row(s) in 0.1500 seconds


hbase(main):015:0>


3.4. Run the import command:


[root@baozi hbase-0.99.2]# bin/hbase org.apache.hadoop.hbase.mapreduce.Driver import waln_log1 hdfs://192.168.1.200:9000/usr/local/waln_log/part-m-00000





This also runs a MapReduce job; partial output:
2015-05-27 23:46:27,426 INFO  [main] mapreduce.TableOutputFormat: Created table instance for waln_log1
2015-05-27 23:46:43,436 INFO  [main] input.FileInputFormat: Total input paths to process : 1
2015-05-27 23:46:45,387 INFO  [main] mapreduce.JobSubmitter: number of splits:1
2015-05-27 23:46:47,327 INFO  [main] mapreduce.JobSubmitter: Submitting tokens for job: job_1432735450462_0005
2015-05-27 23:46:52,916 INFO  [main] impl.YarnClientImpl: Submitted application application_1432735450462_0005
2015-05-27 23:46:53,233 INFO  [main] mapreduce.Job: The url to track the job: http://baozi:8088/proxy/application_1432735450462_0005/
2015-05-27 23:46:53,251 INFO  [main] mapreduce.Job: Running job: job_1432735450462_0005
2015-05-27 23:48:52,937 INFO  [main] mapreduce.Job: Job job_1432735450462_0005 running in uber mode : false
2015-05-27 23:48:54,258 INFO  [main] mapreduce.Job:  map 0% reduce 0%
2015-05-27 23:51:32,098 INFO  [main] mapreduce.Job:  map 100% reduce 0%
2015-05-27 23:52:50,001 INFO  [main] mapreduce.Job: Job job_1432735450462_0005 completed successfully
2015-05-27 23:52:54,965 INFO  [main] mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=133322
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=411
                HDFS: Number of bytes written=0
                HDFS: Number of read operations=3
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=0
        Job Counters
                Launched map tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=217016
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=217016
                Total vcore-seconds taken by all map tasks=217016
                Total megabyte-seconds taken by all map tasks=222224384
        Map-Reduce Framework
                Map input records=2
                Map output records=2
                Input split bytes=122
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=2050
                CPU time spent (ms)=2140
                Physical memory (bytes) snapshot=80756736
                Virtual memory (bytes) snapshot=845209600
                Total committed heap usage (bytes)=15859712
        File Input Format Counters
                Bytes Read=289
        File Output Format Counters
                Bytes Written=0






The data was imported successfully:
hbase(main):015:0> scan 'waln_log1'
ROW                                COLUMN+CELL
 row1                              column=cf:age, timestamp=1432740300560, value=22
 row1                              column=cf:city, timestamp=1432740308281, value=shanghai
 row1                              column=cf:name, timestamp=1432740263412, value=zhangsan
 row2                              column=cf:name, timestamp=1432740296373, value=lisi
2 row(s) in 2.5040 seconds


hbase(main):016:0>


3.5. If the target table's schema differs from the exported table's, the import fails:
hbase(main):017:0> create 'waln_log2','cf1','cf2'
0 row(s) in 25.7630 seconds


=> Hbase::Table - waln_log2
hbase(main):018:0> desc 'waln_log2'
Table waln_log2 is ENABLED
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE',
 MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true
'}
{NAME => 'cf2', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE',
 MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true
'}
2 row(s) in 0.2650 seconds


hbase(main):019:0> scan 'waln_log2'
ROW                                COLUMN+CELL
0 row(s) in 0.1870 seconds


hbase(main):020:0>


The error message:
2015-05-28 00:02:31,224 INFO  [main] mapreduce.Job:  map 0% reduce 0%
2015-05-28 00:08:20,162 INFO  [main] mapreduce.Job:  map 100% reduce 0%
2015-05-28 00:08:55,603 INFO  [main] mapreduce.Job:  map 0% reduce 0%
2015-05-28 00:09:30,247 INFO  [main] mapreduce.Job: Task Id : attempt_1432735450462_0006_m_000000_0, Status : FAILED

Error: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 2 actions: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family cf does not exist in region waln_log2,,1432742282179.6cd6a2a4d5ae585bd425ffbce92783c4. in table 'waln_log2', {NAME => 'cf1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => 'FOREVER', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'cf2', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => 'FOREVER', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
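Import writes each cell back under its original column family ('cf' here), so the target table must define that family; waln_log2 only has 'cf1' and 'cf2', hence the NoSuchColumnFamilyException. A hedged fix sketch: generate a create statement whose family matches the exported data and feed it to `bin/hbase shell` (the table name waln_log3 and the piping approach are our assumptions):

```shell
# Build the DDL for a table whose family matches the exported data.
FAMILY=cf
NEWTABLE=waln_log3   # hypothetical table name
DDL="create '$NEWTABLE','$FAMILY'"
echo "$DDL"          # pipe this line into `bin/hbase shell` to execute it
```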


