Big Data Technology - HBase: Using CopyTable for Online Backup of HBase Table Data

Source: Internet  Editor: 程序博客网  Time: 2024/04/28 02:20

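CopyTable is a useful backup tool that ships with HBase. It can copy a table within a cluster, back up a table to a remote cluster, take incremental backups of table data, and copy only part of a table (for example a row range, a time range, or selected column families). It runs as a Hadoop MapReduce job and uses the standard HBase Scan read interface and Put write interface.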

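Before running it, make sure the destination table (tableDst) already exists on the target cluster; otherwise the job fails with an error. Also note that data written to the source table while the copy is running is not guaranteed to reach the destination table.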


# create new tableOrig on destination cluster
dstCluster$ echo "create 'tableOrig', 'cf1', 'cf2'" | hbase shell
# on source cluster run copy table with destination ZK quorum specified using --peer.adr
# WARNING: In older versions, you are not alerted about any typo in these arguments!
srcCluster$ hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=dstClusterZK:2181:/hbase tableOrig


# create new tableCopy on destination cluster
dstCluster$ echo "create 'tableCopy', 'cf1', 'cf2'" | hbase shell
# on source cluster run copy table with destination --peer.adr and --new.name arguments.
srcCluster$ hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=dstClusterZK:2181:/hbase --new.name=tableCopy tableOrig


# WARNING: In older versions, you are not alerted about any typo in these arguments!
# Copy from the beginning of time until timeEnd.
# NOTE: Must include start time for end time to be respected. Start time cannot be 0.
srcCluster$ hbase org.apache.hadoop.hbase.mapreduce.CopyTable ... --starttime=1 --endtime=timeEnd ...
# Copy starting from and including timeStart until the end of time.
srcCluster$ hbase org.apache.hadoop.hbase.mapreduce.CopyTable ... --starttime=timeStart ...
# Copy entries with timestamps from timeStart (inclusive) up to timeEnd (exclusive).
srcCluster$ hbase org.apache.hadoop.hbase.mapreduce.CopyTable ... --starttime=timeStart --endtime=timeEnd
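
The --starttime and --endtime values are Unix timestamps in milliseconds, so an incremental backup needs the window converted into that unit first. The following is a minimal sketch of one way to do this, assuming a POSIX shell with `date +%s` available. `window_ms` is a hypothetical helper name, not part of HBase, and the command is only printed, not executed.

```shell
#!/bin/sh
# Sketch only: window_ms is a hypothetical helper, not an HBase command.
# Prints "<start_ms> <end_ms>" for a window of $1 hours that ends at epoch
# second $2 (defaults to the current time).
window_ms() (
  hours=$1
  end_s=${2:-$(date +%s)}
  end_ms=$(( end_s * 1000 ))                    # seconds -> milliseconds
  start_ms=$(( end_ms - hours * 3600 * 1000 ))  # go back N hours
  echo "$start_ms $end_ms"
)

# Split the two values of a one-hour window ending now.
window=$(window_ms 1)
start_ms=${window% *}
end_ms=${window#* }

# Print the resulting command instead of running it (dry run).
# The peer address and table name reuse the placeholders from the examples above.
echo "hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=$start_ms --endtime=$end_ms --peer.adr=dstClusterZK:2181:/hbase tableOrig"
```

Because the end of the range is exclusive, passing the previous run's end value as the next run's start value should give back-to-back windows that neither overlap nor leave a gap.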



Usage: CopyTable [general options] [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] <tablename>


Options:
 rs.class     hbase.regionserver.class of the peer cluster
              specify if different from current cluster
 rs.impl      hbase.regionserver.impl of the peer cluster
 startrow     the start row
 stoprow      the stop row
 starttime    beginning of the time range (unixtime in millis)
              without endtime means from starttime to forever
 endtime      end of the time range.  Ignored if no starttime specified.
 versions     number of cell versions to copy
 new.name     new table's name
 peer.adr     Address of the peer cluster given in the format
              hbase.zookeeper.quorum:hbase.zookeeper.client.port:zookeeper.znode.parent
 families     comma-separated list of families to copy
              To copy from cf1 to cf2, give sourceCfName:destCfName. 
              To keep the same name, just give "cfName"
 all.cells    also copy delete markers and deleted cells


Args:
 tablename    Name of the table to copy


Examples:
 To copy 'TestTable' to a cluster that uses replication for a 1 hour window:
 $ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1265875194289 --endtime=1265878794289 --peer.adr=server1,server2,server3:2181:/hbase --families=myOldCf:myNewCf,cf2,cf3 TestTable 
For performance consider the following general options:
-Dhbase.client.scanner.caching=100
-Dmapred.map.tasks.speculative.execution=false
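
According to the Usage line, general options come immediately after the class name and before the CopyTable-specific options. A minimal sketch of assembling such a command line follows. `copytable_cmd` is a hypothetical wrapper name, the -D values are the ones listed above, and the command is printed rather than run.

```shell
#!/bin/sh
# Sketch only: copytable_cmd is a hypothetical helper, not part of HBase.
# It prints a CopyTable command line with the general -D options placed
# right after the class name, followed by whatever arguments are passed in.
copytable_cmd() {
  echo "hbase org.apache.hadoop.hbase.mapreduce.CopyTable" \
    "-Dhbase.client.scanner.caching=100" \
    "-Dmapred.map.tasks.speculative.execution=false" \
    "$@"
}

# Dry run: show the command for copying tableOrig to the peer cluster.
copytable_cmd --peer.adr=dstClusterZK:2181:/hbase tableOrig
```

Printing the command first makes it easy to check the arguments for typos, which older versions do not report, before running it on the source cluster.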

