How to quickly insert 100,000 rows into HBase

Source: Internet | Editor: 程序博客网 | Time: 2024/04/29 07:25

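As we know, every put operation is actually an RPC: it sends the client's data to the server and then returns. That is only suitable for small amounts of data. If an application needs to insert 100,000 rows into an HBase table, issuing one RPC per row is a poor fit.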

The HBase API provides a client-side write buffer. The buffer collects put operations and then sends them to the server together in a single RPC call.
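The buffering idea can be illustrated without a cluster. What follows is a minimal sketch, not the HBase client itself: the class `WriteBufferSketch`, its methods, and the 100-byte put size are all hypothetical, and it assumes each flush costs exactly one round trip (a real client may contact several region servers per flush).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a client-side write buffer; NOT the HBase client API.
class WriteBufferSketch {
    private final List<String> pending = new ArrayList<>(); // buffered row keys
    private final long bufferLimitBytes;                     // flush threshold (hypothetical)
    private long pendingBytes = 0;                           // size of buffered puts
    private int rpcCalls = 0;                                // simulated round trips

    WriteBufferSketch(long bufferLimitBytes) {
        this.bufferLimitBytes = bufferLimitBytes;
    }

    // Collect one put; flush once the buffered size exceeds the limit.
    void put(String rowKey, long approxBytes) {
        pending.add(rowKey);
        pendingBytes += approxBytes;
        if (pendingBytes > bufferLimitBytes) {
            flush();
        }
    }

    // Send everything buffered in one simulated round trip.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        rpcCalls++;
        pending.clear();
        pendingBytes = 0;
    }

    int rpcCalls() {
        return rpcCalls;
    }

    // Count simulated round trips for `rows` puts of `bytesPerPut` bytes each.
    static int simulate(int rows, long bytesPerPut, long bufferLimitBytes) {
        WriteBufferSketch writer = new WriteBufferSketch(bufferLimitBytes);
        for (int i = 0; i < rows; i++) {
            writer.put("row" + i, bytesPerPut);
        }
        writer.flush(); // final explicit flush for whatever is still buffered
        return writer.rpcCalls();
    }

    public static void main(String[] args) {
        // A limit of 0 makes every put flush immediately, i.e. one round trip per row.
        System.out.println("no buffering:  " + simulate(100000, 100, 0));
        System.out.println("128 KB buffer: " + simulate(100000, 100, 128 * 1024));
    }
}
```

Under these assumptions the buffered run needs a few dozen round trips instead of one per row, which is where the speedup comes from.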

The following HBase client code inserts 100,000 rows into a table:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class AddTest {
    static Configuration conf = null;
    static {
        conf = HBaseConfiguration.create();
    }

    public static void main(String args[]) {
        String tableName = "testtable1";
        String familie1 = "colfam1";
        String familie2 = "colfam2";
        String[] column = {"col-5", "col-33", "k"};
        String[] values = {"wellcome", "my house", "yes"};
        try {
            // Check whether the specified table exists
            HBaseAdmin admin = new HBaseAdmin(conf);
            if (!admin.tableExists(Bytes.toBytes(tableName))) {
                System.err.println("the table " + tableName + " does not exist");
                admin.close();
                System.exit(1);
            }
            admin.close();
            // Open a connection to the table
            HTable table = new HTable(conf, TableName.valueOf(tableName));
            // Turn off auto-flush so puts are collected in the client-side write buffer
            table.setAutoFlush(false);
            // Set the write buffer size (128 KB)
            table.setWriteBufferSize(128 * 1024);
            // Write the data
            int i = 0;
            while (i < 100000) {
                Put put = new Put(Bytes.toBytes("row" + i));
                put.add(Bytes.toBytes(familie1), Bytes.toBytes(column[0]), Bytes.toBytes(values[0]));
                //put.add(Bytes.toBytes(familie2),Bytes.toBytes(column[1]),Bytes.toBytes(values[1]));
                table.put(put);
                i++;
                System.out.println(i);
            }
            // Flush whatever is still in the write buffer
            table.flushCommits();
            // Close the table connection
            table.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("success");
    }
}

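Testing on my virtual machine cluster showed that inserting 100,000 rows this way takes only a few seconds, which is much faster than running 100,000 separate put calls. The write buffer size setting also affects efficiency.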
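To get a rough feel for how the buffer size changes the number of flushes, here is a back-of-the-envelope estimate. It is only a sketch under stated assumptions: the class `BufferSizeEstimate` is hypothetical, every put is assumed to occupy the same number of bytes (100 here, an invented figure), and a flush is assumed to fire as soon as the buffered bytes exceed the limit. Actual timings depend on the cluster and are not modeled.

```java
// Rough estimate of flush counts for different write buffer sizes.
// Hypothetical helper for illustration; it does not call HBase.
class BufferSizeEstimate {

    // Puts that fit before the buffer overflows, plus the one that triggers the flush.
    static long putsPerFlush(long bytesPerPut, long bufferBytes) {
        return bufferBytes / bytesPerPut + 1;
    }

    // Number of flushes needed for `rows` puts, rounding up for the final partial batch.
    static long estimateFlushes(long rows, long bytesPerPut, long bufferBytes) {
        long perFlush = putsPerFlush(bytesPerPut, bufferBytes);
        return (rows + perFlush - 1) / perFlush;
    }

    public static void main(String[] args) {
        long rows = 100000;
        long bytesPerPut = 100; // assumed average size of one put
        long[] bufferSizes = {64 * 1024, 128 * 1024, 2 * 1024 * 1024};
        for (long size : bufferSizes) {
            System.out.println(size + " bytes -> "
                    + estimateFlushes(rows, bytesPerPut, size) + " flushes");
        }
    }
}
```

A larger buffer means fewer flushes but more client memory held per table handle and more data at risk if the client fails before flushing, so the best value is a trade-off to measure rather than a fixed number.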

0 0