Phoenix Optimization

Source: Internet  Editor: 程序博客网  Time: 2024/05/23 14:29
hbase-site.xml:
<property>
  <name>hbase.master.maxclockskew</name>
  <value>45000000</value>
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>36000000</value>
</property>
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <value>36000000</value>
</property>
<!-- Properties that control timeouts -->
<property>
  <name>mapreduce.task.timeout</name>
  <value>1200000</value>
</property>
<property>
  <name>zookeeper.session.timeout</name>
  <value>1200000</value>
</property>
<!-- Client write buffer size -->
<property>
  <name>hbase.client.write.buffer</name>
  <value>20971520</value>
</property>
<!-- hbase.balancer.period -->
<property>
  <name>hbase.balancer.period</name>
  <value>300000</value>
</property>
<!-- Secondary indexes -->
<property>
  <name>hbase.regionserver.wal.codec</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
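The raw values in this configuration are hard to read at a glance. As a small sketch, assuming the timeout and period properties are in milliseconds and hbase.client.write.buffer is in bytes, the following converts them to minutes and MiB. The helper names ms_to_minutes and bytes_to_mib are made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helpers (not part of HBase or Phoenix) for reading the values above.

# Convert milliseconds to whole minutes.
ms_to_minutes() { echo $(( $1 / 60000 )); }

# Convert bytes to whole MiB.
bytes_to_mib() { echo $(( $1 / 1048576 )); }

echo "hbase.master.maxclockskew:            $(ms_to_minutes 45000000) min"
echo "hbase.rpc.timeout:                    $(ms_to_minutes 36000000) min"
echo "hbase.client.scanner.timeout.period:  $(ms_to_minutes 36000000) min"
echo "mapreduce.task.timeout:               $(ms_to_minutes 1200000) min"
echo "zookeeper.session.timeout:            $(ms_to_minutes 1200000) min"
echo "hbase.balancer.period:                $(ms_to_minutes 300000) min"
echo "hbase.client.write.buffer:            $(bytes_to_mib 20971520) MiB"
```

Read this way, the RPC and scanner timeouts are 10 hours, the MapReduce task and ZooKeeper session timeouts are 20 minutes, and the write buffer is 20 MiB.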
Composite primary key:
create table "test_keys2" (
  "V_1" decimal(24,8),
  "V_2" varchar,
  "YEAR" INTEGER not null,
  "PERIOD" INTEGER not null,
  "ACCOUNT" INTEGER not null,
  "ENTITY" INTEGER not null,
  "SCENARIO" INTEGER not null,
  "CURRENCY" INTEGER not null,
  "VERSION" INTEGER not null,
  "CST_DIM_02217" INTEGER not null,
  "CST_DIM_30453" INTEGER not null,
  "CST_DIM_47894" INTEGER not null,
  "CST_DIM_61310" INTEGER not null,
  "CST_DIM_81981" INTEGER not null,
  "CST_DIM_01216" INTEGER not null,
  "CST_DIM_25287" INTEGER not null,
  "CST_DIM_41183" INTEGER not null
  constraint pk primary key (
    "YEAR", "PERIOD", "ACCOUNT", "ENTITY", "SCENARIO", "CURRENCY", "VERSION",
    "CST_DIM_02217", "CST_DIM_30453", "CST_DIM_47894", "CST_DIM_61310",
    "CST_DIM_81981", "CST_DIM_01216", "CST_DIM_25287", "CST_DIM_41183"
  )
);

Note: inserting data into this table with UPSERT ran into problems.
Secondary indexes:
Creating an index synchronously
CREATE INDEX ifact1 ON C1_FACT("diminfo".YEAR) INCLUDE("diminfo"."V_1","diminfo".V_2 ,  "diminfo".PERIOD ,  "diminfo".ACCOUNT , "diminfo".ENTITY , "diminfo".SCENARIO , "diminfo".CURRENCY , "diminfo".VERSION ,"diminfo"."CST_DIM_02217" , "diminfo"."CST_DIM_30453" , "diminfo"."CST_DIM_47894" , "diminfo"."CST_DIM_61310", "diminfo"."CST_DIM_81981" , "diminfo"."CST_DIM_01216" , "diminfo"."CST_DIM_25287" , "diminfo"."CST_DIM_41183")
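When CREATE INDEX is executed this way, the index table is synchronized with the source data table directly, as part of the statement. However, when the source table holds a large amount of data, creating the index synchronously can throw an exception.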

Creating an index asynchronously
create index ifact1 on C1_FACT ("diminfo".YEAR) include("diminfo"."V_1","diminfo".V_2 ,  "diminfo".PERIOD ,  "diminfo".ACCOUNT , "diminfo".ENTITY , "diminfo".SCENARIO , "diminfo".CURRENCY , "diminfo".VERSION ,"diminfo"."CST_DIM_02217" , "diminfo"."CST_DIM_30453" , "diminfo"."CST_DIM_47894" , "diminfo"."CST_DIM_61310", "diminfo"."CST_DIM_81981" , "diminfo"."CST_DIM_01216" , "diminfo"."CST_DIM_25287" , "diminfo"."CST_DIM_41183") ASYNC
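Adding the ASYNC keyword to CREATE INDEX creates the index asynchronously. Executing this statement does not synchronize the index table with the source table, and queries will not use the index at this point. The index data still has to be loaded with IndexTool, the index population tool that Phoenix provides. It is a MapReduce tool and is used as follows: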
$HBASE_HOME/bin/hbase org.apache.phoenix.mapreduce.index.IndexTool --data-table C1_FACT --index-table ifact10   --output-path  hdfs:/phoenixindex
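With a large amount of data, this job reports an exception.
In that case, add --direct: if specified, the bulk load is avoided (optional).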
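As a sketch of how the IndexTool invocation looks with --direct appended, the helper below assembles the command line from a data table, an index table, and an output path. The function name index_tool_cmd is hypothetical. It only prints the command so it can be reviewed first, and it leaves $HBASE_HOME unexpanded so that the variable is resolved in the shell that eventually runs it.

```shell
#!/bin/sh
# Hypothetical helper: print the IndexTool command from the text with --direct added.
# Arguments: <data-table> <index-table> <output-path>
index_tool_cmd() {
  # The single-quoted argument keeps $HBASE_HOME literal in the printed command.
  printf '%s --data-table %s --index-table %s --output-path %s --direct\n' \
    '$HBASE_HOME/bin/hbase org.apache.phoenix.mapreduce.index.IndexTool' \
    "$1" "$2" "$3"
}

# Same table, index, and path as the command above; printed only, not executed.
index_tool_cmd C1_FACT ifact10 hdfs:/phoenixindex
```

To run the job rather than print it, the output can be piped to sh on a machine where HBASE_HOME is set, for example: index_tool_cmd C1_FACT ifact10 hdfs:/phoenixindex | sh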