Setting Up Phoenix Indexes

Source: Internet | Published: ubuntu改中文 (Switching Ubuntu to Chinese) | Editor: 程序博客网 | Time: 2024/06/18 11:08

Setup

Non-transactional, mutable indexing requires special configuration options on the region server and master to run.


You will need to add the following parameters to hbase-site.xml on each region server:

<property>
  <name>hbase.regionserver.wal.codec</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>

The above property enables custom WAL edits to be written, ensuring proper writing/replay of the index updates. This codec supports the usual host of WALEdit options, most notably WALEdit compression.
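For example, a region server that also wants WAL compression could pair the indexed codec with HBase's standard `hbase.regionserver.wal.enablecompression` switch. This is a sketch: enabling compression is an assumption about your deployment, not something the indexing setup itself requires.

```xml
<!-- Inside the <configuration> element of each region server's hbase-site.xml -->
<property>
  <name>hbase.regionserver.wal.codec</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
<!-- Optional (assumption): standard HBase WAL compression, which the indexed codec supports -->
<property>
  <name>hbase.regionserver.wal.enablecompression</name>
  <value>true</value>
</property>
```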



<property>
  <name>hbase.region.server.rpc.scheduler.factory.class</name>
  <value>org.apache.hadoop.hbase.ipc.PhoenixRpcSchedulerFactory</value>
  <description>Factory to create the Phoenix RPC Scheduler that uses separate queues for index and metadata updates</description>
</property>
<property>
  <name>hbase.rpc.controllerfactory.class</name>
  <value>org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory</value>
  <description>Factory to create the Phoenix RPC controllers that give index and metadata updates a higher priority</description>
</property>

The above properties prevent deadlocks from occurring during index maintenance for global indexes (HBase 0.98.4+ and Phoenix 4.3.1+ only) by ensuring that index updates are processed with a higher priority than data updates. They also prevent deadlocks by ensuring that metadata RPC calls are processed with a higher priority than data RPC calls.


From Phoenix 4.8.0 onward, no configuration changes are required to use local indexing. In Phoenix 4.7 and below, the following changes are required to the server-side hbase-site.xml on the master and region server nodes:

<property>
  <name>hbase.master.loadbalancer.class</name>
  <value>org.apache.phoenix.hbase.index.balancer.IndexLoadBalancer</value>
</property>
<property>
  <name>hbase.coprocessor.master.classes</name>
  <value>org.apache.phoenix.hbase.index.master.IndexMasterObserver</value>
</property>
<property>
  <name>hbase.coprocessor.regionserver.classes</name>
  <value>org.apache.hadoop.hbase.regionserver.LocalIndexMerger</value>
</property>

Note: these settings are not needed from 4.8.0 onward; 4.7.0 and earlier require them.


Upgrading Local Indexes created before 4.8.0

When upgrading Phoenix on the server to 4.8.0 or later, remove the three local-indexing configurations above from hbase-site.xml if they are present. On the client side, local indexes created before 4.8.0 can be upgraded either online (when a Phoenix 4.8.0+ client initializes its connection) or offline (using the psql tool). As part of the upgrade, the local indexes are recreated in ASYNC mode. After the upgrade, you need to build the indexes using IndexTool.

The following client-side configuration is used during the upgrade:

  1. phoenix.client.localIndexUpgrade
    • true means online upgrade; false means offline upgrade.
    • Default: true

Command to run the offline upgrade using the psql tool:

$ psql [zookeeper] -l
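A minimal sketch of opting into the offline path: set the flag to false in the client-side hbase-site.xml (the one on the Phoenix client's classpath; its location depends on your installation) so that opening a connection does not trigger the online upgrade, then run the psql command instead.

```xml
<!-- Client-side hbase-site.xml: disable the online local-index upgrade (default is true) -->
<property>
  <name>phoenix.client.localIndexUpgrade</name>
  <value>false</value>
</property>
```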

Tuning

Out of the box, indexing is pretty fast. However, to optimize for your particular environment and workload, there are several properties you can tune.

All of the following parameters must be set in hbase-site.xml. They apply to the entire cluster and to all index tables, as well as across all regions on the same server (so, for instance, a single server would not write to too many different index tables at once).

  1. index.builder.threads.max
    • Number of threads to use to build the index update from the primary table update.
    • Increasing this value overcomes the bottleneck of reading the current row state from the underlying HRegion. Setting it too high will just move the bottleneck to the HRegion, which cannot handle too many concurrent scan requests, and adds general thread-swapping overhead.
    • Default: 10
  2. index.builder.threads.keepalivetime
    • Amount of time in seconds after which idle threads in the builder thread pool expire.
    • Unused threads are released after this amount of time and no core threads are retained (a minor concern, since tables are expected to sustain a fairly constant write load); this lets us drop threads when we are not seeing the expected load.
    • Default: 60
  3. index.writer.threads.max
    • Number of threads to use when writing to the target index tables.
    • This is the first level of parallelization, on a per-table basis; it should roughly correspond to the number of index tables.
    • Default: 10
  4. index.writer.threads.keepalivetime
    • Amount of time in seconds after which idle threads in the writer thread pool expire.
    • Unused threads are released after this amount of time and no core threads are retained (a minor concern, since tables are expected to sustain a fairly constant write load); this lets us drop threads when we are not seeing the expected load.
    • Default: 60
  5. hbase.htable.threads.max
    • Number of threads each index HTable can use for writes.
    • Increasing this allows more concurrent index updates (for instance, across batches), leading to higher overall throughput.
    • Default: 2,147,483,647
  6. hbase.htable.threads.keepalivetime
    • Amount of time in seconds after which idle threads in the HTable's thread pool expire.
    • Using the "direct handoff" approach, new threads are only created when necessary, and the pool can grow unbounded. This could be a problem, but HTables only create as many Runnables as there are region servers; therefore, the pool also scales as new region servers are added.
    • Default: 60
  7. index.tablefactory.cache.size
    • Number of index HTables we should keep in cache.
    • Increasing this number ensures that we do not need to recreate an HTable for each attempt to write to an index table. Conversely, you could see memory pressure if this value is set too high.
    • Default: 10
  8. org.apache.phoenix.regionserver.index.priority.min
    • Value specifying the bottom (inclusive) of the range in which index priority may lie.
    • Default: 1000
  9. org.apache.phoenix.regionserver.index.priority.max
    • Value specifying the top (exclusive) of the range in which index priority may lie.
    • Higher priorities within the index min/max range do not mean that updates are processed sooner.
    • Default: 1050
  10. org.apache.phoenix.regionserver.index.handler.count
    • Number of threads to use when serving index write requests for global index maintenance.
    • The actual number of threads is dictated by Max(number of call queues, handler count), where the number of call queues is determined by standard HBase configuration. To further tune the queues, you can adjust the standard RPC queue length parameters (currently, there are no special knobs for the index queues), specifically ipc.server.max.callqueue.length and ipc.server.callqueue.handler.factor. See the HBase Reference Guide for more details.
    • Default: 30
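As an illustration of how these overrides look in hbase-site.xml, the fragment below raises a few of the pool sizes from their defaults. The values are arbitrary examples assuming a deployment with roughly 20 index tables, not recommendations; start from the defaults listed above and change one setting at a time while measuring.

```xml
<!-- Inside <configuration> in hbase-site.xml on every region server -->
<!-- Illustrative values only (assumption: about 20 index tables) -->
<property>
  <name>index.builder.threads.max</name>
  <value>20</value>  <!-- default 10 -->
</property>
<property>
  <name>index.writer.threads.max</name>
  <value>20</value>  <!-- roughly the number of index tables; default 10 -->
</property>
<property>
  <name>index.tablefactory.cache.size</name>
  <value>20</value>  <!-- cache one HTable per index table; default 10 -->
</property>
<property>
  <name>org.apache.phoenix.regionserver.index.handler.count</name>
  <value>40</value>  <!-- effective threads = Max(call queues, handler count); default 30 -->
</property>
```

Keeping index.tablefactory.cache.size at least as large as the number of index tables avoids recreating HTables on each write, at the cost of some extra memory.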


Index Scrutiny Tool


Limitations

  • If rows are actively being updated or deleted while the scrutiny is running, the tool may give you false positives for inconsistencies (PHOENIX-4277).
  • Snapshot reads are not supported by the scrutiny tool (PHOENIX-4270).