phoenix secondary

Source: Internet  Editor: 程序博客网  Time: 2024/06/06 02:41

Secondary Indexing

With secondary indexing, the columns or expressions you index form an alternate row key to allow point lookups and range scans along this new axis.

Covered Indexes

Phoenix is particularly powerful in that we provide covered indexes - we do not need to go back to the primary table once we have found the index entry. Instead, we bundle the data we care about right in the index rows, saving read-time overhead.


For example, the following would create an index on the v1 and v2 columns and include the v3 column in the index as well, to avoid having to get it from the data table:

CREATE INDEX my_index ON my_table (v1,v2) INCLUDE(v3)
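
As a sketch of what the covered index buys you, the query below filters on v1 and v2 and returns only v3, so every column it touches lives in my_index. The literal values are made up for illustration and assume v1 and v2 are VARCHAR columns. Prefixing the query with EXPLAIN should show whether the plan scans the index or the data table.

```sql
-- Assumption: v1 and v2 are VARCHAR columns; 'foo' and 'bar' are placeholder values.
-- Every referenced column (v1, v2, v3) is stored in my_index,
-- so the data table should not need to be read.
SELECT v3 FROM my_table WHERE v1 = 'foo' AND v2 = 'bar';

-- Inspect the plan to see which table is actually scanned.
EXPLAIN SELECT v3 FROM my_table WHERE v1 = 'foo' AND v2 = 'bar';
```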

Functional Indexes

Functional indexes (available in 4.3 and above) allow you to create an index not just on columns, but on arbitrary expressions.


For example, the following would create a functional index on the upper-cased full name of an employee:

CREATE INDEX UPPER_NAME_IDX ON EMP (UPPER(FIRST_NAME||' '||LAST_NAME))
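
A query can benefit from this index when its filter repeats the indexed expression. The sketch below assumes EMP has an EMP_ID column, which is not shown above, and the name literal is a placeholder.

```sql
-- Assumption: EMP has an EMP_ID column; 'JOHN DOE' is a placeholder value.
-- The WHERE clause uses the same expression as UPPER_NAME_IDX,
-- so the lookup can be served from the index instead of a full table scan.
SELECT EMP_ID FROM EMP WHERE UPPER(FIRST_NAME||' '||LAST_NAME) = 'JOHN DOE';
```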

Global Indexes

Global indexing targets read-heavy use cases. With global indexes, all the performance penalties for indexes occur at write time. We intercept the data table updates on write (DELETE, UPSERT VALUES and UPSERT SELECT), build the index update, and then send any necessary updates to all interested index tables. At read time, Phoenix will select the index table that will produce the fastest query time and scan it directly, just like any other HBase table. By default, unless hinted, an index will not be used for a query that references a column that isn’t part of the index.
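
To make the last point concrete, here is a minimal sketch. The index name my_v1_index is hypothetical, and the table is assumed to have VARCHAR columns v1 and v2. Because v2 is referenced but not stored in the index, the plan is expected to fall back to scanning the data table.

```sql
-- Hypothetical global index on v1 only (my_v1_index is an illustrative name).
CREATE INDEX my_v1_index ON my_table (v1);

-- v2 is not part of my_v1_index, so by default the index is not expected to be used.
EXPLAIN SELECT v2 FROM my_table WHERE v1 = 'foo';
```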


Local Indexes

Local indexing targets write-heavy, space-constrained use cases. With local indexes, index data and table data co-reside on the same server, preventing any network overhead during writes. Local indexes can be used even when the query isn’t fully covered: Phoenix automatically retrieves the columns not in the index through point gets against the data table. All local index data is stored in separate shadow column families in the same data table. At read time, when the local index is used, every region must be examined for the data, as the exact region location of index data cannot be predetermined. Thus some overhead occurs at read time.
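
A minimal sketch of a partially covered query against a local index follows. The index name my_local_index is hypothetical, and the table is assumed to have VARCHAR columns v1 and v2.

```sql
-- Hypothetical local index on v1 (my_local_index is an illustrative name).
CREATE LOCAL INDEX my_local_index ON my_table (v1);

-- v2 is not in the index; per the description above, it is fetched from the
-- data table on the same server after the index narrows down the rows.
SELECT v2 FROM my_table WHERE v1 = 'foo';
```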


Index Population

By default, when an index is created, it is populated synchronously during the CREATE INDEX call. This may not be feasible depending on the current size of the data table. As of 4.5, initial population of an index may be done asynchronously by including the ASYNC keyword in the index creation DDL statement:

CREATE INDEX async_index ON my_schema.my_table (v) ASYNC


The MapReduce job that populates the index table must be kicked off separately through the HBase command line like this:

${HBASE_HOME}/bin/hbase org.apache.phoenix.mapreduce.index.IndexTool \
  --schema MY_SCHEMA --data-table MY_TABLE --index-table ASYNC_INDEX \
  --output-path ASYNC_IDX_HFILES

Index Usage

When a query references a column that is not part of the index (for example, selecting v2 while filtering on v1 = 'foo', with an index on v1 only), there are three ways to get the index used:

  1. Create a covered index by including v2 in the index:
    CREATE INDEX my_index ON my_table (v1) INCLUDE (v2)

     This will cause the v2 column value to be copied into the index and kept in sync as it changes. This will obviously increase the size of the index.

  2. Hint the query to force it to use the index:
    SELECT /*+ INDEX(my_table my_index) */ v2 FROM my_table WHERE v1 = 'foo'

     This will cause each data row to be retrieved when the index is traversed to find the missing v2 column value. This hint should only be used if you know that the index has good selectivity (i.e. a small number of table rows have a value of 'foo' in this example), as otherwise you’ll get better performance from the default behavior of doing a full table scan.

  3. Create a local index:
    CREATE LOCAL INDEX my_index ON my_table (v1)

     A local index can be used even though v2 is not in it, because the missing column is fetched from the data table on the same server.

Index Removal

To drop an index, you’d issue the following statement:

DROP INDEX my_index ON my_table

If an indexed column is dropped in the data table, the index will automatically be dropped. In addition, if a covered column is dropped in the data table, it will be automatically dropped from the index as well.
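
As a small illustration of the second case, assume my_index was created with INCLUDE (v3), as in the covered index example. Dropping v3 from the data table would then be expected to remove it from the index too, with no separate index DDL.

```sql
-- Assumption: my_index was created as (v1, v2) INCLUDE (v3).
-- Dropping the covered column from the data table also drops it from my_index.
ALTER TABLE my_table DROP COLUMN v3;
```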

Index Properties

Just like with the CREATE TABLE statement, the CREATE INDEX statement may pass through properties to apply to the underlying HBase table, including the ability to salt it:

CREATE INDEX my_index ON my_table (v2 DESC, v1) INCLUDE (v3) SALT_BUCKETS=10, DATA_BLOCK_ENCODING='NONE'

Note that if the primary table is salted, then the index is automatically salted in the same way for global indexes. In addition, the MAX_FILESIZE for the index is adjusted down, relative to the size of the primary versus index table. With local indexes, on the other hand, specifying SALT_BUCKETS is not allowed.

Mutable Tables

For non-transactional mutable tables, we maintain index update durability by adding the index updates to the Write-Ahead-Log (WAL) entry of the primary table row. Indexes on non-transactional mutable tables are only ever a single batch of edits behind the primary table.

It’s important to note several points:

  • For non-transactional tables, you could see the index table out of sync with the primary table.
  • As noted above, this is OK, as we are only a very small bit behind and out of sync for very short periods.
  • Each data row and its index row(s) are guaranteed to be written or lost: we never see partial updates, as this is part of the atomicity guarantees of HBase.
  • Data is first written to the table followed by the index tables (the reverse is true if the WAL is disabled).

Singular Write Path

Disallow table writes until mutable index is consistent

Disable mutable indexes on write failure until consistency restored


