HawtDB introduction

来源：互联网发布：下载听故事软件编辑：程序博客网时间：2024/06/05 03:01

http://hawtdb.fusesource.org/

http://repo.fusesource.com/nexus/content/groups/public/org/fusesource/

HawtDB is a high-performance key/value embedded database for Java applications.

Table of contents
- HawtDB
- - - Features
  - Overview
  - - Transaction Page Files
    - Non Transaction Page Files
  - Using Indexes with Page Files
  - - Accessing Index Data
  - Optimistic Transactions
  - Customized Key/Value Encoding
  - Custom Comparators
  - Iterating Sorted Indexes
  - Object Caching and Object Encoding Deferring
  - Support for Multiple Indexes in one Page File
  - References

Features

Optional ACID transactions
Multiversion Concurrency Control (MVCC)
Object Caching and Object Marshaling Deferring
BTree and Hash Indexes

Overview

HawtDB implements index operations over a page file. An index provides Map like access to persistent data. The page file provides block oriented access to a data file. There are transactional and non-transactional implementations of the page file. The transactional version provides MVCC.

Transaction Page Files

A transactional page file is created with the TxPageFileFactory class. You first configure it and then open() it.

Example:

// Opening a transactional page file.TxPageFileFactory factory = new TxPageFileFactory()factory.setFile(new File("mydb.dat"));factory.open();TxPageFile pageFile = factory.getTxPageFile();

Use a transactional page file when you need your indexes to have ACID levels of service.

Non Transaction Page Files

Sometimes you don't need an index to provide ACID levels of service. Perhaps your just using the index to page out of memory temporary values that can be discarded on restart. Then you just need to create a plainPageFile instead of a TxPageFile

Similar to transaction page files, a non-transactional page file is created with thePageFileFactory class. You first configure it and then open() it.

Example:

// Opening a transactional page file.PageFileFactory factory = new PageFileFactory()factory.setFile(new File("mydb.dat"));factory.open();PageFile pageFile = factory.getPageFile();

Since the page file is not transactional, you must serialize access to the index and page file yourself. Concurrent index updates will just step on each other and cause unexpected errors and index inconsistencies. Furthermore, Non-Transactional page files do keep a persistent free page list. When it is reopened it will think all the pages can be used for new page updates.

Using Indexes with Page Files

There are two kinds of indexes. BTree and Hash based indexes.

BTree: Maintain keys in sorted order and support sorted iteration
Hash: Keys are not sorted. Can perform better than BTree indexes.

Both the BTree and Hash indexes use factory object to create instances of the index. The factory holds configuration information for the index. The factories support either creating a new index or opening a previously created index.

Index factories can be mixed and matched with the page files implementations. So yes hash indexes can be used with either transaction or non transaction page files.

When an index is opened or created, you must pass it either PageFile or a Transaction.

Hash Index Non-Transactional Example:

HashIndexFactory<String, String> indexFactory =     new HashIndexFactory<String, String>();Index<String, String> root = indexFactory.create(pageFile);

BTree Index Transactional Example:

BTreeIndexFactory<String, String> indexFactory =     new BTreeIndexFactory<String, String>();Transaction tx = pageFile.tx();Index<String, String> root = indexFactory.create(tx);tx.commit();

HawtDB also provides a method to open or create an index, so you don't have to worry if the index already exists or not.

For Example:

BTreeIndexFactory<String, String> indexFactory =     new BTreeIndexFactory<String, String>();Index<String, String> root = indexFactory.openOrCreate(tx);    tx.commit();}

This method actually is the preferred way to access an index.

Accessing Index Data

Storing data in an index:

Transaction tx = pageFile.tx();Index<String, String> root = indexFactory.open(tx);root.put("Key:1", "Hello World");tx.commit();

Retreiving data from an index:

Transaction tx = pageFile.tx();Index<String, String> root = indexFactory.open(tx);String result = root.get("Key:1");tx.commit();

Optimistic Transactions

HawtDB does not provide any form of pessimistic locking. All transactions execute concurrently potentially at different versioned views of the page file. Once a transaction starts, it will not see the updates performed to the page file by subsequent transactions that commit.

If any of the pages updated in a transaction are updated by another concurrent update, then aOptimisticUpdateException is thrown and and the update is rolled back. Therefore, end user operations which update indexes should catch this runtime exception and either retry the operation or report the failure to the user.

Customized Key/Value Encoding

By default the indexes will use standard java object serialization to serialize keys and values. If you want to provide a customized encoding strategy or if your key or value does not support object serialization, then you can configure key and value codecs on the index factory objects.

For example, it is about 30 times more efficient to use the StringCodec StringCodec when using String values than to standard object serialization:

indexFactory.setKeyCodec(StringCodec.INSTANCE);indexFactory.setValueCodec(StringCodec.INSTANCE);

You can create custom Codec implementations by implementing the Codec class.

Custom Comparators

If your keys do not implement the Comparable interface, or if you want to sort the key different from their natural sort order, then you can configure a Comparator instance on the index factory. For example, to do a case insensitive sorting of the string keys you would:

indexFactory.setComparator(new Comparator<String>(){    public int compare(String o1, String o2) {        return o1.compareToIgnoreCase(o2);    }};);

Iterating Sorted Indexes

Sorted indexes can be sequentially iterated using the enhanced for loop since they implement the Iterable interface. For example:

for(Map.Entry<Key, Value> entry: index) {  System.out.println(entry.getKey()+" = "+entry.getValue());}

Typically that's not used since folks are usually only interested in a subset of the keys. It is more efficient if a predicate based iterator is used. Predicates help cut down the number of BTree nodes that need to be accessed. Example:

import org.fusesource.hawtdb.api.Predicates.*;...Predicate<Long> p = and(gt(new Long(12)), lt(new Long(450))Iterator<Map.Entry<Key, Value>> i = index.iterator(p);while( i.hasNext() ) {  Map.Entry<Key, Value> entry = i.next();  System.out.println(entry.getKey()+" = "+entry.getValue());}

The example above is using the static methods in the Predicates class to help build a complex predicate expression. You are free to implement the Predicate interface to create more complex predicates.

Object Caching and Object Encoding Deferring

If your keys and values are immutable values, then you should enable the deferred encoding option on the index. This option allows the index to avoid encoding the keys and values when the index operations occur and instead places them on to an internally maintained object cache. Keys/Values are not encoded until the last possible moment, which is when the index pages are flushed to disk in a batch. This allows you to avoid some encoding processing if the same key gets updated multiple times. It also avoid some object decoding as a subsequent get operation can return a cached value instead of having to do a page decode.

Example:

indexFactory.setDeferredEncoding(true);

Support for Multiple Indexes in one Page File

HawtDB supports storing multiple (correlated) indexes in a single page file.
This becomes desirable if you:

want to atomically update multiple indexes in single transaction
have a large number of indexes and want to avoid using a file for each index

You can do that by using the special purpose MultiIndexFactory, which works by identifying each index by name, and using such a name to refer to the index for later reopening. For example:

MultiIndexFactory multiIndexFactory = new MultiIndexFactory(tx);IndexFactory<String, String> indexFactory1 = new BTreeIndexFactory<String, String>();IndexFactory<String, String> indexFactory2 = new HashIndexFactory<String, String>();Index<String, String> index1 = multiIndexFactory.create("1", indexFactory1);Index<String, String> index2 = multiIndexFactory.create("2", indexFactory2);

To open those indexes again, just provide their name and index factory:

MultiIndexFactory multiIndexFactory = new MultiIndexFactory(tx);IndexFactory<String, String> indexFactory1 = new BTreeIndexFactory<String, String>();IndexFactory<String, String> indexFactory2 = new HashIndexFactory<String, String>();Index<String, String> index1 = multiIndexFactory.open("1", indexFactory1);Index<String, String> index2 = multiIndexFactory.open("2", indexFactory2);

MultiIndexFactory also provides an openOrCreate method analogous to the one previously described.

References

* Full API