【Data Structures】 11. HashTable—Simple Implementation

Source: Internet | Editor: 程序博客网 | Time: 2024/05/20 17:24

HashTableInterface interface

// HashTable interface that takes only positive integers.
// No mapping, just keys
public interface HashTableInterface {
    // Return true when the key is found.
    boolean search(int key);
    // Delete and return an int key from the table.
    int delete(int key);
    // Insert an int key to the table.
    void insert(int key);
}

HashTable class

public class HashTable implements HashTableInterface {
    // Marker left in a slot whose key was removed (keys are positive, so -1 never matches)
    private static final DataItem DELETED = new DataItem(-1);
    private DataItem[] hashArray;
    // precondition: initialCapacity is a positive int
    public HashTable(int initialCapacity) {
        hashArray = new DataItem[initialCapacity];
    }
    // static nested class
    private static class DataItem {
        private int key;
        DataItem(int k) {
            key = k;
        }
    }
    // hashFunc, search, delete, and insert from the following sections go here
}

First things first: Hashing method

// private helper method for hashing a key value
private int hashFunc(int key) {
    return key % hashArray.length;
}

Searching for a key

@Override
public boolean search(int key) {
    int hashVal = hashFunc(key);
    while (hashArray[hashVal] != null) {
        if (hashArray[hashVal].key == key) {
            return true;  // found
        }
        hashVal++;
        // wrap around
        hashVal = hashVal % hashArray.length;
    }
    return false;  // cannot find
}

Deleting a key

@Override
public int delete(int key) {
    int hashVal = hashFunc(key);
    while (hashArray[hashVal] != null) {
        if (hashArray[hashVal].key == key) {
            int temp = hashArray[hashVal].key;
            // mark the slot as DELETED instead of null so later probes keep going past it
            hashArray[hashVal] = DELETED;
            return temp;
        }
        hashVal++;
        // wrap around
        hashVal = hashVal % hashArray.length;
    }
    return -1;  // key is not in the table
}

Inserting a key

@Override
public void insert(int key) {
    DataItem item = new DataItem(key);
    int hashVal = hashFunc(key);
    while (hashArray[hashVal] != null && hashArray[hashVal] != DELETED) {
        hashVal++;
        // wrap around
        hashVal = hashVal % hashArray.length;
    }
    hashArray[hashVal] = item;
}
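To see how the three operations work together, here is a minimal self-contained sketch. The class name `ProbingDemo`, the use of a plain `int[]` with 0 as the empty marker, and the sample keys are assumptions made to keep the example short; the probing logic follows the methods above. Like the original code, it assumes the table always keeps at least one empty slot, otherwise the loops would never terminate.

```java
// Hypothetical demo class: a condensed version of the linear-probing table above.
class ProbingDemo {
    private static final int EMPTY = 0;     // assumption: keys are positive, so 0 can mark an empty slot
    private static final int DELETED = -1;  // plays the same role as the DELETED item above
    private final int[] slots;

    ProbingDemo(int capacity) {
        slots = new int[capacity];
    }

    private int hashFunc(int key) {
        return key % slots.length;
    }

    void insert(int key) {
        int i = hashFunc(key);
        // Skip occupied slots; empty and deleted slots can both be reused.
        while (slots[i] != EMPTY && slots[i] != DELETED) {
            i = (i + 1) % slots.length;
        }
        slots[i] = key;
    }

    boolean search(int key) {
        int i = hashFunc(key);
        // Stop only at a truly empty slot; a deleted slot does not end the probe.
        while (slots[i] != EMPTY) {
            if (slots[i] == key) {
                return true;
            }
            i = (i + 1) % slots.length;
        }
        return false;
    }

    int delete(int key) {
        int i = hashFunc(key);
        while (slots[i] != EMPTY) {
            if (slots[i] == key) {
                slots[i] = DELETED;
                return key;
            }
            i = (i + 1) % slots.length;
        }
        return -1;
    }

    public static void main(String[] args) {
        ProbingDemo table = new ProbingDemo(7);
        table.insert(3);
        table.insert(10);  // 10 % 7 == 3, so 10 collides with 3 and lands in slot 4
        table.delete(3);   // slot 3 is marked DELETED, not emptied
        System.out.println(table.search(10)); // true: the probe continues past the deleted slot
        System.out.println(table.search(3));  // false
    }
}
```

If delete simply cleared the slot, the search for 10 would stop at slot 3 and report that the key is missing. That is the reason for the DELETED marker.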

What makes a good hash function?

Quick computation is the key to a good hash function. Thus, a hash function with many multiplications and divisions is NOT a good idea.

The purpose of a hash function is to take a range of key values and transform them into index values such that the keys are distributed as evenly (randomly) as possible across all the indices of the hash table.

Key values may be completely random or not so random.

1. Random key values

If the key values are random and positive, we can compute the index with the same simple operation used in the code above.

index = key % hashArray.length;
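As a rough illustration, the sketch below hashes pseudo-random keys with the modulo operation and counts how many land at each index. The class name `RandomKeyDemo`, the seed, and the table length of 10 are arbitrary assumptions for the demo.

```java
import java.util.Random;

// Hypothetical demo class: counts how random keys spread over the table indices.
class RandomKeyDemo {
    // Hashes n pseudo-random non-negative keys and returns the number of keys per index.
    static int[] bucketCounts(int n, int tableLength, long seed) {
        Random rng = new Random(seed);
        int[] counts = new int[tableLength];
        for (int i = 0; i < n; i++) {
            int key = rng.nextInt(Integer.MAX_VALUE); // always non-negative
            counts[key % tableLength]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = bucketCounts(10_000, 10, 42L);
        for (int i = 0; i < counts.length; i++) {
            System.out.println("index " + i + ": " + counts[i]);
        }
    }
}
```

With random keys, each index should receive roughly the same share of the keys, so no extra work is needed in the hash function.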

2. Non-random key values

For example, there is a database that uses car-part numbers as key values.

033-400-03-94-05-0-535

This is interpreted as follows:

Digits 0-2: Supplier number (1 to 999, currently up to 70)

Digits 3-5: Category code (100, 150, 200, 250, up to 850)

Digits 6-7: Month of introduction (1 to 12)

Digits 8-9: Year of introduction (00 to 99)

Digits 10-11: Serial number (1 to 99, never exceeds 100)

Digit 12: Toxic risk flag (0 or 1)

Digits 13-15: Checksum (sum of the other fields)

Based on the interpretations provided, the key value should be 0,334,000,394,050,535 for the particular part number shown above.
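The field layout can be checked with a short sketch. The class and method names here (`PartNumberDemo`, `parseFields`, `rawKey`, `checksumMatches`) are hypothetical and only serve this illustration; the sketch assumes the part number is always written with dashes between the fields, as in the example.

```java
// Hypothetical demo class: splits a dash-separated part number into its fields.
class PartNumberDemo {
    // Returns the fields in order: supplier, category, month, year, serial, toxic flag, checksum.
    static int[] parseFields(String partNumber) {
        String[] parts = partNumber.split("-");
        int[] fields = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            fields[i] = Integer.parseInt(parts[i]); // leading zeros are fine, "033" parses as 33
        }
        return fields;
    }

    // The raw key is the part number read as one number, with the dashes removed.
    static long rawKey(String partNumber) {
        return Long.parseLong(partNumber.replace("-", ""));
    }

    // True when the last field equals the sum of all the other fields.
    static boolean checksumMatches(int[] fields) {
        int sum = 0;
        for (int i = 0; i < fields.length - 1; i++) {
            sum += fields[i];
        }
        return sum == fields[fields.length - 1];
    }

    public static void main(String[] args) {
        String part = "033-400-03-94-05-0-535";
        System.out.println(rawKey(part));                       // 334000394050535
        System.out.println(checksumMatches(parseFields(part))); // true: 33+400+3+94+5+0 = 535
    }
}
```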

However, there is no guarantee that such keys are randomly distributed between 0 and 9,999,999,999,999,999. Most values in that range can never occur, because each field only takes a small set of values.

Some work is needed to turn these part numbers into keys that are spread more randomly over their range.

1. Don't Use Non-Data: The key values should be squeezed as much as possible. For example, the category code (100, 150, ..., 850) takes only 16 distinct values, so it should be mapped to the range 0 to 15. The checksum should also be removed, because it is derived from the other fields and does not add any new information.

2. Use All the Data: Apart from the non-data values, every part of the key should contribute to the hash. Don't just use the first four digits, for example.

3. Use a Prime Number for the Modulo Base: In other words, the table length should be a prime number. For example, if the table length is 50, then every car-part number that is a multiple of 50 hashes to the same index (0).

4. Use Folding: Another reasonable hash function involves breaking keys into groups of digits and adding the groups.

SSN example: 123-45-6789

If the table length is 1009: Break the number into three groups of three digits. (123 + 456 + 789 = 1368, and 1368 % 1009 = 359)

If the table length is 101: Break the number into four two-digit numbers and one one-digit number. (12 + 34 + 56 + 78 + 9 = 189, and 189 % 101 = 88)

This way you can distribute the numbers better.
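The folding arithmetic for the SSN example can be sketched as follows. `FoldingDemo` and `foldHash` are hypothetical names, and the sketch assumes the key is passed as a string of decimal digits with the dashes already removed.

```java
// Hypothetical demo class: digit folding as described in the SSN example.
class FoldingDemo {
    // Splits the digits into groups of groupSize from left to right (the last group
    // may be shorter), adds the groups, and reduces the sum modulo tableLength.
    static int foldHash(String digits, int groupSize, int tableLength) {
        int sum = 0;
        for (int start = 0; start < digits.length(); start += groupSize) {
            int end = Math.min(start + groupSize, digits.length());
            sum += Integer.parseInt(digits.substring(start, end));
        }
        return sum % tableLength;
    }

    public static void main(String[] args) {
        System.out.println(foldHash("123456789", 3, 1009)); // 359
        System.out.println(foldHash("123456789", 2, 101));  // 88
    }
}
```

The group size is chosen so that the sum of the groups is in the same order of magnitude as the table length: three-digit groups for a table of about 1000 slots, two-digit groups for a table of about 100 slots.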

The basic idea is to examine your key values carefully and implement your hash function to remove any irregularity in the distribution of the key values.
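As one possible way to apply the first three guidelines to the car-part example, the sketch below drops the checksum, maps the category code to 0 to 15, and packs all remaining fields into one compact key before taking the remainder by a prime table length. The names `KeyPreparationDemo`, `compactKey`, and `distinctIndices`, the packing order, and the table lengths 50, 53, and 1009 are assumptions for this illustration, not part of the original example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class: squeezing the part-number fields and using a prime table length.
class KeyPreparationDemo {
    // Packs every data-carrying field into one number; the checksum is left out.
    // Assumed ranges: supplier 1..999, category 100..850 in steps of 50,
    // month 1..12, year 0..99, serial 1..99, toxic flag 0..1.
    static long compactKey(int supplier, int category, int month, int year, int serial, int toxic) {
        int categoryIndex = (category - 100) / 50; // 100, 150, ..., 850 becomes 0..15
        long key = supplier;
        key = key * 16 + categoryIndex;
        key = key * 12 + (month - 1);
        key = key * 100 + year;
        key = key * 100 + serial;
        key = key * 2 + toxic;
        return key;
    }

    // Counts how many different indices the keys step, 2*step, ..., count*step hash to.
    static int distinctIndices(int step, int count, int tableLength) {
        Set<Integer> seen = new HashSet<>();
        for (int k = 1; k <= count; k++) {
            seen.add((step * k) % tableLength);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        long key = compactKey(33, 400, 3, 94, 5, 0); // fields of 033-400-03-94-05-0-535
        System.out.println(key);                     // 128218810
        System.out.println(key % 1009);              // 135, with 1009 as a prime table length

        System.out.println(distinctIndices(50, 20, 50)); // 1: all multiples of 50 collide
        System.out.println(distinctIndices(50, 20, 53)); // 20: a prime length spreads them out
    }
}
```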
