Hashtable and HashMap related questions

Source: Internet  Editor: 程序博客网  Time: 2024/06/05 02:12

1 Hash Function

(1) it always returns a number for an object

(2) two equal objects will always have the same number

(3) two unequal objects do not always have different numbers


One way to implement this in Java is hashCode(). The hashCode() method is defined in the Object class, so every class in Java inherits it. The hash code provides a numeric representation of an object (somewhat like the toString() method, which gives a text representation of an object).
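As a small illustration of properties (2) and (3), the sketch below uses java.lang.String, whose hashCode() is specified by the JDK. The class name HashCodeDemo is only a placeholder for this example.

```java
class HashCodeDemo {
    public static void main(String[] args) {
        String a = new String("abcd");
        String b = new String("abcd");

        // (2) equal objects always report the same hash code
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true

        // (3) unequal objects may still share a hash code
        System.out.println("Aa".equals("BB"));                  // false
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true (both are 2112)
    }
}
```

"Aa" and "BB" collide because String.hashCode() is a base-31 polynomial: 65 * 31 + 97 = 66 * 31 + 66 = 2112.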

A hash function converts a string (or any other type) into an integer that is greater than or equal to zero and smaller than the hash size. The goal in designing a hash function is to scatter the keys as irregularly as possible, so that a good hash function produces as few collisions as possible. A widely used algorithm takes the magic number 33 and treats any string as a big base-33 integer, as follows:

Note: this gives one way to build a hash function, using ASCII values and a magic number.


hashcode("abcd") = (ascii(a) * 33^3 + ascii(b) * 33^2 + ascii(c) *33 + ascii(d)) % HASH_SIZE

= (97* 33^3 + 98 * 33^2 + 99 * 33 +100) % HASH_SIZE

= 3595978 % HASH_SIZE


Here HASH_SIZE is the capacity of the hash table (you can think of a hash table as an array indexed from 0 to HASH_SIZE-1).

Given a string as a key and the size of the hash table, return the hash value of this key.


Example
For key="abcd" and size=100, return 78

In the code, watch out for overflow: take the result mod HASH_SIZE at every step.

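// Horner's rule in base 33; reduce mod HASH_SIZE at each step to avoid overflow.
public int hashCode(char[] key, int HASH_SIZE) {
    int b = 33;
    int res = 0;
    for (int i = 0; i < key.length; ++i) {
        res = multiply(res, b, HASH_SIZE);
        res += key[i];
        res %= HASH_SIZE;
    }
    return res;
}

// Multiply in long arithmetic so x * y cannot overflow int before the modulo.
public int multiply(long x, long y, int m) {
    return (int) (x * y % m);
}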
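To check the worked example, the following sketch compares two variants: one reduces modulo the table size after every character, the other evaluates the whole base-33 polynomial first and reduces once. The class and method names (ModHashDemo, hash, hashNaive) are made up for this illustration.

```java
class ModHashDemo {
    // Reduce after every step, so the intermediate value always stays below hashSize.
    static int hash(char[] key, int hashSize) {
        long res = 0;
        for (char c : key) {
            res = (res * 33 + c) % hashSize;
        }
        return (int) res;
    }

    // Evaluate the full polynomial, then reduce once.
    // Only safe for short keys: a long overflows after roughly a dozen ASCII characters.
    static int hashNaive(char[] key, int hashSize) {
        long res = 0;
        for (char c : key) {
            res = res * 33 + c;
        }
        return (int) (res % hashSize);
    }

    public static void main(String[] args) {
        char[] key = "abcd".toCharArray();
        System.out.println(hash(key, 100));      // 78
        System.out.println(hashNaive(key, 100)); // 78 (3595978 % 100)
    }
}
```

Both variants agree on short keys because (a * b + c) % m gives the same result whether or not a was already reduced modulo m. Only the step-by-step version stays correct for long keys.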


2 Collision & Rehash Question from lintcode

Collision: different objects (according to the equals() method) may have the same hashcode.

Two ways to resolve collisions:

(1) Separate chaining: use linked lists, so the hash table is an array of lists.

(2) Linear probing: if the key cannot be inserted at index k, try the next slot k + 1. If that is occupied too, go to k + 2, and so on.
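Linear probing can be sketched roughly as below. This is a minimal illustration for int keys with a fixed capacity and no deletion. The class LinearProbingDemo and its methods are hypothetical names, and wrapping around to index 0 at the end of the array is an assumption the notes above do not spell out.

```java
class LinearProbingDemo {
    private final Integer[] slots; // null marks an empty slot

    LinearProbingDemo(int capacity) {
        slots = new Integer[capacity];
    }

    // Returns the index where key ends up, or -1 if the table is full.
    int insert(int key) {
        int start = Math.floorMod(key, slots.length);
        for (int step = 0; step < slots.length; step++) {
            int idx = (start + step) % slots.length; // k, k + 1, k + 2, ... with wrap-around
            if (slots[idx] == null || slots[idx] == key) {
                slots[idx] = key;
                return idx;
            }
        }
        return -1;
    }

    // Follows the same probe sequence; an empty slot means the key was never inserted.
    boolean contains(int key) {
        int start = Math.floorMod(key, slots.length);
        for (int step = 0; step < slots.length; step++) {
            int idx = (start + step) % slots.length;
            if (slots[idx] == null) return false;
            if (slots[idx] == key) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        LinearProbingDemo table = new LinearProbingDemo(4);
        System.out.println(table.insert(9));  // 1 (9 % 4 = 1)
        System.out.println(table.insert(21)); // 2 (slot 1 is taken)
        System.out.println(table.insert(14)); // 3 (slot 2 is taken)
    }
}
```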


Rehash question from lintcode

This problem mainly introduces rehashing and how a hash table stores its data. The hash table here is really an array of singly linked list nodes!


The size of the hash table is not fixed at the very beginning. If the total number of keys grows too large (e.g. size >= capacity / 10), we should double the capacity of the hash table and rehash every key. Say you have a hash table that looks like this:

size=3, capacity=4
[null, 21->9->null, 14->null, null]

The hash function is:

int hashcode(int key, int capacity) {
return key % capacity;
}

Here we have three numbers, 9, 14 and 21, where 21 and 9 share the same position because they both have hashcode 1 (21 % 4 = 9 % 4 = 1). We store them in the hash table as a linked list. Rehashing this hash table with double the capacity gives:

size=3, capacity=8
index: 0 1 2 3 4 5 6 7
hash table: [null, 9, null, null, null, 21, 14, null]

Given the original hash table, return the new hash table after rehashing.
Note
For a negative integer in the hash table, the position can be calculated as follows:

In C++/Java, if you directly calculate -4 % 3 you will get -1. You can use (a % b + b) % b in place of a % b to make the result a non-negative integer.

In Python, you can directly use -1 % 3 and you will get 2 automatically.
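The Java behaviour described in the note can be checked with a few lines. The helper name bucket is made up for this sketch. Math.floorMod is a standard JDK method (Java 8 and later) that gives the same non-negative result when the divisor is positive.

```java
class NegativeModDemo {
    // Maps any int key to an index in [0, capacity), assuming capacity > 0.
    static int bucket(int key, int capacity) {
        return (key % capacity + capacity) % capacity;
    }

    public static void main(String[] args) {
        System.out.println(-4 % 3);               // -1
        System.out.println(bucket(-4, 3));        // 2
        System.out.println(Math.floorMod(-4, 3)); // 2
    }
}
```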

Example
Given [null, 21->9->null, 14->null, null], return [null, 9->null, null, null, null, 21->null, 14->null, null]


The idea is simple; code omitted.
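For reference, one possible way to write the rehash is sketched below. It is not the original author's solution. The ListNode shape (an int val plus a next pointer) and the choice to append each key at the tail of its new bucket are assumptions.

```java
class RehashDemo {
    static class ListNode {
        int val;
        ListNode next;
        ListNode(int val) { this.val = val; }
    }

    // Builds a new table of twice the capacity and re-inserts every key.
    static ListNode[] rehashing(ListNode[] hashTable) {
        int newCapacity = hashTable.length * 2;
        ListNode[] result = new ListNode[newCapacity];
        for (ListNode head : hashTable) {
            for (ListNode node = head; node != null; node = node.next) {
                // (a % b + b) % b keeps the index non-negative for negative keys.
                int idx = (node.val % newCapacity + newCapacity) % newCapacity;
                ListNode copy = new ListNode(node.val);
                if (result[idx] == null) {
                    result[idx] = copy;
                } else {
                    ListNode tail = result[idx];
                    while (tail.next != null) tail = tail.next;
                    tail.next = copy; // append so keys keep their relative order
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ListNode[] table = new ListNode[4];
        table[1] = new ListNode(21);
        table[1].next = new ListNode(9);
        table[2] = new ListNode(14);

        ListNode[] bigger = rehashing(table);
        // 9 lands at index 1, 21 at index 5 and 14 at index 6.
        System.out.println(bigger[1].val + " " + bigger[5].val + " " + bigger[6].val); // 9 21 14
    }
}
```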


A hash table offers constant-time performance for add, remove, contains and size.

Collision: in the worst case all keys collide into the same index, so finding one of them means searching a list (linear time).

How do we guarantee expected constant time?

Make sure the lists do not become too long. This is usually implemented with a load factor.

The load factor tracks the average length of the lists (number of elements divided by number of buckets). When it approaches a threshold set in advance, create a bigger array and rehash all elements from the old table into the new one.
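A load-factor-driven resize might look like the sketch below, a chained set of int keys. The threshold 0.75, the initial capacity 4 and all names (LoadFactorDemo, MAX_LOAD, resize) are assumptions chosen for illustration, not values taken from the text above.

```java
class LoadFactorDemo {
    private static final double MAX_LOAD = 0.75; // assumed threshold

    private static class Node {
        final int key;
        final Node next;
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    private Node[] buckets = new Node[4]; // assumed initial capacity
    private int size = 0;

    int size() { return size; }
    int capacity() { return buckets.length; }

    boolean contains(int key) {
        for (Node n = buckets[Math.floorMod(key, buckets.length)]; n != null; n = n.next) {
            if (n.key == key) return true;
        }
        return false;
    }

    void add(int key) {
        if (contains(key)) return;
        // Grow first if this insertion would push size / capacity over the threshold.
        if ((double) (size + 1) / buckets.length > MAX_LOAD) resize();
        int idx = Math.floorMod(key, buckets.length);
        buckets[idx] = new Node(key, buckets[idx]);
        size++;
    }

    // Doubles the array and rehashes every key from the old table into the new one.
    private void resize() {
        Node[] old = buckets;
        buckets = new Node[old.length * 2];
        for (Node head : old) {
            for (Node n = head; n != null; n = n.next) {
                int idx = Math.floorMod(n.key, buckets.length);
                buckets[idx] = new Node(n.key, buckets[idx]);
            }
        }
    }

    public static void main(String[] args) {
        LoadFactorDemo set = new LoadFactorDemo();
        for (int i = 0; i < 100; i++) set.add(i);
        System.out.println(set.size() + " keys in " + set.capacity() + " buckets");
    }
}
```

Because the table doubles whenever the threshold would be exceeded, the average list length stays below MAX_LOAD, which is what keeps lookups at expected constant time.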

