【Data Structures】 10. Hashing—Mission Possible
来源:互联网 发布:linux编辑文件不保存 编辑:程序博客网 时间:2024/06/14 12:52
Converting words to numbers, more specifically integers.
Workaround 1: Open Addressing (mainly linear probing)
Step size:
In linearing probing, the step size is always 1 that means the probe goes to x, x+1, x+2, x+3 and so on.
Clustering:
A sequence of filled cells in a hash table that is long.
As a hash table becomes more and more full, the clusters grow larger and larger.
- When the hash table is half full, the performance is still not bad.
- However, it is proven that, when it is beyond two-thirds full, the performance degrades seriously.
It is critical to ensure that a hash table never becomes full (Ideally, more than two-thirds full).
Load factor
It is the ratio of the number of data items in a hash table to the length of the array.
In linear probing, search time becomes really slow as the load factor approaches 1.
How to solve it?
Rehashing
First, it is necessary to create a new array that is mostly twice bigger than the old array length but it depends on the load factor you provided.
The hash method then calculates the location of a given data item based on the new array length.
Second, we need to go through the old array, cell by cell, and insert them by calling hash function over and over.
It's a time-consuming process.
In Open Addressing, there are two other major collision resolution mechanisms: Quadratic Probing and Double Hashing.
Workaround 2: Separate Chaining
In open addressing, collisions are resolved by looking for an open cell in the hash table.
Another approach is to put a linked list at each index in the hash table.
In separate chaining, it is normal to put n or more items in an array of length n.
Finding the initial cell takes O(1) whereas searching through a linked list takes O(k) when there are k number of elements in the list.
Thus, we do not want the linked lists become too full either.
Especially, if your hash function is not good.
However, the load factor in separate chaining can rise above 1 without hurting performance too much assuming hashCode method is good.
When in doubt, you may consider to use separate chaining, especially, if the number of items that will be inserted into a hash table is unknown. In other words, separate chaining would be better when you would expect to have a high load factor.
A few ways to deal with collisions.
Linear Probing:
When there is a collision, we try to find an empty cell sequentially and put the value into the nearest empty cell. However, this approach has an issue of forming the primary clusters and the performance can get really bad. It is necessary to rehash to keep the load factor from time to time.
There is also Quadratic Probing that has another subtle clustering issue called secondary clustering due to the fixed interval of probing.
To solve this clustering issue, there is another workaround, called Double Hashing, which uses two hash functions. One is to calculate hash value and the other is to decide the step size of probing.
Separate Chaining: The other workaround is to have a linked list at each index. This allows us to not to worry too much about load factor. However, we do not want to make the linked lists become too full either.
- 【Data Structures】 10. Hashing—Mission Possible
- 【Data Structures】 12. Hashing, HashMap and HashSet in Java
- Data Structures
- data structures
- Data Structures
- data structures
- Data Structures
- Data Structures
- Data Structures - Immutable Data Structures
- Data Structures with Java—table(1)
- 【Data Structures】 11. HashTable—Simple Implementation
- Data Structures and Algorithms
- Data Structures Demystified
- Data Structures and Algorithms
- Disjoint-set Data Structures
- C & Data Structures
- Data Structures Demystified
- Data Structures, Suffix Trees
- stm32中如何进行位定义
- 网址
- Navicat11全系列激活(注册机)
- 自己封装的一个工具类
- Android学习笔记之从相册选取多张图片遇得到的一些问题
- 【Data Structures】 10. Hashing—Mission Possible
- ListView的item点击事件无响应的解决方法
- Linux上iptables防火墙的基本应用教程
- 服务端接受客户端上传的文件
- Android TextView字体大小调节框架QFontChangeLib介绍
- UI 界面性能优化(二)
- 结合apache和tomcat实现多域名、多端口绑定
- 检测当前鼠标点击的是UI的代码
- mysql---存储数据