Basics of Hash Table--Data Structure

来源:互联网 发布:淘宝百合花名妆真假 编辑:程序博客网 时间:2024/06/06 06:30

Intro:

API: Python: Dict; JAVA: HashMap

Applications: File Systems; Password Verification; Store Optimization

IP Address: 

Main Loop

log - array of log lines(time,IP)
C - mapping from IPs to counters
i - first unprocessed log line
j - first line in current 1h window

i0

j0
C←∅
Each second

UpdateAccessList(log,i,j,C)

UpdateAccessList(log,i,j,C)

while log[i].timeNow():

    C[log[i].IP]C[log[i].IP] + 1

    ii+1

while log[j].timeNow()3600:

    C[log[j].IP]C[log[j].IP]1

    jj+1

AccessedLastHour(IP,C)

return C[IP]>0


Direct Addressing

Need a data structure forC
There are 232different IP(v4) addresses

Convert IP to 32-bit integer
Create an integer array
A of size 232

Use A[int(IP)]asC[IP


int(IP)

return IP[1]·224+IP[2]·216+IP[3]·28+IP[4]

UpdateAccessList(log,i,j,A)

while log[i].timeNow():

    A[int(log[i].IP)]A[int(log[i].IP)] + 1

    ii+1

while log[j].timeNow()3600:

    A[int(log[j].IP)]A[int(log[j].IP)]1

    jj+1


AccessedLastHour(IP)

return A[int(IP)]>0


Asymptotics

UpdateAccessListisO(1)per log line

AccessedLastHourisO(1)
But need 232memory even for few IPs

IPv6: 2128won’t fit in memory

In general: O(N)memory,N=|S


List-based Mapping:

Direct addressing requires too much memory

Let’s store only active IPs
Store them in a list
Store only last occurrence of each IP 

Keep the order of occurrence 


UpdateAccessList(log,i,L)

while log[i].timeNow():

    log_lineL.FindByIP(log[i].IP)

    if log_line=NULL:

        L.Erase(log_line)

    L.Append(log[i])

    ii+1

while L.Top().timeNow()3600:

    L.Pop()


AccessedLastHour(IP,L)

return L.FindByIP(IP)=NULL


Asymptotics

n is number of active IPs
Memory usage is
Θ(n)
L.Append,L.Top,L.PopareΘ(1)

L.FindandL.EraseareΘ(n)

UpdateAccessListisΘ(n)per log line

AccessedLastHourisΘ(n


Encoding IPs

Encode IPs with small numbers
I.e. numbers from 0 to 999
Different codes for currently active IPs 


Hash Function

De nition

For any set of objectsSand any integer
m >0, a functionh:S→ {0,1,...,m1}is called a hash function.

De nition

m is called thecardinalityof hash function h.


Desirable Properties

h should be fast to compute.

Different values for different objects.

Direct addressing withO(m)memory.

Want small cardinalitym.

Impossible to have all different values ifnumber of objects|S|is more than m.


Collisions

De nition

When h(o1) = h(o2)ando1̸=o2, this is acollision.



Map

Store mapping from objects to other objects:

   Filename location of the file on disk

   Student ID student name
   Contact name
contact phone number

Definition

Map from S to V is a data structure with methodsHasKey(O),Get(O),Set(O,v),whereOS,vV.


h :S→ {0,1, . . . ,m1}
O,OS
v
,vV
A
array ofmlists (chains) of pairs(O,v)

HasKey(O)

L A[h(O)]
for (O,v)inL:

  if O==O:

     return true

return false


Get(O)

L A[h(O)]
for (O,v)inL:

   if O==O:

      return v

return n/a


Set(O,v)

L A[h(O)]

for pinL:

   if p.O==O:

      p.vv

      return 

L.Append(O,v)


Set

De nition

Set is a data structure with methodsAdd(O),Remove(O),Find(O).

Examples

IPs accessed during last hourStudents on campus
Keywords in a programming language


h :S→ {0,1, . . . ,m1}
O,OS
A
array ofmlists (chains) of objectsO

Find(O)

L A[h(O)]

for OinL:

  if O==O:

    return true

return false


Add(O)

L A[h(O)]

for OinL:

  if O==O:

    return

L.Append(O)


Remove(O)

if not Find(O):

  return

L A[h(O)]

L.Erase(O)


Hash Table:

Definition

An implementation of a set or a map usinghashing is called a hash table.


Programming Language:

Set:
  unordered_set inC++

  HashSet in Java
  set in Python

Map:
  unordered_map inC++

  HashMap in Java
  dict in Python 

 

Conclusion

Chaining is a technique to implement ahash table

Memory consumption isO(n+m)

Operations work in timeO(c+1)

How to make bothmandcsmall? 




原创粉丝点击