EPI: hash tables


A hash table is a data structure used to implement an associative array, a structure that maps keys to values.

(1) Design a hash function for words in a dictionary

Idea: The hash function should examine every character in each word. It should produce a large range of values and should not let one character dominate (e.g., if we simply cast characters to integers and multiplied them, a single 0 would result in a hash code of 0). We would also like a rolling hash function: one in which, if a character is deleted from the front of the string and another is added to the end, the new hash code can be computed in O(1) time.

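int string_hash(const string &str, int modulus) {
    const int MULT = 997;
    long long val = 0; // 64-bit intermediate so val * MULT cannot overflow
    for (unsigned char c : str) { // unsigned so non-ASCII bytes stay non-negative
        val = (val * MULT + c) % modulus;
    }
    return static_cast<int>(val);
}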
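The rolling property mentioned above is not visible in string_hash itself, so here is a minimal sketch of how it could look for a fixed-length window. The names poly_hash, top_power, and roll are hypothetical helpers introduced for illustration, and the hash uses the same polynomial form as string_hash with 64-bit intermediates.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

const std::int64_t kMult = 997;

// Polynomial hash: same form as string_hash, computed with 64-bit values.
std::int64_t poly_hash(const std::string& s, std::int64_t modulus) {
    std::int64_t val = 0;
    for (unsigned char c : s) val = (val * kMult + c) % modulus;
    return val;
}

// kMult^(len-1) mod modulus: the weight of the first character in a window.
std::int64_t top_power(std::size_t len, std::int64_t modulus) {
    std::int64_t p = 1 % modulus;
    for (std::size_t i = 1; i < len; ++i) p = (p * kMult) % modulus;
    return p;
}

// Slide the window one step: drop `out` from the front, append `in` at the
// back. `power` must be top_power(window_length, modulus). Runs in O(1).
std::int64_t roll(std::int64_t hash, unsigned char out, unsigned char in,
                  std::int64_t power, std::int64_t modulus) {
    std::int64_t without_front = (hash - (out * power) % modulus + modulus) % modulus;
    return (without_front * kMult + in) % modulus;
}
```

For a window of length 3, roll(poly_hash("abc", m), 'a', 'd', top_power(3, m), m) yields the same value as poly_hash("bcd", m), without rescanning the window.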

(2) Longest subarray A[i:j] such that all elements in A[i:j] are distinct

Idea: Use two pointers to mark the start and the end of the candidate subarray. A hash table exist[256] records which characters currently appear in the window. If a character appears again, record the length of the current window and then advance the start pointer past the earlier occurrence of that character.

    // Two pointers: j marks the start and i the end of the current substring.
    int lengthOfLongestSubstring(string s) {
        bool exist[256] = {false};
        int n = s.length();
        int i, j = 0;
        int maxlen = 0;
        for (i = 0; i < n; i++) { // end pointer proceeds
            unsigned char c = s[i]; // index with unsigned char so it is never negative
            if (exist[c]) { // repeated character: revise the start pointer
                maxlen = max(maxlen, i - j); // length of the current substring
                while (s[j] != s[i]) {
                    exist[(unsigned char)s[j]] = false;
                    j++;
                }
                j++; // skip the earlier occurrence of s[i]
            }
            else exist[c] = true;
        }
        maxlen = max(maxlen, i - j);
        return maxlen;
    }
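The version above is specialized to 8-bit characters. As a rough sketch of the same two-pointer idea for a general array, the following variant swaps the exist[256] table for a hash map from each value to its latest index. The function name longest_distinct_subarray and the choice of int elements are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Length of the longest subarray of A whose elements are all distinct.
std::size_t longest_distinct_subarray(const std::vector<int>& A) {
    std::unordered_map<int, std::size_t> last_seen; // value -> latest index
    std::size_t start = 0, best = 0;
    for (std::size_t i = 0; i < A.size(); ++i) {
        auto it = last_seen.find(A[i]);
        // A repeat only matters if the earlier copy lies inside the window.
        if (it != last_seen.end() && it->second >= start) {
            start = it->second + 1;
        }
        last_seen[A[i]] = i;
        best = std::max(best, i - start + 1);
    }
    return best;
}
```

Because the map stores the last index, the start pointer can jump directly past the earlier occurrence instead of stepping one element at a time.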

(3) Minimum length subarray A[i:j] that covers Q [string version]

Idea: We can achieve a streaming algorithm by keeping track of the latest occurrence of each item in Q as we process A. We use a doubly linked list L to store the last occurrence (index) of each keyword in Q, and a hash table H to map each keyword in Q to the corresponding node in L.

// List loc keeps the latest occurrence (index in A) of each keyword in Q, most recent at the back.
// Hash table dict maps each keyword in Q to its node in loc (loc.end() if not seen yet).
// Despite its name, the function returns the shortest covering subarray.
pair<int,int> longestCoveringSubarray(istringstream &sin, const vector<string>& Q) {
    unordered_map<string, list<int>::iterator> dict;
    list<int> loc; // last occurrence of each string in Q
    for (int i = 0; i < Q.size(); i++)
        dict[Q[i]] = loc.end();
    pair<int,int> res(-1, -1);
    int index = 0; // index into the input string stream
    string s;
    while (sin >> s) {
        auto it = dict.find(s);
        if (it != dict.end()) { // s is in Q
            if (it->second != loc.end()) { // delete the old index
                loc.erase(it->second);
            }
            loc.push_back(index); // push back the current string index
            it->second = --loc.end();
        }
        // Compare only once every keyword has been seen; the extra parentheses
        // keep loc.front() from being evaluated on an empty list.
        if (loc.size() == Q.size() &&
            ((res.first == -1 && res.second == -1) ||
             index - loc.front() < res.second - res.first)) {
            res = {loc.front(), index};
        }
        ++index;
    }
    return res;
}

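Reflection: It is easy to use a hash table to record the most recent position (index) of each keyword in Q, but that alone does not tell us the order in which the keywords occurred. Here the list is managed the way an LRU cache is: the most recently seen keyword is always at the back of the list, and the oldest one is always at the front.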
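The LRU-style bookkeeping can be isolated into a small helper. This is only a sketch: the class name RecencyTracker and its methods are hypothetical, and it stores the keys themselves rather than indices so that the ordering is easy to inspect.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>

// Keeps keys ordered from least recently seen (front) to most recent (back).
class RecencyTracker {
public:
    // Record that `key` was just seen; O(1) on average.
    void touch(const std::string& key) {
        auto it = pos_.find(key);
        if (it != pos_.end()) {
            // Relink the existing node at the back; its iterator stays valid.
            order_.splice(order_.end(), order_, it->second);
        } else {
            order_.push_back(key);
            pos_[key] = std::prev(order_.end());
        }
    }
    const std::string& oldest() const { return order_.front(); }
    const std::string& newest() const { return order_.back(); }
    std::size_t size() const { return order_.size(); }

private:
    std::list<std::string> order_;
    std::unordered_map<std::string, std::list<std::string>::iterator> pos_;
};
```

After touching "a", "b", and then "a" again, oldest() is "b" and newest() is "a", which is the ordering the covering-subarray code relies on when it reads the front of its list.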

(4) Minimum length subarray A[i:j] that covers Q sequentially [string version]

Idea: We solve this with a single pass over the elements of A. We maintain three data structures:

i. A hash map K which maps each element of Q to its index in Q

ii. An array L which maps j to the index of Q[j]'s latest occurrence in A

iii. An array D which maps j to the length of the shortest subarray of A that ends at L[j] and sequentially covers Q[0:j]
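The update performed when A[i] matches Q[j] can be written as a recurrence. This is a restatement of the definitions of L and D above; the symbol m = |Q| is introduced here only for convenience.

```latex
\begin{aligned}
D[0] &= 1 && \text{if } A[i] = Q[0],\\
D[j] &= D[j-1] + \bigl(i - L[j-1]\bigr) && \text{if } A[i] = Q[j],\ j > 0,\ D[j-1] < \infty,\\
L[j] &= i && \text{after } D[j] \text{ is updated.}
\end{aligned}
```

Whenever j = m - 1 and D[m-1] is finite, the subarray A[i - D[m-1] + 1 : i] is a candidate answer. For example, with A = (a, b, a, c) and Q = (a, c), at i = 3 we have L[0] = 2 and D[0] = 1, so D[1] = 1 + (3 - 2) = 2 and the candidate is A[2:3].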

pair<int,int> find_sequentially_covering_subarray(const vector<string>& A, const vector<string>& Q) {
    unordered_map<string, int> K; // maps each keyword to its index in Q
    vector<int> L(Q.size(), -1); // latest occurrence of each keyword of Q in A
    vector<int> D(Q.size(), numeric_limits<int>::max()); // shortest length of a subarray ending at L[j]
    for (int i = 0; i < Q.size(); i++) K[Q[i]] = i;
    pair<int,int> res(-1, -1);
    for (int i = 0; i < A.size(); i++) {
        auto it = K.find(A[i]);
        if (it != K.end()) { // A[i] is in Q: update D and L
            int ind = it->second;
            // update D according to D[j] = D[j-1] + i - L[j-1]
            if (ind == 0) {
                D[0] = 1;
            } else if (D[ind-1] != numeric_limits<int>::max()) {
                D[ind] = D[ind-1] + i - L[ind-1];
            }
            L[ind] = i; // L keeps track of the latest occurrence
            // A window is complete only when the last keyword has a finite D;
            // accept it if it is the first one found or strictly shorter.
            if (ind == Q.size() - 1 && D.back() != numeric_limits<int>::max() &&
                (res.first == -1 || D.back() < res.second - res.first + 1)) {
                res = {i - D.back() + 1, i};
            }
        }
    }
    return res;
}

(5) Minimum length subarray A[i:j] that covers Q [char version]

Two pointers move from left to right: one marks the start of the window and the other marks the end. Two maps need to be maintained:

i. A hash map needToFound maps each character of Q to the number of its occurrences in Q

ii. A hash map hasFound maps each character of Q to the number of its occurrences in the current window of A

When the number of matched characters counted through hasFound equals Q.size(), the window covers Q. We then advance the start pointer past any characters that are not needed or are in surplus, and record the length of the resulting substring. (In the code, A and Q are named S and T.)

class Solution {
public:
    string minWindow(string S, string T) {
        int needToFound[256] = {0};
        int hasFound[256] = {0};

        int minWindowlen = INT_MAX;
        int minWindowBeg = -1;
        int minWindowEnd = -1;
        int count = 0; // characters of T matched by the current window

        for (int i = 0; i < T.length(); i++)
            needToFound[(unsigned char)T[i]] += 1;

        for (int beg = 0, end = 0; end < S.length(); end++) {
            unsigned char c = S[end]; // unsigned so the array index is never negative
            if (needToFound[c] == 0) // not a character of T
                continue;
            hasFound[c] += 1;
            if (hasFound[c] <= needToFound[c])
                count++;
            if (count == T.length()) { // window covers T: shrink it from the left
                unsigned char b = S[beg];
                while (hasFound[b] == 0 || hasFound[b] > needToFound[b]) {
                    if (hasFound[b] > needToFound[b])
                        hasFound[b]--;
                    beg++;
                    b = S[beg];
                }
                int windowlen = end - beg + 1;
                if (windowlen < minWindowlen) {
                    minWindowlen = windowlen;
                    minWindowBeg = beg;
                    minWindowEnd = end;
                }
            }
        }
        if (minWindowBeg == -1) return "";
        string res = S.substr(minWindowBeg, minWindowlen);
        return res;
    }
};
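When experimenting with the sliding-window solution, a slow reference implementation is handy for cross-checking results on short inputs. The sketch below is not the method used above: it is a plain brute-force search, and the names covers and min_window_bruteforce are hypothetical. It returns the leftmost window among those of minimum length, and an empty string when no window exists or T is empty.

```cpp
#include <array>
#include <cstddef>
#include <string>

// True if `window` contains every character of T with at least the same count.
bool covers(const std::string& window, const std::string& T) {
    std::array<int, 256> need{};
    for (unsigned char c : T) ++need[c];
    for (unsigned char c : window) --need[c];
    for (int n : need) {
        if (n > 0) return false; // some character of T is still missing
    }
    return true;
}

// Brute-force reference: try every start, grow the window until it covers T.
std::string min_window_bruteforce(const std::string& S, const std::string& T) {
    if (T.empty()) return "";
    std::string best;
    bool found = false;
    for (std::size_t i = 0; i < S.size(); ++i) {
        for (std::size_t len = 1; i + len <= S.size(); ++len) {
            if (found && len >= best.size()) break; // cannot improve on best
            if (covers(S.substr(i, len), T)) {
                best = S.substr(i, len);
                found = true;
                break;
            }
        }
    }
    return best;
}
```

For S = "ADOBECODEBANC" and T = "ABC" this returns "BANC", which can be compared against the output of minWindow.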

0 0