Algorithm(I)-week1


I tried to take the Algorithms course last summer, but failed to finish it because of weak will and unfamiliarity with Java, which has left me not quite confident about seeking internships. Recently it has looked like I might not get the offer from Cambridge, so I need to try harder to improve myself, especially at self-learning.


For each week of the Algorithms course I will write down the basic knowledge, some thoughts on the exercises and programming assignments, and, most importantly, my answers to the interview questions, in the hope of mastering the material better.


------------------------------------------------------------------------------------- 


The first week focuses on the algorithm design approach (build a model, analyze the complexity, improve the algorithm or prove a matching lower bound, and repeat those last two steps until you can show no further improvement is possible). The professor uses union-find, a single algorithm, to lead us through that whole process.

First, settle the data structure: an array id[] holding an entry for each point (a point is identified by its index), with p and q connected iff they are in the same component. The first idea is "quick-find": id[p] stores the component id, so connected(p, q) just checks id[p] == id[q], while union(p, q) scans the whole array and rewrites every entry equal to id[p] into id[q]; processing N unions on N points costs ~N^2. An alternative is "quick-union" (also called the lazy approach): id[i] now stores the parent of i, connected(p, q) checks whether p and q have the same root, and union(p, q) sets the id of p's root to q's root; because the trees can grow tall, the worst case is also ~N^2.
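As a concrete reference, here is a minimal quick-find sketch in Java (my own naming, assuming points are numbered 0 to N-1); quick-union differs only in treating id[i] as a parent pointer and re-pointing one root in union(), as in the weighted sketch further down.

```java
// A minimal sketch of quick-find, assuming points are 0..N-1.
public class QuickFindUF {
    private final int[] id;              // id[i] = component id of point i

    public QuickFindUF(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i;   // every point starts in its own component
    }

    public boolean connected(int p, int q) {     // constant time
        return id[p] == id[q];
    }

    public void union(int p, int q) {            // linear time: rewrites the whole array
        int pid = id[p], qid = id[q];
        if (pid == qid) return;
        for (int i = 0; i < id.length; i++)
            if (id[i] == pid) id[i] = qid;
    }
}
```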

A better way is weighted quick-union, which keeps track of the size of each tree and always connects the root of the smaller tree to the root of the larger one. This bounds the depth of any node by lg N (hint: a node's depth increases only when its tree is merged into one at least as large, so its tree at least doubles each time, which can happen at most lg N times), giving ~N lg N for N unions. A further refinement is path compression, which makes every examined node point to the root during each findRoot(), flattening the tree and speeding up later root finds. (Personally I think a recursive version reads more naturally, but the professor uses the one-pass variant id[i] = id[id[i]], which makes every node on the path point to its grandparent rather than straight to the root.)
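A sketch of the weighted version with the one-pass id[i] = id[id[i]] compression mentioned above (again my own naming, not the course's reference code):

```java
// A minimal sketch of weighted quick-union with one-pass path compression.
public class WeightedQuickUnionUF {
    private final int[] id;    // id[i] = parent of i; a root points to itself
    private final int[] size;  // size[r] = number of nodes in the tree rooted at r

    public WeightedQuickUnionUF(int n) {
        id = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) { id[i] = i; size[i] = 1; }
    }

    private int root(int i) {
        while (i != id[i]) {
            id[i] = id[id[i]];   // path compression: point to the grandparent
            i = id[i];
        }
        return i;
    }

    public boolean connected(int p, int q) { return root(p) == root(q); }

    public void union(int p, int q) {
        int rp = root(p), rq = root(q);
        if (rp == rq) return;
        if (size[rp] < size[rq]) { id[rp] = rq; size[rq] += size[rp]; }  // smaller tree under larger
        else                     { id[rq] = rp; size[rp] += size[rq]; }
    }
}
```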

Something new to me in the analysis-of-algorithms part is the doubling hypothesis: repeatedly double the input size, measure the ratio of consecutive running times, and use lg(ratio) to estimate the exponent b in the power-law model T(N) ~ a*N^b; it plays the same role as fitting the slope of a log-log plot.
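A minimal sketch of such a doubling experiment; timeTrial(n) here times a deliberately quadratic placeholder workload standing in for "the algorithm under test", so the printed exponent should approach 2.

```java
// Doubling hypothesis: double N, and lg(time ratio) estimates the exponent b in T(N) ~ a * N^b.
public class DoublingTest {

    // Placeholder workload: a quadratic double loop (replace with the real algorithm).
    static double timeTrial(int n) {
        long start = System.nanoTime();
        long dummy = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                dummy += i ^ j;
        if (dummy == 42) System.out.println();   // keep the loop from being optimized away
        return (System.nanoTime() - start) / 1e9;
    }

    public static void main(String[] args) {
        double prev = timeTrial(250);
        for (int n = 500; n <= 16000; n += n) {
            double time = timeTrial(n);
            // lg of the ratio of consecutive times estimates the growth exponent
            System.out.printf("%7d %9.3f %6.2f%n", n, time, Math.log(time / prev) / Math.log(2));
            prev = time;
        }
    }
}
```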

Nothing special about the exercises (although they require great patience and carefulness). One specific question worth sharing: given an array, check whether it could be the id[] produced by the weighted quick-union algorithm. An array is ruled out if any of the following holds:

1. The height of the forest exceeds lg N.

2. The size of the tree rooted at the parent of some node is less than twice the size of the tree rooted at that node.

3. The array contains a cycle.


I spent a lot of time on the programming assignment even though I mastered union-find long ago, mostly because of my unfamiliarity with Java and with the testing system. I also made some small mistakes: one was forgetting whether the indices of the 2D grid start from 0 or 1, and the other was mistakenly connecting p itself to q's root instead of connecting p's root.


-------------------------------------------------------------------------------------

Then the most important part: interview questions.

Question 1

Social network connectivity. Given a social network containing N members and a log file containing M timestamps at which times pairs of members formed friendships, design an algorithm to determine the earliest time at which all members are connected (i.e., every member is a friend of a friend of a friend ... of a friend). Assume that the log file is sorted by timestamp and that friendship is an equivalence relation. The running time of your algorithm should be M log N or better and use extra space proportional to N.


Answer: use the weighted union-find data structure, which needs only two arrays of length N - one storing each node's parent (ultimately leading to its root) and one storing the size of each tree. Process the log in timestamp order and model each new friendship as a union. To know when everyone is connected, keep a count of components, decrement it whenever a union actually merges two different components, and report the timestamp at which the count reaches 1. Each union costs ~lg N (two root finds), there are at most M of them, and initialization takes O(N), so the total is ~M lg N; with path compression the root finds become even cheaper in practice.
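A sketch of this answer, reusing the WeightedQuickUnionUF class from above and assuming the log has already been parsed into (timestamp, p, q) entries sorted by time (Entry and earliestFullConnectivity are names I made up for illustration):

```java
import java.util.List;

public class SocialConnectivity {

    static class Entry { long timestamp; int p, q; }   // one parsed log line

    // Returns the earliest timestamp at which all n members are connected,
    // or -1 if they never are.
    static long earliestFullConnectivity(int n, List<Entry> log) {
        WeightedQuickUnionUF uf = new WeightedQuickUnionUF(n);
        int components = n;
        for (Entry e : log) {
            if (!uf.connected(e.p, e.q)) {   // only a union of distinct components counts
                uf.union(e.p, e.q);
                components--;
            }
            if (components == 1) return e.timestamp;
        }
        return -1;
    }
}
```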


Question 2

Union-find with specific canonical element. Add a method find() to the union-find data type so that find(i) returns the largest element in the connected component containing i. The operations union(), connected(), and find() should all take logarithmic time or better.


For example, if one of the connected components is {1, 2, 6, 9}, then the find() method should return 9 for each of the four elements in the connected component.


Answer: add a new array largestEle[] that records, for each root, the largest element in its component. On union(), compare the two roots' largest elements and store the larger one at the new root, which adds only constant work to union(). Then find(int i) simply finds the root of i and returns largestEle[root(i)].
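A sketch of this answer (class and field names are mine):

```java
// Union-find whose find(i) returns the largest element of i's component, elements 0..n-1.
public class LargestCanonicalUF {
    private final int[] parent, size, largest;

    public LargestCanonicalUF(int n) {
        parent = new int[n]; size = new int[n]; largest = new int[n];
        for (int i = 0; i < n; i++) { parent[i] = i; size[i] = 1; largest[i] = i; }
    }

    private int root(int i) {
        while (i != parent[i]) { parent[i] = parent[parent[i]]; i = parent[i]; }
        return i;
    }

    public boolean connected(int p, int q) { return root(p) == root(q); }

    // Largest element in the component containing i.
    public int find(int i) { return largest[root(i)]; }

    public void union(int p, int q) {
        int rp = root(p), rq = root(q);
        if (rp == rq) return;
        int big = Math.max(largest[rp], largest[rq]);       // propagate the larger canonical element
        if (size[rp] < size[rq]) { parent[rp] = rq; size[rq] += size[rp]; largest[rq] = big; }
        else                     { parent[rq] = rp; size[rp] += size[rq]; largest[rp] = big; }
    }
}
```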


Question 3

Successor with delete. Given a set of N integers S = {0, 1, ..., N−1} and a sequence of requests of the following form:

Remove x from S.

Find the successor of x: the smallest y in S such that y ≥ x.

Design a data type so that all operations (except construction) take logarithmic time or better.


Answer: remove(x) is essentially union(x, x+1), and successor(x) is then the largest element in the component containing x.


An amazing idea! The hints say to use the modified union-find from Question 2, which puzzled me at first; presumably the point is that the root of a weighted tree is not necessarily the largest element of its component, so it is exactly the Question 2 find() that returns the successor.


Personally, I first wanted to just use a boolean array to record whether each value is still in S, which makes remove() constant time; but finding the successor then means scanning forward past removed entries, which is linear in the worst case rather than logarithmic (a plain presence array cannot be binary-searched), so the union-find solution really is better.
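A sketch of the union-find solution, built on the LargestCanonicalUF sketch from Question 2:

```java
// Successor with delete: remove(x) unions x with x+1; successor(x) is the
// largest element of x's block, i.e. the Question 2 find().
public class SuccessorWithDelete {
    private final LargestCanonicalUF uf;
    private final boolean[] removed;
    private final int n;

    public SuccessorWithDelete(int n) {
        this.n = n;
        uf = new LargestCanonicalUF(n);
        removed = new boolean[n];
    }

    public void remove(int x) {
        if (removed[x]) return;
        removed[x] = true;
        if (x + 1 < n) uf.union(x, x + 1);   // merge with the block on the right
    }

    // Smallest y >= x still in S, or -1 if there is none.
    public int successor(int x) {
        if (!removed[x]) return x;           // x itself is still present
        int y = uf.find(x);                  // largest element of x's block
        return removed[y] ? -1 : y;          // only happens when n-1 itself was removed
    }
}
```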


Question 4

Union-by-size. Develop a union-find implementation that uses the same basic strategy as weighted quick-union but keeps track of tree height and always links the shorter tree to the taller one. Prove a lg N upper bound on the height of the trees for N sites with your algorithm.


Answer: use height[] instead of size[]. On union(), pick the taller root as the new root (ties broken either way), link the other root beneath it, and set height[newRoot] = max{ height[newRoot], height[otherRoot] + 1 }; the height therefore only grows when the two trees have equal height.

Proof: the height of a tree increases (by exactly one) only when it is united with another tree of the same height, so by induction a tree of height h contains at least 2^h sites. With N sites, 2^h ≤ N, and the height is therefore at most lg N.
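A sketch of union-by-height (path compression deliberately left out, since compressing paths would invalidate the stored heights):

```java
// Union-by-height: always link the shorter tree below the taller one.
public class UnionByHeightUF {
    private final int[] parent, height;

    public UnionByHeightUF(int n) {
        parent = new int[n]; height = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;   // all heights start at 0
    }

    private int root(int i) {
        while (i != parent[i]) i = parent[i];
        return i;
    }

    public boolean connected(int p, int q) { return root(p) == root(q); }

    public void union(int p, int q) {
        int rp = root(p), rq = root(q);
        if (rp == rq) return;
        if (height[rp] < height[rq])      parent[rp] = rq;   // shorter under taller: height unchanged
        else if (height[rp] > height[rq]) parent[rq] = rp;
        else { parent[rq] = rp; height[rp]++; }               // equal heights: height grows by 1
    }
}
```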



-------------------------------------------------------------------------------------

Interview questions for the analysis-of-algorithms part:

Question 1

3-SUM in quadratic time. Design an algorithm for the 3-SUM problem that takes time proportional to N^2 in the worst case. You may assume that you can sort the N integers in time proportional to N^2 or better.


Answer: as preparation, sort the N integers; the pairwise sums a[i] + a[j] then form an N-by-N matrix whose rows and columns are both non-decreasing (building it takes O(N^2), though it never needs to be materialized, since each entry can be computed on demand). Loop over each element a[k] and look for -a[k] in that matrix with a staircase walk starting at the bottom-left corner: if the current pair sum plus a[k] is less than zero, move right; if it is greater, move up; if it is zero, count the triple and keep walking. Each walk takes at most ~2N steps before reaching the top-right corner, so every search is O(N), and the whole algorithm runs in time proportional to N^2.
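A compact sketch of the same idea: rather than materializing the matrix of pairwise sums, fix the first element and walk the "staircase" implicitly with two pointers.

```java
import java.util.Arrays;

public class ThreeSumQuadratic {

    // Counts triples i < j < k with a[i] + a[j] + a[k] == 0,
    // assuming the N integers are distinct.
    static int count(long[] a) {
        long[] s = a.clone();
        Arrays.sort(s);                          // assumed to cost ~N^2 or better
        int count = 0;
        for (int i = 0; i < s.length - 2; i++) {
            int lo = i + 1, hi = s.length - 1;   // the implicit "staircase" walk
            while (lo < hi) {
                long sum = s[i] + s[lo] + s[hi];
                if (sum < 0) lo++;               // too small: move right
                else if (sum > 0) hi--;          // too large: move up
                else { count++; lo++; hi--; }    // match: values are distinct, advance both
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(count(new long[] { 30, -40, -20, -10, 40, 0, 10, 5 }));  // prints 4
    }
}
```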


Question 2

Search in a bitonic array. An array is bitonic if it is comprised of an increasing sequence of integers followed immediately by a decreasing sequence of integers. Write a program that, given a bitonic array of N distinct integer values, determines whether a given integer is in the array.

Standard version: Use ∼3 lg N compares in the worst case.

Signing bonus: Use ∼2 lg N compares in the worst case (and prove that no algorithm can guarantee to perform fewer than ∼2 lg N compares in the worst case).


Answer:

For the standard version, use binary search on the slope (comparing a[mid] with a[mid+1]) to find the maximum element, which separates the ascending and the descending parts; then run an ordinary binary search on each part. Three binary searches in total, so ∼3 lg N compares.
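A sketch of the standard version (find the peak, then one ordinary binary search on each slope):

```java
public class BitonicSearch {

    static boolean contains(int[] a, int key) {
        int peak = peakIndex(a);
        return ascendingSearch(a, key, 0, peak) >= 0
            || descendingSearch(a, key, peak + 1, a.length - 1) >= 0;
    }

    // Binary search for the index of the maximum element (~lg N compares).
    static int peakIndex(int[] a) {
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (a[mid] < a[mid + 1]) lo = mid + 1;   // still ascending
            else hi = mid;                           // mid is the peak or past it
        }
        return lo;
    }

    static int ascendingSearch(int[] a, int key, int lo, int hi) {
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (key < a[mid]) hi = mid - 1;
            else if (key > a[mid]) lo = mid + 1;
            else return mid;
        }
        return -1;
    }

    static int descendingSearch(int[] a, int key, int lo, int hi) {
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (key > a[mid]) hi = mid - 1;
            else if (key < a[mid]) lo = mid + 1;
            else return mid;
        }
        return -1;
    }
}
```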

For the signing bonus, we should not search for the maximum element at all. I learned from others' answers that the key insight is this: if you know a[low] ≤ key < a[high] or a[low] > key ≥ a[high], you can still run a binary-search-style procedure on a bitonic range (picture a downward parabola and it becomes clear; the bound confines the key to the sorted side of the range). In the main loop, if key < a[mid] we launch two such bounded searches, one on each half; otherwise we can discard either the part from low to mid or from mid to high, depending on which side of the peak mid lies. At first I thought the complexity would be (lg N)^2, since it looks like binary search inside binary search, but the split into two bounded searches happens at most once and they run on disjoint halves, so the total stays ∼2 lg N. Anyway, an amazing idea.



Question 3

Egg drop. Suppose that you have an N-story building (with floors 1 through N) and plenty of eggs. An egg breaks if it is dropped from floor T or higher and does not break otherwise. Your goal is to devise a strategy to determine the value of T given the following limitations on the number of eggs and tosses:

Version 0: 1 egg, ≤ T tosses.

Version 1: ∼1 lg N eggs and ∼1 lg N tosses.

Version 2: ∼lg T eggs and ∼2 lg T tosses.

Version 3: 2 eggs and ∼2 sqrt(N) tosses.

Version 4: 2 eggs and ≤ c sqrt(T) tosses for some fixed constant c.


Answer:

version 0: toss from floor 1 upward, one floor at a time, until the egg breaks; at most T tosses.

version 1: binary search on the floors; each toss may break an egg, so ∼lg N eggs and ∼lg N tosses.

version 2: toss the first egg from floors 1, 2, 4, 8, ..., doubling each time, until it breaks at some floor 2^k; that phase uses ∼lg T tosses and breaks only one egg. Now T lies in (2^(k-1), 2^k], so binary search that interval with ∼lg T more tosses and at most ∼lg T broken eggs. Total: ∼lg T eggs and ∼2 lg T tosses.

version 3: divide the floors into sqrt(N) groups of sqrt(N) floors each. Toss from floors sqrt(N), 2 sqrt(N), ... until the first egg breaks at m*sqrt(N), taking at most sqrt(N) tosses. Then, with the second egg, start from floor (m-1)*sqrt(N)+1 and go up one floor at a time until T is found, taking at most sqrt(N) more tosses, for ∼2 sqrt(N) in total.

version 4: toss from floor 1, then floor 1+2, then floor 1+2+3, ..., until the first egg breaks at floor 1+2+...+k. Then, with the second egg, try from floor 1+2+...+(k-1)+1 upward until T is found. The first phase takes k tosses and the second at most k-1. The break happens once 1+2+...+k = k(k+1)/2 ≥ T, i.e. around k ≈ sqrt(2T), so the number of tosses is at most 2k-1 ≈ 2*sqrt(2T); the question only asks for ≤ c*sqrt(T) with a fixed constant, so c = 2*sqrt(2) is enough.

The key point in this question is how to divide (and resize) the range of floors to check.
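A sketch of the version 4 strategy, assuming a hypothetical breaks(floor) oracle that reports whether an egg dropped from that floor breaks (i.e., floor ≥ T):

```java
public class EggDrop {

    interface Building { boolean breaks(int floor); }   // hypothetical oracle

    // Returns T, the lowest floor from which an egg breaks, assuming 1 <= T <= n.
    static int findThreshold(Building b, int n) {
        int step = 1, floor = 1;
        // Phase 1: drop the first egg from floors 1, 1+2, 1+2+3, ... (~sqrt(2T) tosses).
        while (floor <= n && !b.breaks(floor)) {
            step++;
            floor += step;
        }
        // Phase 2: T lies in the last gap; scan it upward with the second egg.
        int lo = floor - step + 1, hi = Math.min(floor, n);
        for (int f = lo; f <= hi; f++)
            if (b.breaks(f)) return f;
        throw new IllegalArgumentException("no threshold in 1.." + n);
    }

    public static void main(String[] args) {
        int t = 37;
        System.out.println(findThreshold(f -> f >= t, 100));   // prints 37
    }
}
```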

