算法原理 第五章

来源:互联网 发布:金税盘网络连接失败 编辑:程序博客网 时间:2024/06/05 13:05

RANDOMIZE-IN-PLACE

n = A.lengthfor i=1 to n    swap A[i] with A[RANDOM(i,n)]

RESERVIOR SAMPLING

The algorithm creates a “reservoir” array of size k and populates it with the first k items of S. It then iterates through the remaining elements of S until S is exhausted. At the ith element of S, the algorithm generates a random number j between 1 and i. If j is less than or equal to k, the jth element of the reservoir array is replaced with the ith element of S. In effect, for all i, the ith element of S is chosen to be included in the reservoir with probability k/i. Similarly, at each iteration the jth element of the reservoir array is chosen to be replaced with probability 1/k * k/i, which simplifies to 1/i. It can be shown that when the algorithm has finished executing, each item in S has equal probability (i.e. k/length(S)) of being chosen for the reservoir. To see this, consider the following proof by induction. After the (i-1)th round, let us assume, the probability of a number being in the reservoir array is k/(i-1). Since the probability of the number being replaced in the ith round is 1/i, the probability that it survives the ith round is (i-1)/i. Thus, the probability that a given number is in the reservoir after the ith round is the product of these two probabilities, i.e. the probability of being in the reservoir after the (i-1)th round, and surviving replacement in the ith round. This is (k/(i-1)) * ((i-1)/i)=k/i. Hence, the result holds for i, and is therefore true by induction.

// S has items to sample, R will contain the resultReservoirSample(S[1..n], R[1..k])  // fill the reservoir array  for i = 1 to k      R[i] := S[i]  // replace elements with gradually decreasing probability  for i = k+1 to n    j := random(1, i)   // important: inclusive range    if j <= k        R[j] := S[i]
0 0