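Classic probability and random-number interview questions: rejection sampling, reservoir sampling, shuffling, and the random 0/1 problem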

Source: Internet  Published: linux svn ignore  Editor: 程序博客网  Time: 2024/06/06 07:41

I. Rejection sampling:

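Given rand7(), which returns a uniform random integer in [1..7], implement rand10(), which returns a uniform random integer in [1..10].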

The basic idea is well known: call rand7() twice and apply rejection sampling. The algorithm most people come up with first is:

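int rand10() {
  int row, col, idx;
  do {
    row = rand7();
    col = rand7();
    // pick the number at row `row`, column `col` of a 7x7 grid;
    // each row holds 7 numbers, so idx is uniform in 1..49
    idx = col + (row - 1) * 7;
  } while (idx > 40);
  return 1 + (idx - 1) % 10;
}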

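The question is whether we really need to start over with two fresh rand7() calls whenever idx lands in 41..49. We do not: the rejected value still carries usable randomness, and the following algorithm reuses it.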

int rand10Imp() {
  int a, b, idx;
  while (true) {
    a = rand7();
    b = rand7();
    idx = b + (a - 1) * 7;   // uniform in 1..49
    if (idx <= 40)
      return 1 + (idx - 1) % 10;
    a = idx - 40;            // leftover is uniform in 1..9
    b = rand7();
    // get uniform dist from 1 - 63
    idx = b + (a - 1) * 7;
    if (idx <= 60)
      return 1 + (idx - 1) % 10;
    a = idx - 60;            // leftover is uniform in 1..3
    b = rand7();
    // get uniform dist from 1 - 21
    idx = b + (a - 1) * 7;
    if (idx <= 20)
      return 1 + (idx - 1) % 10;
  }
}
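One way to convince yourself that reusing the leftover does not bias the result is to replace the live generator with every possible sequence of draws. The sketch below is an illustration under assumptions: `rand10_pass` and `tally_rand10` are hypothetical names, and `rand10_pass` re-implements one pass of the loop body above driven by a fixed array of four rand7() outcomes.

```c
#include <assert.h>

#define RETRY 0  /* marker: the pass used all four draws and rejected */

/* One pass of the rand10Imp loop body, driven by fixed rand7() outcomes
 * d[0..3] (each in 1..7) instead of a live generator.
 * Returns a value in 1..10, or RETRY if the pass ends in rejection. */
int rand10_pass(const int d[4]) {
    int idx = d[1] + (d[0] - 1) * 7;      /* uniform in 1..49 */
    if (idx <= 40) return 1 + (idx - 1) % 10;
    idx = d[2] + (idx - 40 - 1) * 7;      /* uniform in 1..63 */
    if (idx <= 60) return 1 + (idx - 1) % 10;
    idx = d[3] + (idx - 60 - 1) * 7;      /* uniform in 1..21 */
    if (idx <= 20) return 1 + (idx - 1) % 10;
    return RETRY;
}

/* Enumerate all 7^4 = 2401 equally likely draw sequences and tally the
 * outcome of each: counts[0] is the number of RETRY outcomes, counts[v]
 * the number of sequences that return v. */
void tally_rand10(int counts[11]) {
    for (int v = 0; v <= 10; ++v) counts[v] = 0;
    for (int a = 1; a <= 7; ++a)
        for (int b = 1; b <= 7; ++b)
            for (int c = 1; c <= 7; ++c)
                for (int e = 1; e <= 7; ++e) {
                    int d[4] = {a, b, c, e};
                    int r = rand10_pass(d);
                    assert(r >= 0 && r <= 10);
                    counts[r]++;
                }
}
```

Calling `tally_rand10` fills `counts[1]` through `counts[10]` with 240 each and `counts[0]` with 1, so within a pass every output is equally likely and only 1 sequence in 2401 forces another pass.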

Next we compute the expected number of rand7() calls for method 1 and method 2 to show that the optimization pays off. The expectations are:

Method 1:

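$$
\begin{aligned}
E(\#\text{ calls to rand7}) &= 2\cdot\frac{40}{49} + 4\cdot\frac{9}{49}\cdot\frac{40}{49} + 6\cdot\left(\frac{9}{49}\right)^{2}\cdot\frac{40}{49} + \cdots \\
&= \sum_{k=1}^{\infty} 2k\left(\frac{9}{49}\right)^{k-1}\frac{40}{49} \\
&= \frac{80/49}{\left(1 - 9/49\right)^{2}} \\
&= 2.45
\end{aligned}
$$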

Method 2:

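$$
\begin{aligned}
E(\#\text{ calls to rand7}) &= 2\cdot\frac{40}{49} + 3\cdot\frac{9}{49}\cdot\frac{60}{63} + 4\cdot\frac{9}{49}\cdot\frac{3}{63}\cdot\frac{20}{21} \\
&\quad + \frac{9}{49}\cdot\frac{3}{63}\cdot\frac{1}{21}\left[\,6\cdot\frac{40}{49} + 7\cdot\frac{9}{49}\cdot\frac{60}{63} + 8\cdot\frac{9}{49}\cdot\frac{3}{63}\cdot\frac{20}{21}\right] \\
&\quad + \left(\frac{9}{49}\cdot\frac{3}{63}\cdot\frac{1}{21}\right)^{2}\left[\,10\cdot\frac{40}{49} + 11\cdot\frac{9}{49}\cdot\frac{60}{63} + 12\cdot\frac{9}{49}\cdot\frac{3}{63}\cdot\frac{20}{21}\right] + \cdots \\
&= \frac{2 + \frac{9}{49} + \frac{9}{49}\cdot\frac{3}{63}}{1 - \frac{1}{2401}} = \frac{329}{150} \approx 2.1933
\end{aligned}
$$

The closed form follows because one pass of the loop costs 2 + 9/49 + (9/49)(3/63) = 752/343 calls on average and fails with probability (9/49)(3/63)(1/21) = 1/2401.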
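The series for method 2 is easy to get wrong by hand, so it can help to compute the expectation mechanically. The sketch below is an illustration under assumptions: `calls_in_pass`, `expected_calls_method1`, and `expected_calls_method2` are hypothetical names. It enumerates all 7^4 draw sequences for one pass of the rand10Imp loop and uses the fact that passes are independent and identically distributed, so the expected total is the expected cost of one pass divided by the probability that a pass returns a value.

```c
#include <assert.h>

/* Number of rand7() calls consumed by one pass of the rand10Imp loop when
 * the draws are d[0..3]; *accepted is set to 1 if the pass returns a value
 * and to 0 if it rejects. */
int calls_in_pass(const int d[4], int *accepted) {
    int idx = d[1] + (d[0] - 1) * 7;
    *accepted = 1;
    if (idx <= 40) return 2;
    idx = d[2] + (idx - 41) * 7;
    if (idx <= 60) return 3;
    idx = d[3] + (idx - 61) * 7;
    if (idx <= 20) return 4;
    *accepted = 0;
    return 4;
}

/* Method 2: E = E[calls per pass] / P(pass accepts). Both quantities share
 * the denominator 7^4, so E = (sum of calls) / (number of accepting sequences). */
double expected_calls_method2(void) {
    long total_calls = 0, accepting = 0;
    for (int a = 1; a <= 7; ++a)
        for (int b = 1; b <= 7; ++b)
            for (int c = 1; c <= 7; ++c)
                for (int e = 1; e <= 7; ++e) {
                    int d[4] = {a, b, c, e};
                    int ok;
                    total_calls += calls_in_pass(d, &ok);
                    accepting += ok;
                }
    assert(accepting > 0);
    return (double)total_calls / (double)accepting;
}

/* Method 1: every pass costs exactly 2 calls and accepts with probability 40/49. */
double expected_calls_method1(void) {
    return 2.0 / (40.0 / 49.0);
}
```

Printing both return values with `printf("%.4f\n", ...)` gives a direct comparison of the two methods without summing the series term by term.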

To generalize: given randN(), which is uniform on [1..N], how do we generate randM(), which is uniform on [1..M]?

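int randM(int n, int m) {
  // Assumes n >= 2 and that n^count fits in an int.
  // First work out how many base-n digits are needed to cover m values:
  // after the loop, range = n^count >= m.
  int count = 1;
  int range = n;
  while (range < m) {
    range *= n;
    count++;
  }
  // Accept only values below the largest multiple of m that fits in the range.
  int limit = (range / m) * m;
  int res;
  do {
    // Build a uniform count-digit base-n number in [0, range).
    // Every attempt draws fresh digits; reusing the remainder of a rejected
    // value (as rand10Imp does) needs the size of the leftover range to be
    // tracked, which this version does not attempt.
    res = 0;
    for (int i = 0; i < count; ++i) {
      res = res * n + randN() - 1;
    }
  } while (res >= limit);
  return 1 + res % m;
}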

II. Reservoir sampling

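The idea is simple. Create a buffer of length K and fill it with the first K items. For every later item, generate a random number from its position in the stream: for the i-th item (counting from 1), draw a uniform integer in [0..i-1]. If the number falls in [0..K-1], overwrite the buffer entry at that index with the new item; otherwise discard the new item and move on to the next one, until the stream ends.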

The program is as follows:

// An efficient program to randomly select k items from a stream of items
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// A utility function to print an array
void printArray(int stream[], int n)
{
    for (int i = 0; i < n; i++)
        printf("%d ", stream[i]);
    printf("\n");
}

// A function to randomly select k items from stream[0..n-1].
void selectKItems(int stream[], int n, int k)
{
    int i;  // index for elements in stream[]

    // reservoir[] is the output array. Initialize it with
    // first k elements from stream[]
    int reservoir[k];
    for (i = 0; i < k; i++)
        reservoir[i] = stream[i];

    // Use a different seed value so that we don't get
    // same result each time we run this program
    srand(time(NULL));

    // Iterate from the (k+1)th element to nth element
    for (; i < n; i++)
    {
        // Pick a random index from 0 to i.
        int j = rand() % (i+1);

        // If the randomly picked index is smaller than k, then replace
        // the element present at the index with new element from stream
        if (j < k)
            reservoir[j] = stream[i];
    }

    printf("Following are k randomly selected items \n");
    printArray(reservoir, k);
}

// Driver program to test above function.
int main()
{
    int stream[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    int n = sizeof(stream)/sizeof(stream[0]);
    int k = 5;
    selectKItems(stream, n, k);
    return 0;
}

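We now prove that the K samples produced this way are uniformly random, that is, every item ends up in the buffer with the same probability.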

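Consider the j-th item, where j > K. When it arrives it enters the buffer with probability K/j. After that, none of the remaining N-j items may replace it: item i (for i > j) overwrites one particular slot with probability 1/i, so item j survives that step with probability (i-1)/i. The total probability that item j stays in the buffer is therefore K/j * j/(j+1) * (j+1)/(j+2) * ... * (N-1)/N = K/N.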

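What about an item with j <= K? The proof is even simpler. Such an item starts out in the buffer, so we only need that none of the items from K+1 onward replaces it, which has probability

K/(K+1) * (K+1)/(K+2) * ... * (N-1)/N = K/N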

So in every case an item remains in the buffer with probability K/N, which is exactly what uniform sampling requires.
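The K/N result can also be checked empirically. The sketch below is an illustration under assumptions: `reservoir_sample` and `inclusion_counts` are hypothetical names, the stream is simply 0, 1, ..., n-1, and `rand() % (i + 1)` is used as in the program above, which carries a small modulo bias that is ignored here.

```c
#include <assert.h>
#include <stdlib.h>

/* Same replacement rule as selectKItems, but writing into a caller-supplied
 * buffer instead of printing. Requires 0 < k <= n. */
void reservoir_sample(const int stream[], int n, int k, int reservoir[]) {
    assert(k > 0 && k <= n);
    int i;
    for (i = 0; i < k; i++)
        reservoir[i] = stream[i];
    for (; i < n; i++) {
        int j = rand() % (i + 1);   /* index in 0..i (modulo bias ignored) */
        if (j < k)
            reservoir[j] = stream[i];
    }
}

/* Hypothetical helper: sample `trials` times from the stream 0..n-1 and
 * add 1 to hits[v] each time item v ends up in the reservoir.
 * hits must have room for n counters. */
void inclusion_counts(int n, int k, int trials, int hits[]) {
    assert(n > 0 && k > 0 && k <= n);
    int *stream = malloc((size_t)n * sizeof *stream);
    int *reservoir = malloc((size_t)k * sizeof *reservoir);
    assert(stream != NULL && reservoir != NULL);
    for (int v = 0; v < n; v++) {
        stream[v] = v;
        hits[v] = 0;
    }
    for (int t = 0; t < trials; t++) {
        reservoir_sample(stream, n, k, reservoir);
        for (int s = 0; s < k; s++)
            hits[reservoir[s]]++;
    }
    free(stream);
    free(reservoir);
}
```

With n = 12 and k = 5, every `hits[v] / trials` should settle near 5/12 as the number of trials grows, for early and late items alike. Call `srand` once beforehand if a different sequence is wanted on each run.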

III. Shuffling

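This is an even more classic random-number routine (the Fisher-Yates shuffle), so without much explanation, here is the program: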

// C Program to shuffle a given array
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// A utility function to swap two integers
void swap (int *a, int *b)
{
    int temp = *a;
    *a = *b;
    *b = temp;
}

// A utility function to print an array
void printArray (int arr[], int n)
{
    for (int i = 0; i < n; i++)
        printf("%d ", arr[i]);
    printf("\n");
}

// A function to generate a random permutation of arr[]
void randomize ( int arr[], int n )
{
    // Use a different seed value so that we don't get same
    // result each time we run this program
    srand ( time(NULL) );

    // Start from the last element and swap one by one. We don't
    // need to run for the first element that's why i > 0
    for (int i = n-1; i > 0; i--)
    {
        // Pick a random index from 0 to i
        int j = rand() % (i+1);

        // Swap arr[i] with the element at random index
        swap(&arr[i], &arr[j]);
    }
}

// Driver program to test above function.
int main()
{
    int arr[] = {1, 2, 3, 4, 5, 6, 7, 8};
    int n = sizeof(arr)/ sizeof(arr[0]);
    randomize (arr, n);
    printArray(arr, n);

    return 0;
}

Again, let us prove that the result is uniformly random.

What we need to show is that every permutation is produced with probability 1/N!.

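The first element to be placed (the one that lands in the last position) is chosen from N candidates, so it is selected with probability 1/N. The second is chosen from the remaining N-1 candidates with probability 1/(N-1), the third with probability 1/(N-2), and so on.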

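The probability of producing any particular permutation is the product of these probabilities, from the first element down to the last: 1/N * 1/(N-1) * ... * 1/1 = 1/N!.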
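The argument says that each sequence of random choices leads to a different permutation. For a small array this can be checked exhaustively. The sketch below is an illustration under assumptions: the names `shuffle_with_choices`, `encode_permutation`, and `count_distinct_outcomes` are hypothetical, the array size is fixed at 4, and `choice[i]` stands in for the value of `rand() % (i+1)` at step i.

```c
#include <assert.h>
#include <string.h>

enum { N = 4, CODES = 256 };   /* N^N = 256 possible base-N codes */

/* Run the shuffle loop on {0, 1, ..., N-1} with fixed choices:
 * choice[i] (0 <= choice[i] <= i) replaces rand() % (i+1) at step i. */
void shuffle_with_choices(const int choice[], int arr[]) {
    for (int i = 0; i < N; ++i)
        arr[i] = i;
    for (int i = N - 1; i > 0; --i) {
        int j = choice[i];
        assert(j >= 0 && j <= i);
        int t = arr[i];
        arr[i] = arr[j];
        arr[j] = t;
    }
}

/* Encode an arrangement of 0..N-1 as a base-N integer to index a table. */
int encode_permutation(const int arr[]) {
    int code = 0;
    for (int i = 0; i < N; ++i)
        code = code * N + arr[i];
    return code;
}

/* Enumerate all 4 * 3 * 2 = 24 equally likely choice sequences and count
 * how many distinct arrangements they produce. */
int count_distinct_outcomes(void) {
    char seen[CODES];
    int choice[N] = {0};
    int arr[N];
    int distinct = 0;
    memset(seen, 0, sizeof seen);
    for (choice[3] = 0; choice[3] <= 3; ++choice[3])
        for (choice[2] = 0; choice[2] <= 2; ++choice[2])
            for (choice[1] = 0; choice[1] <= 1; ++choice[1]) {
                shuffle_with_choices(choice, arr);
                int code = encode_permutation(arr);
                if (!seen[code]) {
                    seen[code] = 1;
                    ++distinct;
                }
            }
    return distinct;
}
```

`count_distinct_outcomes()` returns 24 = 4!, so the 24 equally likely choice sequences map one-to-one onto the 24 permutations, and each permutation has probability 1/24.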

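IV. Given a random function rand_0_and_1_with_p() that returns 0 with probability p and 1 with probability 1 - p, design, using only this function, a new random function that returns 0 and 1 with equal probability.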

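Calling rand_0_and_1_with_p() once gives P(0) = p and P(1) = 1 - p. Calling it twice gives P(0 then 1) = p(1 - p) and P(1 then 0) = (1 - p)p. These two outcomes are equally likely, so we can keep only them and discard the pairs 00 and 11. The implementation is: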

int rand_0_and_1_with_equal_prob() {
  while (1) {
    int tmp1 = rand_0_and_1_with_p();
    int tmp2 = rand_0_and_1_with_p();
    if (tmp1 == 1 && tmp2 == 0) {
      return 1;
    } else if (tmp1 == 0 && tmp2 == 1) {
      return 0;
    }
  }
}
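The cost of this trick depends on how skewed the source is. The short derivation below is a sketch assuming 0 < p < 1 and independent calls; it shows both why the output is fair and how many calls to rand_0_and_1_with_p() are needed on average.

```latex
\begin{aligned}
P(\text{pair} = 01) &= p(1-p), \qquad P(\text{pair} = 10) = (1-p)p,\\
P(\text{pair is accepted}) &= 2p(1-p),\\
P(\text{return } 1 \mid \text{accepted}) &= \frac{(1-p)p}{2p(1-p)} = \frac{1}{2},\\
E(\#\text{ pairs}) &= \frac{1}{2p(1-p)},\\
E(\#\text{ calls to rand\_0\_and\_1\_with\_p}) &= 2\cdot\frac{1}{2p(1-p)} = \frac{1}{p(1-p)}.
\end{aligned}
```

For p = 1/2 this is 4 calls per fair bit, and for p = 0.1 it is 1/0.09, about 11.1 calls, so the function gets slower as p approaches 0 or 1.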

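Now extend the problem: given the same rand_0_and_1_with_p(), which returns 0 with probability p and 1 with probability 1 - p, design, using only this function, a new random function that returns each integer from 1 to n with equal probability 1/n. The idea is to build a k-bit number from k fair bits. Each k-bit value has probability (1/2)^k, which is exactly 1/N when N = 2^k, that is, (1/2)^logN = 1/N. When n is not a power of two, values outside the range are rejected and the draw is repeated. This gives the following program: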

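int rand_1_to_n_with_equal_prob(int n) {
  // k = number of bits needed to represent n - 1, the largest 0-based result.
  // Work on a copy so that n is still intact for the range check below.
  int k = 0;
  for (int t = n - 1; t > 0; t >>= 1) {
    k++;
  }
  int res;
  do {
    // assemble a uniform k-bit number from k fair bits
    res = 0;
    for (int i = 0; i < k; ++i) {
      res |= rand_0_and_1_with_equal_prob() << i;
    }
  } while (res >= n);  // reject values outside 0..n-1 and retry
  return res + 1;
}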
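The rejection step is what makes the result uniform when n is not a power of two. The derivation below is a sketch assuming the k bits are independent and fair and that k is chosen so that 2^k >= n.

```latex
\begin{aligned}
P(\text{res} = r) &= \frac{1}{2^{k}}, \qquad r = 0, 1, \dots, 2^{k}-1,\\
P(\text{res} < n) &= \frac{n}{2^{k}},\\
P(\text{res} = r \mid \text{res} < n) &= \frac{1/2^{k}}{n/2^{k}} = \frac{1}{n}, \qquad r = 0, 1, \dots, n-1,\\
E(\#\text{ attempts}) &= \frac{2^{k}}{n}, \qquad E(\#\text{ fair bits}) = \frac{k\,2^{k}}{n}.
\end{aligned}
```

For example, n = 5 needs k = 3 bits, so an attempt is accepted with probability 5/8, and on average 1.6 attempts (4.8 fair bits) are used per result.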

0 0