Binary Search: Two Problems on Finding the Median of Two Sorted Arrays

Source: Internet  Editor: 程序博客网  Date: 2024/05/16 08:36

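The first problem is Exercise 9.3-8 from Chapter 9 of Introduction to Algorithms: given two sorted arrays X[n] and Y[n] (assume ascending order), find the median of all 2n numbers in O(lg n) time.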

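The O(lg n) hint means we must use binary search. The crux of the problem is how to handle two sorted arrays at once. There are 2n numbers in total, an even count, so the global median comes from two numbers a and b; in the sorted sequence of all 2n numbers there are n-1 numbers above them (no smaller) and n-1 numbers below them (no larger). Let a >= b and suppose a == X[i], so i elements of X lie below a. Since a has n numbers below it overall, the other n-i of them must come from Y, namely Y[0..n-1-i]. Then b has two possible sources, X[i-1] or Y[n-1-i], where Y[n-1-i] is the largest element of Y that does not exceed a.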

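To sum up, the binary search works as follows. First, within the interval [v,u) of X, take i=(v+u)/2 and look at X[i]; with j=n-1-i, check whether Y[j] <= X[i] && X[i] <= Y[j+1] (when j+1==n there is no Y[j+1] and only the first half applies). If the check fails, increase or decrease i, which moves j in the opposite direction. If no such a exists in X[], repeat the same procedure on Y.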
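The check in this step can be sketched as a three-way comparison that tells the binary search which way to move. This is only an illustration: the helper name classifyUpperMedian and the use of std::vector are assumptions, not part of the original code. It accepts ties with non-strict comparisons so that duplicates work, and it treats a missing Y[j+1] (the case i==0) as positive infinity.

```cpp
#include <vector>

// Hypothetical helper for illustration; X and Y are sorted ascending
// and have the same length n, and 0 <= i < n.
// Returns  0 if X[i] can serve as the upper of the two middle values,
//         -1 if X[i] is too small (the search should increase i),
//         +1 if X[i] is too large (the search should decrease i).
int classifyUpperMedian(const std::vector<int>& X,
                        const std::vector<int>& Y, int i) {
    const int n = static_cast<int>(X.size());
    const int j = n - 1 - i;  // Y[0..j] are the n-i values that must sit below X[i]
    if (X[i] < Y[j]) {
        return -1;            // fewer than n values can be placed below X[i]
    }
    if (j + 1 < n && X[i] > Y[j + 1]) {
        return +1;            // more than n values lie below X[i]
    }
    return 0;                 // Y[j] <= X[i] <= Y[j+1], or Y[j+1] does not exist
}
```

For X = {1,3} and Y = {2,4}, index 1 of X is accepted (3 is the upper middle value), while index 0 is reported as too small.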

The implementation is below. Watch the array boundary cases, such as i==0 or i==n-1.

    bool findmediansingle(int* A, int* B, int n, double& res){ // find total median in array A
        int u=n, v=0;
        while(v<u){ //[v,u)
            int i = (v+u)/2;
            int j = n-1-i; //index in B[]
            if(A[i] <= B[j]){ //we need to find the middle two elements among the total merged array
                if(j==0 || A[i] >= B[j-1]){ //the elements below A[i] and B[j] add up to n-1
                    int low = A[i]; //since we focus binary cursor on i, A[i] is fixed as floor of the two middle
                    int high = B[j];
                    if(i<n-1 && A[i+1]<B[j]){ //A[i+1] is closer to A[i] than B[j]
                        high = A[i+1];
                    }
                    res = (double)(low + high)/2;
                    return true;
                }else{
                    v=i+1; //enlarge i to reduce j
                    continue;
                }
            }else{ //A[i] > B[j]
                if(j==n-1 || A[i] <= B[j+1]){
                    int high = A[i]; //now we fix A[i] as ceil of the two middle
                    int low = B[j];
                    if(i>0 && B[j] < A[i-1]){ //A[i-1] is closer to A[i] than B[j]
                        low = A[i-1];
                    }
                    res = (double)(low+high)/2;
                    return true;
                }else{
                    u = i; //reduce i to enlarge j
                    continue;
                }
            }
        }
        return false;
    }

    double findmedian(int* A, int* B, int n){
        double res = 0.0;
        if(!findmediansingle(A, B, n, res)){ //neither middle element was fixed in A[], so search B[]
            findmediansingle(B, A, n, res);
        }
        return res;
    }

Test data:

{1}, {1}

{1,2}, {3,4}

{1,3}, {2,4}

{1,4}, {2,3}
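To know what answers these cases should produce, a simple O(n) reference that merges the two arrays is handy. The sketch below is not from the original post: the name medianByMerge, the std::vector interface, and the choice to return 0.0 for empty input are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical reference implementation: merge both sorted arrays and
// read the median from the merged result. Linear time, for testing only.
double medianByMerge(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> all;
    all.reserve(a.size() + b.size());
    std::merge(a.begin(), a.end(), b.begin(), b.end(),
               std::back_inserter(all));
    const auto total = all.size();
    if (total == 0) {
        return 0.0;  // assumption: no median exists, return a default
    }
    if (total % 2 == 1) {
        return all[total / 2];  // odd count: the single middle element
    }
    // even count: average of the two middle elements
    return (static_cast<double>(all[total / 2 - 1]) + all[total / 2]) / 2.0;
}
```

Under this reference the four cases above give 1, 2.5, 2.5 and 2.5 respectively.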

-------------------------------------------------Divider------------------------------------------

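Naturally, the second problem is the case of two arrays of different lengths, A[m] and B[n]: find the median of all m+n numbers. The algorithm works on the same principle; we only need to see which extra boundary cases arise.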

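First, m+n may be odd. The median is then the element at index (m+n)/2 (0-based) of the merged order, with (m+n)/2 numbers below it. We locate it directly, and there is no lower partner to consider.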

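Second, when m==n and j=n-1-i, i lies in [0,n), so j also lies in [0,n), i.e. j is always a valid index. But when m>>n and j=mid-1-i (mid = (m+n)/2), j<0 or j>n-1 can easily occur, which would index out of bounds, so j must be handled with particular care.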
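One way to see which values of i are safe is to compute the range directly: j = mid-1-i stays within [-1, n-1] exactly when mid-n <= i <= mid, and this is then intersected with [0, m-1]. The sketch below shows only that arithmetic. The helper name validCursorRange is an assumption for illustration, and j == -1 is kept in range because it means that all mid elements below A[i] come from A.

```cpp
#include <algorithm>
#include <utility>

// Hypothetical helper: inclusive range [lo, hi] of cursor positions i in A
// for which j = mid - 1 - i lies in [-1, n - 1]. Assumes m >= 1.
std::pair<int, int> validCursorRange(int m, int n) {
    const int mid = (m + n) / 2;
    const int lo = std::max(0, mid - n);  // below this, j would exceed n - 1
    const int hi = std::min(m - 1, mid);  // above this, j would drop below -1
    return {lo, hi};
}
```

With m=5 and n=1 only i in [2,3] keeps j in range, whereas with m==n==4 the whole interval [0,3] is valid, matching the observation above.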

The implementation is as follows:

    bool findmediansingle(int *A, int m, int *B, int n, double& res, int tag){
        if(m==0 && n==0){
            return false;
        }else if(n==0){
            res = tag==1 ? A[m/2] : (double)(A[m/2] + A[m/2 - 1])/2;
            return true;
        }else if(m==0){
            res = tag==1 ? B[n/2] : (double)(B[n/2 -1] + B[n/2])/2;
            return true;
        }
        int u=m, v=0, i=0;
        int mid = (m+n)/2;
        if(tag==0){ //even total count
            while(v<u){ //[v,u)
                i = (v+u)/2;
                int j=(mid-1-i); //index in B[]
                if(j<-1){
                    u=i; //reduce i to enlarge j
                }else if(j==-1){ //i==mid: the mid elements A[0..i-1] all lie below A[i]
                    if(B[0] >= A[i]){ //B[0] is above A[i], so A[i] is ceil of the middle two
                        int high = A[i];
                        int low = i>0 ? A[i-1] : 0;
                        res = (double)(high + low)/2;
                        return true;
                    }
                    break;
                }else if(j>n-1){ //enlarge i to reduce j
                    v=i+1;
                }else if(A[i] <= B[j]){ //we need to find the middle two elements among the total merged array
                    if(j==0 || A[i] >= B[j-1]){ //the elements below A[i] and B[j] add up to mid-1
                        int low = A[i]; //since we focus binary cursor on i, A[i] is fixed as floor of the two middle
                        int high = B[j];
                        if(i<m-1 && A[i+1]<B[j]){ //A[i+1] is closer to A[i] than B[j]
                            high = A[i+1];
                        }
                        res = (double)(low + high)/2;
                        return true;
                    }else{
                        v=i+1; //enlarge i to reduce j
                        continue;
                    }
                }else{ //A[i] > B[j]
                    if(j==n-1 || A[i] <= B[j+1]){
                        int high = A[i]; //now we fix A[i] as ceil of the two middle
                        int low = B[j];
                        if(i>0 && B[j] < A[i-1]){ //A[i-1] is closer to A[i] than B[j]
                            low = A[i-1];
                        }
                        res = (double)(low+high)/2;
                        return true;
                    }else{
                        u = i; //reduce i to enlarge j
                        continue;
                    }
                }
            }
            return false;
        }else{ //odd total count
            while(v<u){ //[v,u)
                i = (v+u)/2;
                int j=(mid-1-i); //index in B[]
                if(j<-1){
                    u=i; //reduce i to enlarge j
                }else if(j==-1){ //i==mid: the mid elements A[0..i-1] all lie below A[i]
                    if(B[0] >= A[i]){ //B[0] is not among the mid elements below A[i], so A[i] is the middle of all
                        res = (double)A[i]; //fixed: the original cast to char truncated the value
                        return true;
                    }
                    break;
                }else if(j>n-1){ //enlarge i to reduce j
                    v=i+1;
                }else if(A[i] >= B[j]){ //max(A[i], B[j]) is the candidate for the middle of all
                    if(j==n-1 || A[i] <= B[j+1]){
                        res = (double)A[i];
                        return true;
                    }else{
                        u=i; //reduce i
                    }
                }else{
                    if(i==m-1 || B[j] <= A[i+1]){
                        res = (double)B[j]; //we need max(A[i], B[j])
                        return true;
                    }else{
                        v=i+1; //enlarge i
                    }
                }
            }
            return false;
        }
    }

    double findmedian(int* A, int m, int* B, int n){
        double res = 0.0;
        int tag=0;
        if((m+n)%2 == 1)
            tag=1; //1 for odd total count, 0 for even
        if(!findmediansingle(A,m,B,n,res,tag)){
            findmediansingle(B,n,A,m,res,tag);
        }
        return res;
    }


Test data:

{1,3,5,8}, {2,4,6,7}

{1,1}, {2,3,4,5}

{}, {1,2,3,4}

{1,2,3}, {}

{1}, {1}

{1,2,3,5,6}, {4}

{1}, {2,3,4}

{1}, {2,3,4,5}

{1,2,3,4}, {5}

{2}, {1,3,4}

{3}, {1,2,4}

{1,2,3}, {4,5}

{6,7,8}, {1,2,3,4,5}
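The expected medians for these cases can be cross-checked with a linear two-pointer walk that stops at index (m+n)/2 of the merged order, which also illustrates the odd and even rules described earlier. This is a sketch for verification only: the name medianByWalk, the std::vector interface, and returning 0.0 when both arrays are empty are assumptions, not part of the original code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference: advance through both sorted arrays in merged
// order until index (m+n)/2, remembering the last two values visited.
double medianByWalk(const std::vector<int>& a, const std::vector<int>& b) {
    const std::size_t total = a.size() + b.size();
    if (total == 0) {
        return 0.0;  // assumption: no median exists, return a default
    }
    const std::size_t mid = total / 2;
    std::size_t ia = 0, ib = 0;
    int prev = 0, cur = 0;
    for (std::size_t k = 0; k <= mid; ++k) {
        prev = cur;
        // Take from a when b is exhausted, or when a's head is not larger.
        if (ib >= b.size() || (ia < a.size() && a[ia] <= b[ib])) {
            cur = a[ia++];
        } else {
            cur = b[ib++];
        }
    }
    // Odd total: element at index mid. Even total: average of mid-1 and mid.
    return total % 2 == 1 ? cur : (static_cast<double>(prev) + cur) / 2.0;
}
```

Under this reference the thirteen cases above give, in order: 4.5, 2.5, 2.5, 2, 1, 3.5, 2.5, 3, 3, 2.5, 2.5, 3, 4.5.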


Summary

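For both problems the binary search idea is fairly obvious. The point worth stressing is that to write bug-free code, you must design a complete set of test cases (data) with full coverage in advance, and then use those cases to exercise every branch of the algorithm step by step.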
