LeetCode Problem Notes (3) -- Median of Two Sorted Arrays


Problem: Median of Two Sorted Arrays (979/5373 -- 18%)

Problem Description: There are two sorted arrays A and B of size m and n respectively. Find the median of the two sorted arrays. The overall run time complexity should be O(log(m+n)).

Struggled to write bug-free code within 20 minutes, and the code still has bugs. Take care of corner cases.

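This problem drove me a little crazy. In fact, my solution still has not passed the online judge. I am writing it down anyway and will update it later. No wonder the acceptance rate is only 18%.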


Step I: Ask Questions

Clarify whether the elements are numbers, strings, or something else. Are both arrays sorted in ascending order, in descending order, or in some other way? What exactly is the median?

Step II: Describe Approach to the problem-->Algorithm-->Algorithm Analysis

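First, be clear about what the median is: it is the middle element of a sorted sequence, and if the sequence has even length, it is the average of the two middle elements. Ignoring the complexity requirement, this problem is just the merge step of merge sort: set two pointers at the start of the two arrays and advance them in order of element size to obtain the merged array. Here we only need to advance until we reach the (m+n)/2-th element, but the time complexity is still O(m+n).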
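The merge-style walk described above can be sketched as follows. This is a minimal sketch, not the author's submission: the function name `medianByMerge` and the use of `std::vector` are assumptions made for illustration, and both inputs are assumed to be sorted in ascending order.

```cpp
#include <cassert>
#include <vector>

// Median by walking the merge order without building the merged array.
// O(m + n) time, O(1) extra space.
// Assumes both inputs are sorted ascending and not both empty.
double medianByMerge(const std::vector<int>& a, const std::vector<int>& b) {
    const std::size_t total = a.size() + b.size();
    assert(total > 0);
    std::size_t i = 0, j = 0;
    int prev = 0, curr = 0;
    // Take total/2 + 1 steps: curr ends on the 0-based element total/2
    // of the merged order, prev on the element just before it.
    for (std::size_t step = 0; step <= total / 2; ++step) {
        prev = curr;
        if (j >= b.size() || (i < a.size() && a[i] <= b[j])) {
            curr = a[i++];
        } else {
            curr = b[j++];
        }
    }
    if (total % 2 == 1) return curr;
    return (static_cast<double>(prev) + curr) / 2.0;
}
```

Because only the last two visited values are kept, the walk stops halfway through the merge, but it is still linear in m+n.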

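How can we get to O(log(m+n))? Logarithmic complexity immediately suggests binary search. But how do we apply binary search to this problem?

Binary search looks for one element in a sorted sequence. It first compares the target with the middle element; apart from equality, the outcome is either greater or less, so a single comparison reduces searching among n elements to searching among n/2 elements. Because the two outcomes are roughly equally likely, each comparison eliminates as many unsuitable elements as possible. A couple of days ago I read an article by 刘未鹏 (Liu Weipeng) on the relationship between sorting and information theory. It is very good and I recommend it: http://blog.csdn.net/pongba/article/details/2544933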
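For reference, the halving step of plain binary search can be sketched like this. The name `binarySearch` and the convention of returning -1 when the target is absent are assumptions for illustration.

```cpp
#include <vector>

// Returns the index of target in the ascending vector v, or -1 if absent.
// Each comparison halves the remaining search range [lo, hi].
int binarySearch(const std::vector<int>& v, int target) {
    int lo = 0;
    int hi = static_cast<int>(v.size()) - 1;
    while (lo <= hi) {
        const int mid = lo + (hi - lo) / 2;  // avoids overflow of lo + hi
        if (v[mid] == target) return mid;
        if (v[mid] < target) {
            lo = mid + 1;  // target can only be in the right half
        } else {
            hi = mid - 1;  // target can only be in the left half
        }
    }
    return -1;
}
```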

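The difficulty here is that we do not know which element is the middle one. But can we borrow the same idea to find the median, eliminating half of the elements each time and finding it after log(m+n) comparisons?

Eliminating half of the elements each time suggests looking at the medians of A and B separately: if some method lets us discard half of A and half of B, we are done.

What happens if we compare median(A) and median(B)? There are three cases: median(A) < median(B), median(A) > median(B), and median(A) = median(B).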

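1) median(A) = median(B). This is the easy case: regardless of how the other elements compare, there are m/2 + n/2 elements before median(A) in the merged order, so whatever the parity, we can find the median quickly.

2) median(A) < median(B). Reasoning as above, median(A) cannot exceed the overall median: since median(A) < median(B), at most n/2 elements of B are smaller than median(A), so in the merged array at most m/2 + n/2 elements are smaller than median(A), and therefore median(A) <= median. By the same argument, median(B) >= median.

From this comparison we conclude that the median lies between median(A) and median(B). Accordingly, the median is no smaller than the elements A[0] through median(A), and no larger than the elements median(B) through B[n-1].

3) median(A) > median(B). This is symmetric to case 2).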

The first comparison eliminates (m+n)/2 elements, which is what we wanted. What about the second comparison?

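Suppose the first comparison lands in case 2). We know that at least A[0] through median(A) come before the median, so the problem becomes: among the (m+n)/2 elements of A1 = median(A)..A[m-1] and B1 = B[0]..median(B), find the n/2-th smallest. Here a problem appears: this subproblem differs from the original one, because we are no longer looking for a median. I have two options: (1) find the n/4-th largest element in each of the two sub-arrays and compare them as in the first round, or (2) keep comparing the medians of the two sub-arrays. Option (1) requires checking whether n/4 is less than m/2, although with care it can be carried through. But would the algorithm then still be O(log(m+n))? The second round clearly does not eliminate half of the elements, so I try option (2).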

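Compare median(A1) with median(B1). If median(A1) < median(B1), the median lies between median(A1)..A[m-1] and B[0]..median(B1), so at least A[0]..median(A1) are smaller than the median and median(B1)..B[n-1] are larger than the median.

If median(A1) > median(B1), then at least A[0]..median(A) and B[0]..median(B1) are smaller than the median; the median lies in median(A)..median(A1) and median(B1)..median(B); and median(A1)..A[m-1] and median(B)..B[n-1] are larger than the median.

In either case, the range containing the median shrinks to sub-arrays of total length (m+n)/4.

After log(m+n) such comparisons, the range containing the median shrinks to length 1, which means we have found it.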
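The "discard a chunk per comparison" idea is often easier to get right when the subproblem is stated as "find the k-th smallest element", which is exactly what the subproblem above turned into. The following is a minimal sketch of that common formulation, not the median-versus-median scheme described in these notes. The names `kthSmallest` and `medianOfTwo` are hypothetical, and both inputs are assumed to be sorted ascending.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// k-th smallest (1-based) element of the union of two ascending arrays.
// Each round discards up to k/2 elements that must rank below the k-th,
// so the loop runs O(log k) = O(log(m + n)) times.
int kthSmallest(const std::vector<int>& a, const std::vector<int>& b, std::size_t k) {
    assert(k >= 1 && k <= a.size() + b.size());
    std::size_t i = 0, j = 0;  // elements already discarded from a and from b
    while (true) {
        if (i == a.size()) return b[j + k - 1];
        if (j == b.size()) return a[i + k - 1];
        if (k == 1) return std::min(a[i], b[j]);
        const std::size_t half = k / 2;
        const std::size_t ni = std::min(i + half, a.size()) - 1;  // probe in a
        const std::size_t nj = std::min(j + half, b.size()) - 1;  // probe in b
        if (a[ni] <= b[nj]) {
            k -= ni - i + 1;  // a[i..ni] all rank below the k-th element
            i = ni + 1;
        } else {
            k -= nj - j + 1;  // b[j..nj] all rank below the k-th element
            j = nj + 1;
        }
    }
}

// Median via one or two k-th element queries, depending on parity.
double medianOfTwo(const std::vector<int>& a, const std::vector<int>& b) {
    const std::size_t total = a.size() + b.size();
    assert(total > 0);
    if (total % 2 == 1) return kthSmallest(a, b, total / 2 + 1);
    return (static_cast<double>(kthSmallest(a, b, total / 2)) +
            kthSmallest(a, b, total / 2 + 1)) / 2.0;
}
```

Tracking the remaining rank k explicitly plays the role of the counter mentioned in these notes: it records how many elements still have to be skipped, so the even and odd cases differ only in which ranks are queried.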

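A few issues need to be considered:

1. Parity. The median is defined differently for odd and even total length, so we need separate functions to handle the two cases.

2. Suppose each range has shrunk to a single element. Which of the two elements is the median, or is it their average? We need a counter that records how many elements precede these two.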
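For the parity issue in item 1, the positions that define the median can be written down directly. This is a small sketch; the helper name `medianRanks` is hypothetical and the ranks are 1-based positions in the merged order.

```cpp
#include <utility>

// 1-based ranks in the merged order that define the median of m + n elements.
// For an odd total both entries are the same rank; for an even total the
// median is the average of the elements at the two ranks.
std::pair<int, int> medianRanks(int m, int n) {
    const int total = m + n;
    if (total % 2 == 1) return {total / 2 + 1, total / 2 + 1};
    return {total / 2, total / 2 + 1};
}
```

For example, with m = 2 and n = 3 the median is the 3rd merged element, while with m = 2 and n = 2 it is the average of the 2nd and 3rd.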

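Algorithm:

1. If m+n is odd, find the ((m+n)/2 + 1)-th element of the merged order; if m+n is even, find the (m+n)/2-th and the ((m+n)/2 + 1)-th elements.

2. Define two variables, start and end, as the lower and upper bounds of the current median range. Define lastStart and lastEnd to record the previous range, in case start overshoots (m+n)/2.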

while( subA.length()>1 && subB.length()>1 && start < (m+n)/2 )

medianA = subA.length()/2, medianB = subB.length()/2, compare subA[medianA] with subB[medianB];

if( subA[medianA] > subB[medianB] ) start = start + medianB; end = end - medianA;

else start = start + medianA; end = end - medianB;

update array. record last start and last end.

3. if start > (m+n)/2, start = lastStart, end = lastEnd, then merge the two sub arrays until start = (m+n)/2

    else if ( subA.length() <=1 ) or (subB.length()<=1 )


//INCORRECT CODE   174/2098

class Solution {
public:
    double findMedianSortedArraysEven( int A[], int m, int B[], int n){
        double median;
        int midA = m/2;
        int midB = n/2;
        int subLenA = m/2;
        int subLenB = n/2;
        int start = 0; // lower bound index of the median in the merged order
        int end = (m+n)-1; // upper bound index of the median in the merged order
        
        
        while ( subLenA > 0 || subLenB > 0 )
        {
            if( A[midA] >= B[midB] )
            {
                start = start + subLenB;
                if( start > (m+n)/2-1 )
                {
                    start = start - subLenB;
                    return findMedianPair( A, midA, m, B, midB, n, start );
                }
                end = end - subLenA;
                subLenA = subLenA/2;
                subLenB = subLenB/2;
                midA = midA - subLenA;
                midB = midB + subLenB;
            }
            else
            {
                start = start + subLenA;
                if( start > (m+n)/2-1 )
                {
                    start = start - subLenA;
                    return findMedianPair( A, midA, m, B, midB, n, start );
                }
                end = end - subLenB;
                subLenA = subLenA/2;
                subLenB = subLenB/2;
                midA = midA + subLenA;
                midB = midB - subLenB;
            }
        }
        
        // If we get here, subLenA == 0 and subLenB == 0, so average the two candidates.
        median = ( A[midA] + B[midB] )/2.0;
        
        
        return median;
    }
    double findMedianSortedArraysOdd( int A[], int m, int B[], int n ){
        double median;
        int midA = m/2;
        int midB = n/2;
        int subLenA = m/2;
        int subLenB = n/2;
        int start = 0; // lower bound index of the median in the merged order
        int end = (m+n)-1; // upper bound index of the median in the merged order
        
        // Simple case: if A[midA] == B[midB], return A[midA], because there are
        // m/2 elements before A[midA] in A and n/2 elements before B[midB] in B.
        // Since we want the ((m+n)/2 + 1)-th element, we just return A[midA].
        if( A[midA] == B[midB] )
            return A[midA];
        
        while ( subLenA > 0 || subLenB > 0 )
        {
            if( A[midA] >= B[midB] )
            {
                start = start + subLenB;
                if( start > (m+n)/2 )
                {
                    start = start - subLenB;
                    return findMedianElement( A, midA, m, B, midB, n, start );
                }
                end = end - subLenA;
                subLenA = subLenA/2;
                subLenB = subLenB/2;
                midA = midA - subLenA;
                midB = midB + subLenB;
            }
            else
            {
                start = start + subLenA;
                if( start > (m+n)/2 )
                {
                    start = start - subLenA;
                    return findMedianElement( A, midA, m, B, midB, n, start );
                }
                end = end - subLenB;
                subLenA = subLenA/2;
                subLenB = subLenB/2;
                midA = midA + subLenA;
                midB = midB - subLenB;
            }
        }
        
        //if goes to here, subLenA == 1, subLenB == 1, then
        median = min( A[midA] , B[midB] );
        
        
        return median;
    }
    
    double findMedianElement( int A[], int midA, int m, int B[], int midB, int n, int start){
        
        while( start < (m+n)/2 && midA < m && midB < n){
            if( A[midA] >= B[midB] )
            {
                midB++;
                start++;
                if( start == (m+n)/2 )
                    return A[midA]<=B[midB]?A[midA]:B[midB];
            }
            else
            {
                midA++;
                start++;
                if( start == (m+n)/2 )
                    return A[midA]<=B[midB]?A[midA]:B[midB];
            }
        }
        
        if( midA == m )
        {
            while( start< (m+n)/2 )
            {
                midB++;
                start++;
            }
            return B[midB];
        }
        else if( midB == n )
        {
            while( start < (m+n)/2 )
            {
                midA++;
                start++;
            }
            return A[midA];
        }
        
    }
    
    double findMedianPair(int A[], int midA, int m, int B[], int midB, int n, int start)
    {
        while( start < (m+n)/2-1 && midA < m && midB < n){
            if( A[midA] >= B[midB] )
            {
                midB++;
                start++;
                if( start == (m+n)/2-1 )
                {
                    int pair1 = min(A[midA], B[midB]);
                    if( pair1 == A[midA] ) midA++;
                    else if( pair1 == B[midB]) midB++;
                    int pair2 = min(A[midA], B[midB]);
                    return (pair1+pair2)/2.0;
                }
            }
            else
            {
                midA++;
                start++;
                if( start == (m+n)/2-1 )
                {
                    int pair1 = min(A[midA], B[midB]);
                    if( pair1 == A[midA] ) midA++;
                    else if( pair1 == B[midB]) midB++;
                    int pair2 = min(A[midA], B[midB]);
                    return (pair1+pair2)/2.0;
                }
            }
        }
        
        if( midA == m )
        {
            while( start< (m+n)/2-1 )
            {
                midB++;
                start++;
            }
            return (B[midB]+B[midB+1])/2.0;
        }
        else if( midB == n )
        {
            while( start < (m+n)/2-1 )
            {
                midA++;
                start++;
            }
            return (A[midA]+A[midA+1])/2.0;
        }
    }
    double findMedianSortedArrays(int A[], int m, int B[], int n) {
        // Start typing your C/C++ solution below
        // DO NOT write int main() function
        if( m == 0 && n == 0 ) return 0;
        else if( m == 0 && n == 1 )
            return B[0];
        else if( m == 1 && n == 0 )
            return A[0];
        else if( m == 1 && n == 1 )
            return (A[0]+B[0])/2.0; // divide by 2.0 to avoid integer division
        //Two cases: m+n is even, then the median is (O[(m+n)/2-1] + O[(m+n)/2])/2,
        //assuming that the O[] stands for the merged array.
        //if m+n is odd, then median is O[(m+n)/2].
        
        if( (m+n)%2 == 0 ) //even
            return findMedianSortedArraysEven( A, m, B, n);
        else
            return findMedianSortedArraysOdd( A, m, B, n );
    }
    
};

do binary search until start = (m+n)/2.

Time complexity: O(log(m+n))


I still do not have a clear line of thought on this problem. I will come back another day to fix this answer.
