Peaks Complexity

Source: Internet | Published under: Basic principles of time-series data classification | Editor: 程序博客网 | Time: 2024/05/21 07:00


I've just done the following Codility Peaks problem. The problem is as follows:


A non-empty zero-indexed array A consisting of N integers is given. A peak is an array element which is larger than its neighbors. More precisely, it is an index P such that 0 < P < N − 1, A[P − 1] < A[P] and A[P] > A[P + 1]. For example, the following array A:

    A[0] = 1
    A[1] = 2
    A[2] = 3
    A[3] = 4
    A[4] = 3
    A[5] = 4
    A[6] = 1
    A[7] = 2
    A[8] = 3
    A[9] = 4
    A[10] = 6
    A[11] = 2

has exactly three peaks: 3, 5, 10. We want to divide this array into blocks containing the same number of elements. More precisely, we want to choose a number K that will yield the following blocks: A[0], A[1], ..., A[K − 1], A[K], A[K + 1], ..., A[2K − 1], ... A[N − K], A[N − K + 1], ..., A[N − 1]. What's more, every block should contain at least one peak. Notice that extreme elements of the blocks (for example A[K − 1] or A[K]) can also be peaks, but only if they have both neighbors (including one in an adjacent block). The goal is to find the maximum number of blocks into which the array A can be divided. Array A can be divided into blocks as follows:

one block (1, 2, 3, 4, 3, 4, 1, 2, 3, 4, 6, 2). This block contains three peaks.

two blocks (1, 2, 3, 4, 3, 4) and (1, 2, 3, 4, 6, 2). Every block has a peak.

three blocks (1, 2, 3, 4), (3, 4, 1, 2), (3, 4, 6, 2). Every block has a peak. 

Notice in particular that the first block (1, 2, 3, 4) has a peak at A[3], because A[2] < A[3] > A[4], even though A[4] is in the adjacent block. However, array A cannot be divided into four blocks, (1, 2, 3), (4, 3, 4), (1, 2, 3) and (4, 6, 2), because the (1, 2, 3) blocks do not contain a peak. Notice in particular that the (4, 3, 4) block contains two peaks: A[3] and A[5]. The maximum number of blocks that array A can be divided into is three.
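To make the definitions concrete, here is a minimal Python sketch. It finds the peaks of the example array and tests a given block size K, assuming K divides N. The helper names `find_peaks` and `every_block_has_peak` are hypothetical and not part of the Codility task.

```python
def find_peaks(a):
    """Indices P with 0 < P < len(a) - 1 and a[P-1] < a[P] > a[P+1]."""
    return [p for p in range(1, len(a) - 1) if a[p - 1] < a[p] > a[p + 1]]

def every_block_has_peak(a, k):
    """True if each of the len(a) // k blocks of size k contains a peak.

    Assumes k divides len(a). Peaks are found on the whole array, so a
    block's first or last element can be a peak via a neighbour in the
    adjacent block.
    """
    peaks = set(find_peaks(a))
    return all(any(p in peaks for p in range(start, start + k))
               for start in range(0, len(a), k))

example = [1, 2, 3, 4, 3, 4, 1, 2, 3, 4, 6, 2]
print(find_peaks(example))               # [3, 5, 10]
print(every_block_has_peak(example, 4))  # True: three blocks
print(every_block_has_peak(example, 3))  # False: (1, 2, 3) has no peak
```

Because peaks are computed on the full array before splitting, A[3] counts for the first block of size 4 even though its right neighbour lies in the next block.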

Write a function: class Solution { public int solution(int[] A); } that, given a non-empty zero-indexed array A consisting of N integers, returns the maximum number of blocks into which A can be divided. If A cannot be divided into some number of blocks, the function should return 0. For example, given:

    A[0] = 1
    A[1] = 2
    A[2] = 3
    A[3] = 4
    A[4] = 3
    A[5] = 4
    A[6] = 1
    A[7] = 2
    A[8] = 3
    A[9] = 4
    A[10] = 6
    A[11] = 2

the function should return 3, as explained above. Assume that:

N is an integer within the range [1..100,000]; each element of array A is an integer within the range [0..1,000,000,000].

Complexity:

expected worst-case time complexity is O(N*log(log(N)))

expected worst-case space complexity is O(N), beyond input storage (not counting the storage required for input arguments).

Elements of input arrays can be modified.


My Question

So I solved this with what appears to me to be the brute-force solution: go through every group size from 1 to N and check whether every group has at least one peak. For the first 15 minutes I tried to find a more efficient approach, since the required complexity is O(N*log(log(N))).

This is my "brute-force" code that passes all the tests, including the large ones, for a score of 100/100:

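import java.util.ArrayList;

class Solution {
    public int solution(int[] A) {
        int N = A.length;
        // Collect peak indices in increasing order.
        ArrayList<Integer> peaks = new ArrayList<Integer>();
        for (int i = 1; i < N - 1; i++) {
            if (A[i] > A[i - 1] && A[i] > A[i + 1]) peaks.add(i);
        }
        // Try block sizes from smallest (most blocks) to largest.
        for (int size = 1; size <= N; size++) {
            if (N % size != 0) continue;
            int find = 0;           // index of the next block that still needs a peak
            int groups = N / size;
            boolean ok = true;
            for (int peakIdx : peaks) {
                // A peak beyond block 'find' means block 'find' has no peak.
                if (peakIdx / size > find) {
                    ok = false;
                    break;
                }
                if (peakIdx / size == find) find++;
            }
            if (find != groups) ok = false;  // trailing blocks without a peak
            if (ok) return groups;
        }
        return 0;
    }
}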

My question is: how do I deduce that this is in fact O(N*log(log(N)))? It's not at all obvious to me, and I was surprised that I passed the test cases. I'm looking for even the simplest complexity proof sketch that would convince me of this runtime. I would assume that a log(log(N)) factor means some kind of reduction of the problem by a square root on each iteration, but I have no idea how this applies to my problem. Thanks a lot for any help.
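There is a rough way to see why the brute force is fast on Codility's inputs. This is an informal bound, not the O(N*log(log(N))) proof asked for. The inner peak scan runs only for block sizes that divide N, and each run visits at most the P peaks. The total work is therefore at most about N + d(N)·P, where d(N) is the number of divisors of N.

This sketch computes how large d(N) can get within the task's limit N ≤ 100,000. The name `max_divisor_count` is hypothetical.

```python
def max_divisor_count(limit):
    """Return (n, d(n)) for the smallest n <= limit with the most divisors."""
    # Divisor-count sieve: each k adds one divisor to every multiple of k.
    counts = [0] * (limit + 1)
    for k in range(1, limit + 1):
        for multiple in range(k, limit + 1, k):
            counts[multiple] += 1
    # max() returns the first maximal element, i.e. the smallest such n.
    best = max(range(1, limit + 1), key=lambda n: counts[n])
    return best, counts[best]

print(max_divisor_count(100_000))  # (83160, 128)
```

So for N ≤ 100,000 the peak scan runs at most 128 times. That keeps the brute force fast here, even though d(N) is not bounded by log(log(N)) in general.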


4 Answers


You're completely right: to get the log log performance the problem needs to be reduced. 

Below is an n*log(log(n)) solution in Python. Codility no longer tests performance on this problem (!), but the Python solution scores 100% for correctness.

As you've already surmised, the outer loop is O(n), since it tests whether each block size is a clean divisor of n. The inner loop must therefore average O(log(log(n))) work per outer iteration to give O(n log(log(n))) overall.

We get good inner-loop performance because the inner check only has to run d(n) times, where d(n) is the number of divisors of n. We can store a prefix sum of peaks-so-far, which uses the O(n) space allowed by the problem specification. Checking whether a peak occurs in each group is then an O(1) lookup using the group's start and end indices.
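The prefix-sum lookup can be sketched as follows. This is an illustration, not the answerer's code, and the names `peak_prefix` and `block_has_peak` are hypothetical. Here `prefix[i]` counts the peaks at indices below i, so the half-open block [start, end) contains a peak exactly when prefix[end] - prefix[start] > 0.

```python
def peak_prefix(a):
    """prefix[i] = number of peaks at indices < i, for i in 0..len(a)."""
    n = len(a)
    prefix = [0] * (n + 1)
    for i in range(n):
        is_peak = 0 < i < n - 1 and a[i - 1] < a[i] > a[i + 1]
        prefix[i + 1] = prefix[i] + int(is_peak)
    return prefix

def block_has_peak(prefix, start, end):
    """O(1) test: does the half-open block [start, end) contain a peak?"""
    return prefix[end] - prefix[start] > 0

example = [1, 2, 3, 4, 3, 4, 1, 2, 3, 4, 6, 2]
prefix = peak_prefix(example)
print(block_has_peak(prefix, 0, 4))  # True: A[3] is a peak
print(block_has_peak(prefix, 0, 3))  # False
```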

Following this logic, when the candidate block size is 3 the loop performs n/3 peak checks. The total cost becomes a sum, n/a + n/b + ... + n/n, where the denominators a, b, ... are the divisors of n.
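The sum can be checked by brute force. In this sketch, `total_block_checks(n)` adds one check per block for every block size k dividing n. The result always equals σ(n), the sum of the divisors of n, because n/k runs over the divisors of n as k does. The helper names are hypothetical.

```python
def total_block_checks(n):
    """Per-block peak checks if every divisor k of n is tried: sum of n // k."""
    return sum(n // k for k in range(1, n + 1) if n % k == 0)

def sigma(n):
    """Sum of the divisors of n (naive O(n) loop, fine for small n)."""
    return sum(d for d in range(1, n + 1) if n % d == 0)

for n in (12, 360, 5040):
    print(n, total_block_checks(n), sigma(n))  # the last two columns always match
```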

Short story: that sum is σ(n), the sum of the divisors of n, and σ(n) = O(n*log(log(n))), so the block checks cost O(n*log(log(n))) in total.

Longer version: if you've been doing the Codility Lessons, you'll remember from Lesson 8 (Prime and composite numbers) that the harmonic sum 1/1 + 1/2 + ... + 1/n is O(log(n)), so n/1 + n/2 + ... + n/n is O(n*log(n)). We have a reduced set, because only divisors appear as denominators. Lesson 9 (Sieve of Eratosthenes) shows that the sum of the reciprocals of the primes is O(log(log(n))) and notes that 'the proof is non-trivial'. For the divisor sum σ(n), Wikipedia gives the upper bound we need: by Gronwall's theorem σ(n) = O(n*log(log(n))) (see also Robin's inequality, about halfway down the page).
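To get a feel for this bound, the sketch below computes σ(n) for every n up to a limit with a divisor-sum sieve. It then reports the largest ratio σ(n) / (n·ln ln n) for n > 5040. Robin's inequality, σ(n) < e^γ·n·ln ln n for every n > 5040, is equivalent to the Riemann hypothesis, so the sketch only checks it numerically in a small range. The helper names are hypothetical and the code is not from the answer.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def divisor_sums(limit):
    """sigma[n] = sum of divisors of n for 0 <= n <= limit (sieve over multiples)."""
    sigma = [0] * (limit + 1)
    for k in range(1, limit + 1):
        for multiple in range(k, limit + 1, k):
            sigma[multiple] += k
    return sigma

def max_robin_ratio(limit, start=5041):
    """Largest sigma(n) / (n * ln ln n) over start <= n <= limit."""
    sigma = divisor_sums(limit)
    return max(sigma[n] / (n * math.log(math.log(n)))
               for n in range(start, limit + 1))

print(max_robin_ratio(100_000), math.exp(EULER_GAMMA))
```

The ratio stays bounded (just under e^γ ≈ 1.781 in this range), which is what makes the total work σ(n) grow like n·log(log(n)).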

Does that completely answer your question? Suggestions on how to improve my Python code are also very welcome!

def solution(data):
    length = len(data)
    # array ends can't be peaks, so length < 3 must return 0
    if length < 3:
        return 0
    peaks = [0] * length
    # compute a list of 'peaks to the left' in O(n) time:
    # peaks[i] counts the peaks at indices < i
    for index in range(2, length):
        peaks[index] = peaks[index - 1]
        # check if there was a peak to the left, add it to the count
        if data[index - 1] > data[index - 2] and data[index - 1] > data[index]:
            peaks[index] += 1
    # candidate is the block size we're going to test
    # (blocks of size 1 or 2 can never all contain a peak)
    for candidate in range(3, length + 1):
        # skip if not a factor
        if length % candidate != 0:
            continue
        # test at each point n / block
        valid = True
        index = candidate
        while index != length:
            # if no peak in this block, break
            if peaks[index] == peaks[index - candidate]:
                valid = False
                break
            index += candidate
        # one additional check since peaks[length] is outside of array
        if index == length and peaks[index - 1] == peaks[index - candidate]:
            valid = False
        if valid:
            return length // candidate
    return 0

Credits: Major kudos to @tmyklebu for his SO answer which helped me a lot.

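The gist of the problem: split the sequence into c equal slices so that each slice contains at least one peak, where a peak is an element larger than both of its neighbours (the first and last elements of the sequence cannot be peaks). Find the maximum c. (The Codility problem statement really is long-winded!)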

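  1. Use O(n) space to record sum[], the number of peaks seen from the start up to the current position, and D, the largest distance between two consecutive peaks.
  2. Finding the maximum number of slices c is the same as finding the minimum slice length k. No k ≤ D/2 can work, and any k ≥ D that divides n always works, so the candidates to test lie in (D/2, D]. For an equal split we first need n % k == 0; then the sequence sum[k - 1], sum[2k - 1], sum[3k - 1], ..., sum[n - 1] has n/k terms to check. So the outer loop over k runs once per divisor of n in (D/2, D] (fewer than D/2 times), and each feasibility check in the inner loop takes n/k operations (fewer than 2n/D). Step 2 is therefore also O(n).
  3. When coding, note that computing D must also include the distance from the start of the array to the first peak and from the last peak to the end. Also, don't confuse the meanings of K and c (single capital letters as variable names are error-prone).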
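Steps 1 and 2 can be sketched in Python as follows. This is an illustrative translation of the same approach, not the blog author's code; the name `solution` mirrors the task and everything else is hypothetical.

The search window rests on two facts:
- If k ≥ D, every block contains a peak. A peak-free block would have to sit strictly between two consecutive peaks (or the virtual end markers -1 and N) more than k apart, which is impossible.
- If k ≤ D/2, the D - 1 consecutive non-peaks inside the widest gap fully contain some block.

```python
def solution(a):
    """Max number of equal blocks with a peak in each, via the max-gap bound D."""
    n = len(a)
    prefix = [0] * (n + 1)  # prefix[i] = number of peaks at indices < i
    peaks = []
    for i in range(n):
        is_peak = 0 < i < n - 1 and a[i - 1] < a[i] > a[i + 1]
        prefix[i + 1] = prefix[i] + int(is_peak)
        if is_peak:
            peaks.append(i)
    if not peaks:
        return 0
    marks = [-1] + peaks + [n]  # virtual peaks at both ends (step 3)
    d = max(q - p for p, q in zip(marks, marks[1:]))
    # No k <= D/2 can work and every divisor k >= D does, so only k < D is tested.
    for k in range(d // 2 + 1, n + 1):
        if n % k:
            continue
        if k >= d or all(prefix[s + k] > prefix[s] for s in range(0, n, k)):
            return n // k
    return 1  # not reached: k = n divides n and n > D whenever a peak exists
```

Only divisors below D need the O(n/k) prefix check, which keeps the whole scan O(n).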

Code

#include <algorithm>
#include <vector>
using namespace std;

int solution(vector<int> &A) {
    int N = A.size();
    vector<int> npeaks(N + 1, 0);  // npeaks[i] = number of peaks before element i (exclusive)
    int maxD = 0;                  // largest distance D between consecutive peaks
    int last_peak = -1;            // handles the distance from the start to the first peak
    for (int i = 1; i < N - 1; i++) {
        if (A[i] > A[i + 1] && A[i] > A[i - 1]) {
            npeaks[i + 1] = npeaks[i] + 1;
            maxD = max(maxD, i - last_peak);
            last_peak = i;
        } else {
            npeaks[i + 1] = npeaks[i];
        }
    }
    maxD = max(maxD, N - last_peak);  // distance from the last peak to the end
    npeaks[N] = npeaks[N - 1];
    if (npeaks[N] < 1) return 0;
    for (int K = maxD / 2; K <= maxD; K++) {  // K = slice length
        if (N % K == 0) {
            bool isvalid = true;
            int c = N / K;  // c = number of slices
            for (int i = 1; i <= c; i++) {
                if (npeaks[i * K] - npeaks[(i - 1) * K] < 1) {
                    isvalid = false;
                    break;
                }
            }
            if (isvalid) return c;
        }
    }
    // Every slice length K > maxD is valid, so take the smallest divisor of N above maxD.
    for (int K = maxD + 1;; K++) {
        if (N % K == 0) return N / K;  // must return N/K, not K
    }
}
