Determining Big O Notation


TheCodeCube - IT Community




Title: Featured: Determining Big O Notation
Post by: KYA on September 12, 2009, 04:31:40 PM
This is a subject many people are afraid of, or simply don't get. At first it seems like mystical hocus-pocus, but today I'll show you a simple way to quickly estimate the Big O of an algorithm or function. This is not the only way to determine Big O, nor do I claim it is 100 percent accurate or effective. The idea here is to be able to look at loops, functions, and code in general in a different light.

First, a definition:

Quote
    In mathematics, computer science, and related fields, big O notation describes the limiting behavior of a function when the argument tends towards a particular value or infinity, usually in terms of simpler functions. Big O notation allows its users to simplify functions in order to concentrate on their growth rates: different functions with the same growth rate may be represented using the same O notation.


The wiki article (http://en.wikipedia.org/wiki/Big_O_notation) from which this is taken is an excellent reference if you need to quickly know the Big O of common/popular algorithms. What it absolutely fails to do is explain how one determines the actual Big O [I will use this term interchangeably with upper bound or limit]. If you're not mathematically inclined, your eyes probably glaze over at the symbol-infested portions. At the risk of repeating myself, this article is simply here to show another way, one I think is easier (i.e. you could have a solid grasp of the concept without ever taking Calculus).

Some Basic Rules:


   1. Nested loops are multiplied together.
   2. Sequential loops are added.
   3. Only the largest term is kept, all others are dropped.
   4. Constants are dropped.
   5. Conditional checks count as constant time (i.e. O(1)).

That's it, really. I used the word loop, but the concept applies to conditional checks, full algorithms, etc., since a whole is the sum of its parts. I can see the worried look on your face; this would all be frivolous without some examples [see code comments]:

Code: (cpp) [Select]
//linear
for(int i = 0; i < n; i++) {
        cout << i << endl;
}
 

Here we iterate 'n' times. Since nothing else is going on inside the loop (other than constant-time printing), this algorithm is said to be O(n). Next, the nested loops of the common bubble sort:

Code: (cpp) [Select]
//quadratic
for(int i = 0; i < n; i++) {
        for(int j = 0; j < n; j++){
                //do swap stuff, constant time
        }
}
 

Each loop runs 'n' times. Since the inner loop is nested, the two multiply: n*n, thus O(n^2). Hardly efficient. We can make it a bit better by doing the following:

Code: (cpp) [Select]
//quadratic
for(int i = 0; i < n; i++) {
        for(int j = 0; j < i; j++){
                //do swap stuff, constant time
        }
}

The outer loop is still 'n'. The inner loop now executes 'i' times, ending at (n-1). Summing 0 + 1 + ... + (n-1) gives n(n-1)/2 total iterations. After dropping the constant factor and the lower-order term, this is still bounded by O(n^2); we've roughly halved the work, but not changed the growth rate.
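
If you want to convince yourself of the n(n-1)/2 count, you can tally the iterations directly. Here's a small self-contained sketch of mine (not from the original post; the choice of n is arbitrary):

Code: (cpp) [Select]
#include <iostream>

int main() {
        int n = 10; //any sample size works
        long long count = 0;

        //same triangular loop as above, but counting iterations
        for(int i = 0; i < n; i++) {
                for(int j = 0; j < i; j++) {
                        count++; //stands in for the constant-time swap work
                }
        }

        //compare against the closed form n(n-1)/2
        std::cout << "iterations: " << count
                  << ", n(n-1)/2: " << n * (n - 1) / 2 << std::endl;
        return 0;
}

For n = 10 both numbers come out to 45, matching the formula.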

An example of constant dropping:
Code: (cpp) [Select]
//linear
for(int i = 0; i < 2*n; i++) {
        cout << i << endl;
}
 

At first you might say that the upper bound is O(2n); however, we drop constants, so it becomes O(n). Mathematically they are the same bound: iterating 2n times is just a constant factor (2) times n, and constant factors don't affect the growth rate.

An example of sequential loops:

Code: (cpp) [Select]
//linear
for(int i = 0; i < n; i++) {
        cout << i << endl;
}

//quadratic
for(int i = 0; i < n; i++) {
        for(int j = 0; j < i; j++){
                //do constant time stuff
        }
}
 

You wouldn't write this exact example in practice, but doing something similar is certainly in the realm of possibility. Here we add each loop's Big O, giving n + n^2. O(n^2 + n) is not an acceptable answer, since we keep only the largest term. The upper bound is O(n^2). Why? Because it has the largest growth rate (the upper bound or limit, for the Calculus-inclined).

Loops with a fixed, constant bound are common as well; an example:

Code: (cpp) [Select]
for(int i = 0; i < n; i++) {
        for(int j = 0; j < 2; j++){
                //do stuff
        }
}

The outer loop is 'n', the inner loop is 2, thus we have 2n; dropping the constant gives us O(n).
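
This is also a good spot to illustrate rule 5: a conditional check inside a loop is just more constant-time work per iteration. A quick sketch of mine (not one of the original examples):

Code: (cpp) [Select]
//still linear: the if check is O(1) per iteration
for(int i = 0; i < n; i++) {
        if(i % 2 == 0) {
                cout << i << endl; //constant time either way
        }
}

Whether the branch is taken or not, each iteration does a constant amount of work, so the whole loop remains O(n).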

In short, Big O is simply a way to measure the efficiency of an algorithm. The goal is constant or linear time, hence the various data structures and their implementations. Keep in mind that a "faster" structure or algorithm is not necessarily better. For example, see the classic hash table versus binary tree debate. While not 100% factual, it is often said that a hash table is O(1) and is therefore better than a tree. From a discussion on the subject in a recent class I took:

 
Quote
   Assuming that a hash-table is, in fact, O(1), that's not quite true. Being O(1) makes the hash-table superior to a tree for insertion and retrieval of objects. However, hash-tables have no sense of order based on value, so they fall short of trees for searching purposes (including things like "get maximum value").

    That said, hash-tables aren't purely O(1). Poor choices in hash algorithm or table size, and issues like primary clustering, can make operations on a hash-table run in worse-than-constant time in practice.

    The point is, saying "hash-tables are superior to trees" without some qualifications is ridiculous. But then, it doesn't take a genius to know that sweeping generalizations are often problematic.
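
To make that trade-off concrete, here is a short sketch of mine using the standard C++ containers (these weren't part of the original discussion): std::unordered_map is a hash table with average O(1) lookup, while std::map is a balanced tree with O(log n) lookup but sorted keys.

Code: (cpp) [Select]
#include <iostream>
#include <map>
#include <unordered_map>

int main() {
        std::unordered_map<int, int> hashed = {{3, 30}, {1, 10}, {2, 20}};
        std::map<int, int> tree = {{3, 30}, {1, 10}, {2, 20}};

        //average O(1) lookup, but no ordering guarantees
        std::cout << hashed.at(2) << std::endl;

        //O(log n) lookup, but the keys stay sorted,
        //so "get maximum value" is trivial:
        std::cout << tree.rbegin()->first << std::endl; //prints 3
        return 0;
}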


The above is a good thing to keep in mind when dealing with theoretical computer science concepts. Hopefully you found this both interesting and helpful. Happy coding!


Title: Re: Determining Big O Notation
Post by: KYA on September 13, 2009, 12:00:01 AM
I noticed I didn't provide a log(n) or n log(n) example; those are arguably the toughest ones.

A quick heuristic for spotting a log(n) loop is to look at how the counter grows in relation to the total number of elements.

Example:


Code: (cpp) [Select]
for(int i = 1; i < n; i *= 2) { //start at 1: 0 *= 2 would never grow
        cout << i << endl;
}
 

Instead of incrementing by one, 'i' doubles on each pass, so the body executes only about log2(n) times before 'i' reaches n. (Note that the counter must start at 1; doubling 0 would loop forever.) Thus the loop is O(log n).
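
The classic place this pattern shows up is binary search: each comparison halves the remaining range, just as the loop above doubles 'i'. A quick sketch of mine (the function name and array-based interface are my own choices, not from the original post):

Code: (cpp) [Select]
//O(log n): the search space halves on every iteration
int binarySearch(const int sorted[], int n, int target) {
        int lo = 0, hi = n - 1;
        while(lo <= hi) {
                int mid = lo + (hi - lo) / 2; //avoids overflow vs (lo + hi) / 2
                if(sorted[mid] == target) return mid;
                if(sorted[mid] < target) lo = mid + 1;
                else hi = mid - 1;
        }
        return -1; //not found
}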

An example of nested loops:

Code: (cpp) [Select]
for(int i = 0; i < n; i++) { //linear
        for(int j = 1; j < n; j *= 2){ // log(n): j doubles; must start at 1, since 0 *= 2 stays 0
                //do constant time stuff
        }
}
 

This example is O(n log n). (Remember that nested loops multiply their Big O's.)
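
If you want to sanity-check the multiplication rule, count the iterations and compare against n*log2(n). A self-contained sketch of mine (the value of n is arbitrary; a power of two just makes the numbers line up exactly):

Code: (cpp) [Select]
#include <cmath>
#include <iostream>

int main() {
        int n = 1024;
        long long count = 0;

        for(int i = 0; i < n; i++) { //linear
                for(int j = 1; j < n; j *= 2) { //log(n): j doubles each pass
                        count++; //stands in for constant-time work
                }
        }

        //for n = 1024, log2(n) = 10, so we expect 1024 * 10 = 10240
        std::cout << "iterations: " << count
                  << ", n*log2(n): " << n * std::log2(n) << std::endl;
        return 0;
}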