Week1 Assignment - Number Inversion - Stanford - Algorithm: Design and Analysis


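Preface

Friends keep nagging me to update this tech blog, and the pressure is real... Some want me to continue with the Programming Assignment solutions for the later weeks of Princeton's open course "Algorithms - Part II"; some want solutions to the Programming Assignments in Stanford's open course "Algorithm: Design and Analysis"; some want me to show implementations of recently proposed cryptographic algorithms; and friends around me who heard that I built an Extended 2048 game want me to share how I modified it.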

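Once there is too much to write, I sometimes lose the urge to write at all... Focusing on one thing is fine, but planning several blog posts at once is hard. Sigh. Still, tonight I am neither sleepy nor too tired, and I do want to clear some of the backlog, so I might as well finish a few of the unfinished posts and share them.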

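This batch of updates will not cover the Programming Assignments of Princeton's "Algorithms - Part II". The main reason is that those assignments take a great deal of time: they test not only whether the algorithm is correct but also many implementation details, to make sure it runs robustly. I spent roughly an afternoon on Week 1 - WordNet of Part II. Week 2 is an image-processing problem, a field I know even less about... so partway through the coding I got annoyed, slammed the desk, and decided to leave it for later!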

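This batch mainly covers the Programming Assignments of Stanford's open course "Algorithm: Design and Analysis", plus a full analysis and extended development of the recently very popular game 2048. As the title says, this post is the solution to the Week 1 Assignment of "Algorithm: Design and Analysis". The course does award an Accomplishment, and at first I did not want to publish answers. But when I was learning algorithms myself, Chinese forums were full of exercise answers, yet very few posts dissected the implementation problems step by step and solved them one at a time. Posts like that are exactly what people who truly want to learn computer science, rather than just earn a credit, hope to read. So I decided to publish the answer after all, hoping that readers will work through the whole implementation process instead of pasting my source code, getting a result, and collecting an Accomplishment With Distinction. Honestly, open courses are about learning; please do not copy answers just to get the Accomplishment...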

Problem

Download the text file here. (Right click and save link as)

This file contains all of the 100,000 integers between 1 and 100,000 (inclusive) in some order, with no integer repeated.

Your task is to compute the number of inversions in the file given, where the row of the file indicates the entry of an array.
Because of the large size of this array, you should implement the fast divide-and-conquer algorithm covered in the video lectures. The numeric answer for the given input file should be typed in the space below.
So if your answer is 1198233847, then just type 1198233847 in the space provided without any space / commas / any other punctuation marks. You can make up to 5 attempts, and we'll use the best one for grading.
(We do not require you to submit your code, so feel free to use any programming language you want --- just type the final numeric answer in the following space.)

[TIP: before submitting, first test the correctness of your program on some small test files of your own devising. Then post your best test cases to the discussion forums to help your fellow students!]

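Analysis

First of all, downloading the text file took me a very long time. It simply would not download. At first I thought the site was blocked in China, and I ended up going through a proxy to get IntegerArray.txt... Only afterwards did I discover that a small change to the link is enough to download it...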

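The official download address for IntegerArray.txt is http://spark-public.s3.amazonaws.com/algo1/programming_prob/IntegerArray.txt. That address did not work for me, but switching to the https protocol did. The modified address is https://spark-public.s3.amazonaws.com/algo1/programming_prob/IntegerArray.txt, and with it the file can be read without trouble.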

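To keep this post self-contained, I will briefly describe the Number Inversion algorithm, quoting the lecturer's notes in the original English. After describing the problem and the algorithm, I will discuss the problems I ran into during implementation and how I solved them, and finally attach the source code.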

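A short aside: I looked up the lecturer and found that he is a big name in Computer Theory... Why look him up? It is an after-effect of Stanford's open course "Cryptography I". When I took "Cryptography I", the term MOOC did not really exist yet; Coursera had just appeared, and Stanford itself contributed 4 courses, one of which was Dan Boneh's cryptography course. At first I thought the lecturer was pretty good, but I had no idea how good. Then I started my PhD on Pairing-Based Cryptography and found that the papers I read were all written by Dan Boneh or his students... The foundational paper of Pairing-Based Cryptography, "Identity-Based Encryption from the Weil Pairing", was written by Boneh and Franklin... As for the lecturer of this algorithms course, if you are interested, look up his DBLP page; you will find that he is also a leading figure in the field, with ACM Transactions papers, top conferences and so on. All one can do is admire people like that!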

Number Inversion Problem Statement

Back to the topic. The Number Inversion problem is stated as follows:

Input: Array A containing the numbers 1, 2, 3, ..., n in some arbitrary order;

Output: Number of inversions = number of pairs (i, j) of array indices with i < j and A[i] > A[j]

Number Inversion Example

Input: {1, 3, 5, 2, 4, 6}

Inversions: (3, 2), (5, 2), (5, 4)

Output: 3
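Before moving to the fast algorithm, the definition can be checked directly with a quadratic-time counter: a pair of positions i < j is an inversion when the earlier element is the larger one. The class and method names below are hypothetical and not part of the course material; this is only a minimal sketch, useful for validating a faster implementation on small test files.

```java
// Hypothetical helper, not from the lecture notes: checks every pair, so O(n^2).
class BruteForceInversions {

    // Counts pairs (i, j) with i < j and a[i] > a[j].
    static long count(int[] a) {
        long inversions = 0;
        for (int i = 0; i < a.length; i++) {
            for (int j = i + 1; j < a.length; j++) {
                if (a[i] > a[j]) {
                    inversions++;
                }
            }
        }
        return inversions;
    }

    public static void main(String[] args) {
        int[] example = {1, 3, 5, 2, 4, 6};
        System.out.println(count(example)); // prints 3
    }
}
```

On 100,000 elements this performs about 5 billion comparisons, which is why the assignment asks for the divide-and-conquer approach instead.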

Divide and Conquer Solution

High-Level Algorithm

Count(array A, length n)
    if n = 1
        return 0
    else
        x = Count(1st half of A, n/2)
        y = Count(2nd half of A, n/2)
        z = countSplitInv(A, n)
        return x + y + z

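The remaining problem is to implement countSplitInv in O(n) time. If that can be done, the recursion has depth log n and does O(n) work on each level, so the total running time is O(n log n). The method given in the lecture is based on the Merge step of Merge Sort.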
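The running-time claim can be written as a recurrence. The constants c (cost per element of the linear-time split count) and d (cost of the base case) are symbols assumed here for illustration, and n is taken to be a power of 2 for simplicity:

```latex
T(n) = 2\,T\!\left(\tfrac{n}{2}\right) + c\,n, \qquad T(1) = d
\quad\Longrightarrow\quad
T(n) = c\,n\log_2 n + d\,n = O(n \log n)
```

Each of the log_2 n levels of the recursion contributes c n in total, and the n base cases contribute d n.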

Example

Consider merging 1,3,5 and 2,4,6:

When 1 gets copied to output, no inversion;

When 2 gets copied to output, discover the split inversions (3, 2), (5, 2);

When 3 gets copied to output, no inversion;

When 4 gets copied to output, discover the split inversion (5, 4);

When 5, 6 get copied to output, no inversion.

General Claim

The split inversions involving an element y of the 2nd array C are precisely the numbers left in the 1st array B when y is copied to the output D.

Merge and CountSplitInv

1. While merging the two sorted subarrays, keep running total of number of split inversions.

2. When element of 2nd array C gets copied to output D, increment total by number of elements remaining in 1st array B.
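The two steps above can be sketched on a plain int[] without any external library. This is an illustrative version with hypothetical names (InversionCounter, mergeAndCount), not the lecturer's code; it assumes inclusive bounds lo..mid and mid+1..hi and works on a copy so the caller's array is not reordered.

```java
import java.util.Arrays;

// Minimal sketch of merge-and-count on primitives; names are illustrative.
class InversionCounter {

    // Returns the number of inversions in a; the input array is left untouched.
    static long count(int[] a) {
        int[] work = Arrays.copyOf(a, a.length);
        int[] aux = new int[a.length];
        return sort(work, aux, 0, a.length - 1);
    }

    // Sorts a[lo..hi] and returns the number of inversions inside that range.
    private static long sort(int[] a, int[] aux, int lo, int hi) {
        if (hi <= lo) return 0;
        int mid = lo + (hi - lo) / 2;
        long left = sort(a, aux, lo, mid);
        long right = sort(a, aux, mid + 1, hi);
        return left + right + mergeAndCount(a, aux, lo, mid, hi);
    }

    // Merges sorted a[lo..mid] and a[mid+1..hi], counting split inversions.
    private static long mergeAndCount(int[] a, int[] aux, int lo, int mid, int hi) {
        System.arraycopy(a, lo, aux, lo, hi - lo + 1);
        int i = lo, j = mid + 1;
        long split = 0;
        for (int k = lo; k <= hi; k++) {
            if (i > mid) {
                a[k] = aux[j++];           // left half exhausted
            } else if (j > hi) {
                a[k] = aux[i++];           // right half exhausted
            } else if (aux[j] < aux[i]) {
                a[k] = aux[j++];           // right element copied:
                split += mid - i + 1;      // everything left in the 1st half inverts with it
            } else {
                a[k] = aux[i++];           // left element copied: no new inversion
            }
        }
        return split;
    }

    public static void main(String[] args) {
        System.out.println(count(new int[]{1, 3, 5, 2, 4, 6})); // prints 3
    }
}
```

The strict comparison aux[j] < aux[i] means equal elements are never counted as inversions, which does not matter for the assignment input (no repeats) but keeps the sketch correct in general.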

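Implementation

Once the idea is clear, what remains is the implementation. In fact, we only need to take an existing Merge Sort implementation and add the ability to compute and return the number of inversions. Where do we find an existing Merge Sort? Thanks go to Princeton's "Algorithms" course, which provides Java implementations of almost all the basic algorithms, Merge Sort included. The download address is http://algs4.cs.princeton.edu/22mergesort/Merge.java.html.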

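That course also provides a very handy utility library, stdlib.jar, whose wrapped file-reading functions are much more convenient than implementing file reading yourself in plain Java. Many of my functions therefore call stdlib.jar directly, so please add this jar to your project when running my code. It can be downloaded from http://algs4.cs.princeton.edu/code/stdlib.jar.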
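For readers who prefer not to add the jar, the input can also be read with the Java standard library alone. The following is only a sketch under the assumption that the file holds one integer per line; the class name IntegerFileReader is hypothetical and not part of stdlib.jar.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Standard-library alternative to stdlib.jar's In class; illustrative only.
class IntegerFileReader {

    // Reads one integer per line, ignoring surrounding whitespace and blank lines.
    static int[] readInts(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file);
        return lines.stream()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    public static void main(String[] args) throws IOException {
        // Assumes IntegerArray.txt sits in the working directory.
        int[] a = readInts(Paths.get("IntegerArray.txt"));
        System.out.println(a.length);
    }
}
```

Unlike the fixed-size loop in the stdlib.jar version, this reads however many lines the file contains, so it also works on small hand-made test files.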

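With Merge Sort in hand, the rest is simple: following the algorithm description, add the ability to store and return the number of inversions. Two points need attention: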

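1. In this Merge Sort, the two ranges being merged are lo to mid and mid + 1 to hi, both inclusive. So when the element at position j, taken from the range mid + 1 to hi, is merged, how much should be added to count? At first I wrote mid - i, which is wrong. There are mid - i + 1 elements from i to mid inclusive, so count should increase by mid - i + 1. Otherwise the result is wrong.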

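2. For the given input, the number of inversions exceeds the largest value an int can hold (2^31 - 1), so if you store count in an int, the returned result is a negative number... which is obviously an overflow. Use long for count instead. For generality you could even store it in a BigInteger, which returns the correct inversion count for input of any length, although for a Java array, whose length cannot exceed 2^31 - 1, a long is already large enough. Let me add that Stanford was kind here: if the int had overflowed but the computed result had still been positive, who knows how many people would have gone through their code again and again looking for the mistake... Wrong results caused by overflow are very hard bugs to track down, so thank you, Stanford!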
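Both points can be checked with a small sketch. The names here (PitfallChecks, countSplit, maxInversions, and the correction parameter) are hypothetical and exist only for this illustration: countSplit applies the increment mid - i + correction so the two candidate formulas can be compared on the lecture example, and maxInversions computes the worst-case count n(n - 1)/2 of a fully reversed array.

```java
// Illustrative checks for the two pitfalls; not part of the original solution.
class PitfallChecks {

    // Counts split inversions between sorted a[lo..mid] and a[mid+1..hi] (inclusive bounds),
    // adding (mid - i + correction) whenever a right-half element would be copied.
    static long countSplit(int[] a, int lo, int mid, int hi, int correction) {
        int i = lo, j = mid + 1;
        long count = 0;
        while (i <= mid && j <= hi) {
            if (a[j] < a[i]) {
                count += mid - i + correction;
                j++;
            } else {
                i++;
            }
        }
        return count;
    }

    // Largest possible inversion count for n distinct elements, computed in long arithmetic.
    static long maxInversions(int n) {
        return (long) n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        int[] halves = {1, 3, 5, 2, 4, 6};                       // sorted halves of the lecture example
        System.out.println(countSplit(halves, 0, 2, 5, 1));      // prints 3 (mid - i + 1, correct)
        System.out.println(countSplit(halves, 0, 2, 5, 0));      // prints 1 (mid - i, undercounts)

        long worstCase = maxInversions(100_000);
        System.out.println(worstCase);                           // prints 4999950000
        System.out.println(worstCase > Integer.MAX_VALUE);       // prints true

        // Math.addExact throws instead of silently wrapping around on int overflow.
        try {
            Math.addExact(Integer.MAX_VALUE, 1);
        } catch (ArithmeticException e) {
            System.out.println("int overflow detected");
        }
    }
}
```

The worst case for 100,000 elements is more than twice Integer.MAX_VALUE, so an int counter cannot be trusted for an input of this size even when the result happens to come out positive.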

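Source Code

I modified the Merge Sort source directly, and since the Stanford course only asks for the answer, the code looks rather messy and is not as rigorous as the code in Princeton's "Algorithms" course. The main parts do have comments, though. Please forgive this rookie... and make do with it~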

// In and StdOut are provided by Princeton's stdlib.jar; add the jar to the classpath.
public class MergeCountSplitInv {

    // stably merge a[lo .. mid] with a[mid+1 .. hi] using aux[lo .. hi],
    // returning the number of split inversions between the two halves
    public static long merge(Comparable[] a, Comparable[] aux, int lo, int mid, int hi) {

        // precondition: a[lo .. mid] and a[mid+1 .. hi] are sorted subarrays
        assert isSorted(a, lo, mid);
        assert isSorted(a, mid+1, hi);

        // copy to aux[]
        for (int k = lo; k <= hi; k++) {
            aux[k] = a[k];
        }

        // merge back to a[]
        int i = lo, j = mid+1;
        long count = 0;
        for (int k = lo; k <= hi; k++) {
            // left half exhausted: add remaining elements from right to a[]
            if (i > mid) {
                a[k] = aux[j++];
            }
            // right half exhausted: add remaining elements from left to a[]
            else if (j > hi) {
                a[k] = aux[i++];
            }
            // merge a right one into a[], counting the mid - i + 1 elements left in the left half
            else if (less(aux[j], aux[i])) {
                a[k] = aux[j++];
                count += mid - i + 1;
            }
            // merge a left one into a[], not counting
            else {
                a[k] = aux[i++];
            }
        }

        // postcondition: a[lo .. hi] is sorted
        assert isSorted(a, lo, hi);
        return count;
    }

    // mergesort a[lo..hi] using auxiliary array aux[lo..hi], returning its inversion count
    private static long sort(Comparable[] a, Comparable[] aux, int lo, int hi) {
        if (hi <= lo) return 0;
        int mid = lo + (hi - lo) / 2;
        long left = sort(a, aux, lo, mid);
        long right = sort(a, aux, mid + 1, hi);
        long split = merge(a, aux, lo, mid, hi);
        return left + right + split;
    }

    // sort a[] and return the number of inversions it contained
    public static long count(Comparable[] a) {
        Comparable[] aux = new Comparable[a.length];
        long result = sort(a, aux, 0, a.length-1);
        assert isSorted(a);
        return result;
    }

   /***********************************************************************
    *  Helper sorting functions
    ***********************************************************************/

    // is v < w ?
    private static boolean less(Comparable v, Comparable w) {
        return (v.compareTo(w) < 0);
    }

    // exchange a[i] and a[j]
    private static void exch(Object[] a, int i, int j) {
        Object swap = a[i];
        a[i] = a[j];
        a[j] = swap;
    }

   /***********************************************************************
    *  Check if array is sorted - useful for debugging
    ***********************************************************************/
    private static boolean isSorted(Comparable[] a) {
        return isSorted(a, 0, a.length - 1);
    }

    private static boolean isSorted(Comparable[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++)
            if (less(a[i], a[i-1])) return false;
        return true;
    }

    // print array to standard output
    private static void show(Comparable[] a) {
        for (int i = 0; i < a.length; i++) {
            StdOut.println(a[i]);
        }
    }

    // Read the 100,000 integers from IntegerArray.txt and print the inversion count.
    public static void main(String[] args) {
        Integer[] a = new Integer[100000];
        In in = new In("IntegerArray.txt");
        for (int i = 0; i < a.length; i++) {
            a[i] = in.readInt();
        }
        // sanity check: first and last values read
        StdOut.println(a[0] + " " + a[99999]);
        StdOut.println(MergeCountSplitInv.count(a));
    }
}

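The program can be run directly and gives the correct result. When running it, pay attention to where IntegerArray.txt is stored; this depends on your setup. I mention it only to help you avoid "file not found" errors.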

I will not post the final correct answer here. Run the program yourself and get a feel for how fast the algorithm is~

Score Screenshot

1 0