Programming Pearls--Column 1

Source: Internet  Editor: 程序博客网  Time: 2024/06/05 06:46

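Problem:

Analyzing the problem from the computer's point of view:

Input: a file containing at most n = 10,000,000 positive integers; every integer is less than n, and no integer appears more than once.

Output: the list of integers sorted in ascending order.

Constraints: at most 1 MB of memory; disk space can be treated as unlimited; run time must stay in the range of seconds, not minutes.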


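Solution:

The solution is simple: represent the integers with a bitmap (bit vector). If the number i appears in the file, set the i-th bit of the bitmap to 1. Once the whole file has been read, the set bits identify exactly the numbers that occurred.

A key constraint makes this work: no number appears more than once, so a single bit per number is enough.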


Pseudocode:

/* phase 1: initialize set to empty */
for i = [0, n)
    bit[i] = 0

/* phase 2: insert present elements into the set */
for each i in the input file
    bit[i] = 1

/* phase 3: write sorted output */
for i = [0, n)
    if bit[i] == 1
        write i on the output file


Exercises

1. How would you write the code if memory were not a constraint?

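#include <iostream>
#include <set>

int main (int argc, char *argv[]) {
    // std::set keeps its elements unique and sorted.
    std::set<int> integerSet;
    int i;
    std::set<int>::iterator iter;

    // Read integers from standard input until EOF.
    while (std::cin >> i) {
        integerSet.insert(i);
    }

    // Iterating over the set visits the elements in ascending order.
    for (iter = integerSet.begin(); iter != integerSet.end(); ++iter) {
        std::cout << *iter << " ";
    }
    std::cout << std::endl;
    return 0;
}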

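Why choose set rather than list? That depends on which structure fits this problem better, that is, which one costs less, and that in turn comes down to how list and set are implemented.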

Set

Sets are a kind of associative container that stores unique elements, and in which the elements themselves are the keys.

Associative containers are containers especially designed to be efficient accessing its elements by their key (unlike sequence containers, which are more efficient accessing elements by their relative or absolute position).

Internally, the elements in a set are always sorted from lower to higher following a specific strict weak ordering criterion set on container construction.

Sets are typically implemented as binary search trees.

Therefore, the main characteristics of set as an associative container are:

  • Unique element values: no two elements in the set can compare equal to each other. For a similar associative container allowing for multiple equivalent elements, see multiset.
  • The element value is the key itself. For a similar associative container where elements are accessed using a key, but map to a value different than this key, see map.
  • Elements follow a strict weak ordering at all times. Unordered associative arrays, like unordered_set, are available in implementations following TR1.

This container class supports bidirectional iterators.

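To save time I copied this straight from the C++ Library Reference. As it shows, a set does not allow duplicate keys and is a sorted collection, searched by means of a binary search tree.

Now let's look at list: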

List

Lists are a kind of sequence containers. As such, their elements are ordered following a linear sequence.

List containers are implemented as doubly-linked lists; Doubly linked lists can store each of the elements they contain in different and unrelated storage locations. The ordering is kept by the association to each element of a link to the element preceding it and a link to the element following it.

This provides the following advantages to list containers:

  • Efficient insertion and removal of elements anywhere in the container (constant time).
  • Efficient moving elements and block of elements within the container or even between different containers (constant time).
  • Iterating over the elements in forward or reverse order (linear time).

Compared to other base standard sequence containers (vectors and deques), lists perform generally better in inserting, extracting and moving elements in any position within the container, and therefore also in algorithms that make intensive use of these, like sorting algorithms.

The main drawback of lists compared to these other sequence containers is that they lack direct access to the elements by their position; for example, to access the sixth element in a list one has to iterate from a known position (like the beginning or the end) to that position, which takes linear time in the distance between these. They also consume some extra memory to keep the linking information associated to each element (which may be an important factor for large lists of small-sized elements).

Storage is handled automatically by the class, allowing lists to be expanded and contracted as needed.

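A list is implemented as a doubly linked list, which helps when insertions and deletions are frequent, but that strength is beside the point for this problem. Note also that a list is "ordered" only in the sense that its elements keep their sequence positions; it is not kept sorted, so it would still need an explicit sort.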

Vector

Vectors are a kind of sequence containers. As such, their elements are ordered following a strict linear sequence.

Vector containers are implemented as dynamic arrays; Just as regular arrays, vector containers have their elements stored in contiguous storage locations, which means that their elements can be accessed not only using iterators but also using offsets on regular pointers to elements.

But unlike regular arrays, storage in vectors is handled automatically, allowing it to be expanded and contracted as needed.

Vectors are good at:

  • Accessing individual elements by their position index (constant time).
  • Iterating over the elements in any order (linear time).
  • Add and remove elements from its end (constant amortized time).

Compared to arrays, they provide almost the same performance for these tasks, plus they have the ability to be easily resized. However, they usually consume more memory than arrays when their capacity is handled automatically (this is in order to accommodate extra storage space for future growth).

The main strength of vector is its dynamically managed storage.

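2. Implement the bitset operations using bitwise operators

First, the bitset is built on an array of ints. Because the width of int can differ between machines, a fixed-width 32-bit type such as int32_t is used for portability.

Second, create the array of 32-bit words. How large must it be? About n/32 words; since integer division rounds down, add 1 so that a final partial word is covered.

Third, the set operation. Locate the array index first: with indices starting at 0 it is simply i/32. Then set bit i by OR-ing the existing array element with a mask. Which mask? Not i%32 itself, but 1 shifted left by i%32, that is, 1 << (i % 32).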

Then write the bitset code and test it:

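#include <cstdint>
#include <iostream>

#define MAX_LENGTH 10000000
#define INT_LENGTH 32
#define SHIFT 5      // log2(INT_LENGTH): i >> SHIFT equals i / 32
#define MASK 0x1F    // i & MASK equals i % 32

// One bit per integer. The parentheses matter: + binds tighter than >>.
// Unsigned words avoid shifting a 1 into the sign bit.
uint32_t integerArray[1 + (MAX_LENGTH >> SHIFT)];

// Set bit i.
void set(int32_t i) {
    integerArray[i >> SHIFT] |= (1u << (i & MASK));
}

// Clear bit i.
void clear(int32_t i) {
    integerArray[i >> SHIFT] &= ~(1u << (i & MASK));
}

// Return true if bit i is set.
bool test(int32_t i) {
    return (integerArray[i >> SHIFT] & (1u << (i & MASK))) != 0;
}

int main (int argc, char *argv[]) {
    for (int32_t i = 0; i < MAX_LENGTH; i++) {
        clear(i);
    }
    int32_t i;
    while (std::cin >> i) {
        set(i);
        if (test(i)) {
            std::cout << i << " is set" << std::endl;
        }
    }
    return 0;
}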

That's all for today; I'll continue when I get the chance...

