Effective C#, Item 40: Match Your Collection to Your Needs

Source: Internet | Published: java ui | Editor: 程序博客网 | Date: 2024/05/04 00:03
 

Item 40: Match Your Collection to Your Needs

To the question "Which collection is best?" the answer is "It depends." Different collections have different performance characteristics and are optimized for different actions. The .NET Framework supports many of the familiar collections: lists, arrays, queues, stacks, and others. C# supports multidimensional arrays, which have performance characteristics that differ from those of either single-dimensional arrays or jagged arrays. The framework also includes many specialized collections; look through those before you build your own. You can find all the collections quickly because all collections implement the ICollection interface. The documentation for ICollection lists all classes that implement that interface. Those 20-odd classes are the collections at your disposal.

To pick the right collection for your proposed use, you need to consider the actions you'll most often perform on that collection. To produce a resilient program, you'll rely on the interfaces that are implemented by the collection classes so that you can substitute a different collection when you find that your assumptions about the usage of the collection were incorrect (see Item 19).
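As a minimal sketch of what relying on the interfaces can look like, the method below accepts any IList, so the caller is free to pass an array today and an ArrayList tomorrow. The names InterfaceDemo and CountNonNull are made up for this illustration.

```csharp
using System;
using System.Collections;

public static class InterfaceDemo
{
    // Depends only on IList, so any list-like collection can be passed in.
    public static int CountNonNull( IList items )
    {
        int count = 0;
        foreach ( object o in items )
            if ( o != null )
                count++;
        return count;
    }

    public static void Main( )
    {
        // Arrays implement IList, and so does ArrayList.
        IList fixedSize = new string[] { "a", null, "c" };
        IList growable = new ArrayList( );
        growable.Add( "x" );
        growable.Add( null );

        Console.WriteLine( CountNonNull( fixedSize ) );  // 2
        Console.WriteLine( CountNonNull( growable ) );   // 1
    }
}
```

Because CountNonNull never names a concrete collection type, swapping the storage later does not touch it.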

The .NET Framework has three different kinds of collections: arrays, arraylike collections, and hash-based containers. Arrays are the simplest and generally the fastest, so let's start there. This is the collection type you'll use most often.

Your first choice should often be the System.Array class or, more correctly, a type-specific array class. The first and most significant reason for choosing the array class is that arrays are type-safe. All other collection classes store System.Object references, until C# 2.0 introduces generics (see Item 49). When you declare any array, the compiler creates a specific System.Array derivative for the type you specify. For example, this declaration creates an array of integers:

    private int[] _numbers = new int[ 100 ];

The array stores integers, not System.Object. That's significant because you avoid the boxing and unboxing penalty when you add, access, or remove value types from the array (see Item 17). That initialization creates a single-dimensional array with 100 integers stored in it. All the memory occupied by the array has a 0 bit pattern stored in it. Arrays of value types are all 0s, and arrays of reference types are all null. Each item in the array can be accessed by its index:

    int j = _numbers[ 50 ];
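To make the default values and the boxing cost concrete, here is a small sketch. ArrayDefaultsDemo and RoundTripThroughArrayList are names invented for this example; the point is only that the typed array needs no cast, while the object-based ArrayList does.

```csharp
using System;
using System.Collections;

public static class ArrayDefaultsDemo
{
    // Stores an int in an ArrayList (boxing it) and reads it back (unboxing it).
    public static int RoundTripThroughArrayList( int value )
    {
        ArrayList list = new ArrayList( );
        list.Add( value );          // the int is boxed into an object
        return (int) list[ 0 ];     // a cast is required to unbox it
    }

    public static void Main( )
    {
        int[] numbers = new int[ 100 ];     // value type: every element is 0
        string[] names = new string[ 3 ];   // reference type: every element is null

        Console.WriteLine( numbers[ 50 ] );                    // 0
        Console.WriteLine( names[ 0 ] == null );               // True
        Console.WriteLine( RoundTripThroughArrayList( 42 ) );  // 42
    }
}
```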

In addition to the array access, you can iterate the array using foreach or an enumerator (see Item 11):

    foreach ( int i in _numbers )
      Console.WriteLine( i.ToString( ) );

    // or:
    IEnumerator it = _numbers.GetEnumerator( );
    while( it.MoveNext( ))
    {
      int i = (int) it.Current;
      Console.WriteLine( i.ToString( ) );
    }

If you are storing a single sequence of like objects, you should store them in an array. But often, your data structures are more complicated collections. It's tempting to quickly fall back on the C-style jagged array, an array that contains other arrays. Sometimes, this is exactly what you need. Each element in the outer collection is an array along the inner direction:

    public class MyClass
    {
      // Declare a jagged array:
      private int[][] _jagged;

      public MyClass()
      {
        // Create the outer array:
        _jagged = new int[5][];

        // Create each inner array:
        _jagged[0] = new int[5];
        _jagged[1] = new int[10];
        _jagged[2] = new int[12];
        _jagged[3] = new int[7];
        _jagged[4] = new int[23];
      }
    }

Each inner single-dimension array can have its own size, independent of the outer array and of the other inner arrays. Use jagged arrays when you need to create differently sized arrays of arrays. The drawback to jagged arrays is that a column-wise traversal is inefficient. Examining the third column in each row of a jagged array requires two lookups for each access. There is no relationship between the locations of the element at row 0, column 3 and the element at row 1, column 3. Only multidimensional arrays can perform column-wise traversals more efficiently. Old-time C and C++ programmers made their own two- (or more) dimensional arrays by mapping them onto a single-dimension array. For old-time C and C++ programmers, this notation is clear:

    double num = MyArray[ i * rowLength + j ];

The rest of the world would prefer this:

    double num = MyArray[ i, j ];
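The two notations above address the same logical cell. The sketch below fills a hand-mapped flat array and a true two-dimensional array with the same values and reads one cell from each. GridDemo and GetFlat are illustrative names, and the row-by-row layout is an assumption of this example.

```csharp
using System;

public static class GridDemo
{
    // Hand-rolled mapping of (row i, column j) onto a flat array laid out row by row.
    public static double GetFlat( double[] flat, int rowLength, int i, int j )
    {
        return flat[ i * rowLength + j ];
    }

    public static void Main( )
    {
        const int rows = 2, cols = 3;
        double[] flat = new double[ rows * cols ];
        double[ , ] grid = new double[ rows, cols ];

        for ( int i = 0; i < rows; i++ )
            for ( int j = 0; j < cols; j++ )
            {
                flat[ i * cols + j ] = i * 10 + j;
                grid[ i, j ] = i * 10 + j;
            }

        Console.WriteLine( GetFlat( flat, cols, 1, 2 ) );  // 12
        Console.WriteLine( grid[ 1, 2 ] );                 // 12
    }
}
```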

But C and C++ did not support multidimensional arrays. C# does. Use the multidimensional array syntax: It's clearer to both you and the compiler when you mean to create a true multidimensional structure. You create multidimensional arrays using an extension of the familiar single-dimension array notation:

    private int[ , ] _multi = new int[ 10, 10 ];

The previous declaration creates a two-dimensional array, a 10x10 array with 100 elements. The length of each dimension in a multidimensional array is always constant. The compiler utilizes this property to generate more efficient initialization code. Initializing the jagged array requires multiple initialization statements. In my simple example earlier, you need five statements. Larger arrays or arrays with more dimensions require more extensive initialization code. You must write code by hand. However, multidimensional arrays with more dimensions merely require more dimension specifiers in the initialization statement. Furthermore, the multidimensional array initializes all array elements more efficiently. Arrays of value types are initialized to contain a value at each valid index in the array. The contents of the value are all 0. Arrays of reference types contain null at each index in the array. Arrays of arrays contain null inner arrays.
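The difference in initialization effort can be sketched as follows. InitDemo and MakeJagged are hypothetical helpers written for this illustration; the row sizes are the ones used in the jagged example earlier.

```csharp
using System;

public static class InitDemo
{
    // Builds a jagged array in which row r has sizes[ r ] elements.
    public static int[][] MakeJagged( int[] sizes )
    {
        int[][] jagged = new int[ sizes.Length ][];   // inner arrays start out null
        for ( int r = 0; r < sizes.Length; r++ )
            jagged[ r ] = new int[ sizes[ r ] ];      // one allocation per row
        return jagged;
    }

    public static void Main( )
    {
        int[][] outerOnly = new int[ 5 ][];
        Console.WriteLine( outerOnly[ 0 ] == null );  // True

        int[][] jagged = MakeJagged( new int[] { 5, 10, 12, 7, 23 } );
        Console.WriteLine( jagged[ 4 ].Length );      // 23

        // A multidimensional array of any rank needs a single statement.
        int[ , , ] cube = new int[ 4, 5, 6 ];
        Console.WriteLine( cube.Length );             // 120
        Console.WriteLine( cube[ 3, 4, 5 ] );         // 0
    }
}
```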

Traversing multidimensional arrays is almost always faster than traversing jagged arrays, especially for by-column or diagonal traversals. The compiler can use pointer arithmetic on any dimension of the array. Jagged arrays require finding each correct value for each single-dimension array.

Multidimensional arrays can be used like any collection in many ways. Suppose you are building a game based on a checkerboard. You'd make a board of 64 Squares laid out in a grid:

    private Square[ , ] _theBoard = new Square[ 8, 8 ];

This initialization creates the array storage for the Squares. Assuming that Square is a reference type, the Squares themselves are not yet created, and each array element is null. To initialize the elements, you must look at each dimension in the array:

    for ( int i = 0; i < _theBoard.GetLength( 0 ); i++ )
      for ( int j = 0; j < _theBoard.GetLength( 1 ); j++ )
        _theBoard[ i, j ] = new Square( );

But you have more flexibility in traversing the elements in a multidimensional array. You can get an individual element using the array indexes:

    Square sq = _theBoard[ 4, 4 ];

If you need to iterate the entire collection, you can use an iterator:

    foreach( Square sq in _theBoard )
      sq.PaintSquare( );

Contrast that with what you would write for jagged arrays:

    // Here _theBoard is assumed to be declared as a jagged array: Square[][]
    foreach( Square[] row in _theBoard )
      foreach( Square sq in row )
        sq.PaintSquare( );

Every new dimension in a jagged array introduces another foreach statement. However, with a multidimensional array, a single foreach statement generates all the code necessary to check the bounds of each dimension and get each element of the array. The foreach statement generates specific code to iterate the array by using each array dimension. The foreach loop generates the same code as if you had written this:

    for ( int i = _theBoard.GetLowerBound( 0 ); i <= _theBoard.GetUpperBound( 0 ); i++ )
      for ( int j = _theBoard.GetLowerBound( 1 ); j <= _theBoard.GetUpperBound( 1 ); j++ )
        _theBoard[ i, j ].PaintSquare( );

This looks inefficient, considering all those calls to GetLowerBound and GetUpperBound inside the loop statement, but it's actually the most efficient construct. The JIT compiler knows enough about the array class to cache the boundaries and to recognize that internal bounds checking can be omitted (see Item 11).

Two major disadvantages to the array class will make you examine the other collection classes in the .NET Framework. The first affects resizing the arrays: Arrays cannot be dynamically resized. If you need to modify the size of any dimension of an array, you must create a new array and copy all the existing elements to it. Resizing takes time: A new array must be allocated, and all the elements must be copied from the existing array into the new array. Although this copying and moving is not as expensive on the managed heap as it was in the C and C++ days, it still costs time. More important, it can result in stale data being used. Consider this code fragment:

    private string[] _cities = new string[ 100 ];

    public void SetDataSources( )
    {
      myListBox.DataSource = _cities;
    }

    public void AddCity( string CityName )
    {
      string[] tmp = new string[ _cities.Length + 1 ];
      _cities.CopyTo( tmp, 0 );
      tmp[ _cities.Length ] = CityName;

      _cities = tmp; // swap the storage.
    }

Even after AddCity is called, the list box uses the old copy of the _cities array for its data source. Your new city never shows up in the list box.

The ArrayList class is a higher-level abstraction built on an array. The ArrayList collection mixes the semantics of a single-dimension array with the semantics of a linked list. You can perform inserts on an ArrayList, and you can resize an ArrayList. The ArrayList delegates almost all of its responsibilities to the contained array, which means that the ArrayList class has very similar performance characteristics to the Array class. The main advantage of ArrayList over Array is that ArrayList is easier to use when you don't know exactly how large your collection will be. ArrayList can grow and shrink over time. You still pay the performance penalty of moving and copying items, but the code for those algorithms has already been written and tested. Because the internal storage for the array is encapsulated in the ArrayList object, the problem of stale data does not exist: Clients point to the ArrayList object instead of the internal array. The ArrayList collection is the .NET Framework's version of the C++ Standard Library vector class.
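The stale-reference problem, and why an ArrayList avoids it, can be reproduced without any UI. In this sketch the Viewer class is a hypothetical stand-in for a data-bound control: it simply keeps whatever reference it was handed. It is not how the list box is actually implemented.

```csharp
using System;
using System.Collections;

// Hypothetical stand-in for a data-bound control: it keeps the reference it was given.
public class Viewer
{
    private readonly IList _source;
    public Viewer( IList source ) { _source = source; }
    public int VisibleCount { get { return _source.Count; } }
}

public static class StaleDataDemo
{
    public static void Main( )
    {
        // Array version: "growing" means allocating a replacement array.
        string[] cities = new string[] { "Ann Arbor" };
        Viewer arrayViewer = new Viewer( cities );
        string[] tmp = new string[ cities.Length + 1 ];
        cities.CopyTo( tmp, 0 );
        tmp[ cities.Length ] = "Chicago";
        cities = tmp;                                   // the viewer still holds the old array
        Console.WriteLine( arrayViewer.VisibleCount );  // 1

        // ArrayList version: the object the viewer points to never changes.
        ArrayList cityList = new ArrayList( );
        cityList.Add( "Ann Arbor" );
        Viewer listViewer = new Viewer( cityList );
        cityList.Add( "Chicago" );
        cityList.Insert( 0, "Boston" );                 // inserts work at any position
        Console.WriteLine( listViewer.VisibleCount );   // 3
    }
}
```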

The Queue and the Stack classes provide specialized interfaces on top of the System.Array class. The specific interfaces for those classes build custom interfaces for the first-in, first-out queue and the last-in, first-out stack. Always remember that these collections are built using a single-dimension array as their internal storage. The same performance penalty applies when you resize them.
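A brief sketch of the two orderings, using the nongeneric Queue and Stack from System.Collections; QueueStackDemo is just a name for this example.

```csharp
using System;
using System.Collections;

public static class QueueStackDemo
{
    public static void Main( )
    {
        Queue queue = new Queue( );
        queue.Enqueue( "first" );
        queue.Enqueue( "second" );
        Console.WriteLine( queue.Dequeue( ) );  // first  (first in, first out)

        Stack stack = new Stack( );
        stack.Push( "first" );
        stack.Push( "second" );
        Console.WriteLine( stack.Pop( ) );      // second (last in, first out)
    }
}
```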

The .NET collections don't contain a linked list structure. The efficiency of the garbage collector minimizes the times when a list structure is really the best choice. If you really need linked list behavior, you have two options. If you are using a list because you expect to add and remove items often, you can use the dictionary classes with null values. Simply store the keys. You can use the ListDictionary class, which implements a singly linked list of key/value pairs. Or, you can use the HybridDictionary class, which uses the ListDictionary for small collections and switches to a Hashtable for larger collections. These collections and a host of others are in the System.Collections.Specialized namespace. However, if you want to use a list structure because of a user-controllable order, you can use the ArrayList collection. The ArrayList can perform inserts at any location, even though it uses an array as its internal storage.
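Storing only keys in a dictionary might look like the following sketch. KeyOnlyListDemo and the sample names are invented for the illustration; the null values are placeholders, as the text suggests.

```csharp
using System;
using System.Collections;
using System.Collections.Specialized;

public static class KeyOnlyListDemo
{
    public static void Main( )
    {
        // Store only keys; the values are null placeholders.
        ListDictionary names = new ListDictionary( );
        names.Add( "Anders", null );
        names.Add( "Bill", null );
        names.Add( "Scott", null );
        names.Remove( "Bill" );

        foreach ( DictionaryEntry entry in names )
            Console.WriteLine( entry.Key );

        Console.WriteLine( names.Contains( "Bill" ) );    // False
        Console.WriteLine( names.Count );                 // 2

        // Same calls; the storage strategy changes as the collection grows.
        HybridDictionary hybrid = new HybridDictionary( );
        hybrid.Add( "Anders", null );
        Console.WriteLine( hybrid.Contains( "Anders" ) ); // True
    }
}
```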

Two other classes support dictionary-based collections: SortedList and Hashtable. Both contain key/value pairs. SortedList orders the keys, whereas Hashtable does not. Hashtable performs searches for a given key faster, but SortedList provides an ordered iteration of the elements by key. Hashtable finds keys using the hash value of the key object. It searches by a constant time operation, O(1), if the hash key is very efficient. The sorted list uses a binary search algorithm to find the keys. This is a logarithmic operation: O(log n).
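The ordering difference is easy to see in a short sketch. DictionaryDemo and KeysInOrder are names made up for this example, and the fruit keys are arbitrary sample data.

```csharp
using System;
using System.Collections;

public static class DictionaryDemo
{
    // Returns the keys of a SortedList joined in iteration order.
    public static string KeysInOrder( SortedList list )
    {
        string result = "";
        foreach ( DictionaryEntry entry in list )
            result += entry.Key + " ";
        return result.Trim( );
    }

    public static void Main( )
    {
        SortedList sorted = new SortedList( );
        sorted.Add( "pear", 3 );
        sorted.Add( "apple", 1 );
        sorted.Add( "mango", 2 );
        Console.WriteLine( KeysInOrder( sorted ) );  // apple mango pear

        Hashtable table = new Hashtable( );
        table.Add( "pear", 3 );
        table.Add( "apple", 1 );
        Console.WriteLine( table[ "apple" ] );       // 1
        // Enumerating a Hashtable makes no promise about key order.
    }
}
```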

Finally, there is the BitArray class. As its name suggests, this holds bit values. The storage for the BitArray is an array of ints. Each int stores 32 binary values. This makes the BitArray class compact, but it can also decrease performance. Each get or set operation in the BitArray performs bit manipulations on the int value that stores the sought value and 31 other bits. BitArray contains methods that apply Boolean operations to many values at once: OR, XOR, AND, and NOT. These methods take a BitArray as a parameter and can be used to quickly mask multiple bits in the BitArray. The BitArray is the optimized container for bit operations; use it when you are storing a collection of bitflags that are often manipulated using masks. Do not use it as a substitute for a general-purpose array of Boolean values.
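Masking with a BitArray could be sketched like this. BitArrayDemo and CountSet are hypothetical names; note that And, Or, and Not modify the instance they are called on.

```csharp
using System;
using System.Collections;

public static class BitArrayDemo
{
    // Counts how many bits are set to true.
    public static int CountSet( BitArray bits )
    {
        int count = 0;
        foreach ( bool b in bits )
            if ( b )
                count++;
        return count;
    }

    public static void Main( )
    {
        BitArray flags = new BitArray( 8 );      // all bits start false
        flags.Set( 1, true );
        flags.Set( 3, true );
        flags.Set( 5, true );

        BitArray mask = new BitArray( 8 );
        mask.Set( 3, true );
        mask.Set( 4, true );

        flags.And( mask );                       // only bit 3 survives
        Console.WriteLine( flags[ 3 ] );         // True
        Console.WriteLine( CountSet( flags ) );  // 1

        flags.Or( mask );                        // bits 3 and 4 are set
        flags.Not( );                            // invert every bit
        Console.WriteLine( CountSet( flags ) );  // 6
    }
}
```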

With the exception of the Array class, none of the collection classes in the 1.x release of C# is strongly typed. They all store references to Object. C# generics will contain new versions of all these topologies that are built on generics. That will be the best way to create type-safe collections. In the meantime, the current System.Collections namespace contains abstract base classes that you can use to build your own type-safe interfaces on top of the type-unsafe collections: CollectionBase and ReadOnlyCollectionBase provide base classes for a list or vector structure. DictionaryBase provides a base class for key/value pairs. The DictionaryBase class is built using a Hashtable implementation; its performance characteristics are consistent with the Hashtable.
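A type-safe wrapper built on CollectionBase might look like the sketch below. CityCollection is a hypothetical class written for this example, one of several reasonable ways to layer typed members over the untyped storage.

```csharp
using System;
using System.Collections;

// Hypothetical strongly typed list of city names built on CollectionBase.
public class CityCollection : CollectionBase
{
    public int Add( string city )
    {
        return List.Add( city );
    }

    public string this[ int index ]
    {
        get { return (string) List[ index ]; }
        set { List[ index ] = value; }
    }

    // Reject wrong types that arrive through the untyped IList interface.
    protected override void OnValidate( object value )
    {
        if ( !( value is string ) )
            throw new ArgumentException( "Only strings are allowed.", "value" );
    }
}

public static class CityCollectionDemo
{
    public static void Main( )
    {
        CityCollection cities = new CityCollection( );
        cities.Add( "Ann Arbor" );
        cities.Add( "Chicago" );
        Console.WriteLine( cities[ 1 ] );   // Chicago
        Console.WriteLine( cities.Count );  // 2
    }
}
```

Callers see only string-typed members, while resizing and storage stay inside the base class.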

Anytime your classes contain collections, you'll likely want to expose that collection to the users of your class. You do this in two ways: with indexers and the IEnumerable interface. Remember that, early in this item, I showed you that you can directly access items in an array using [] notation, and you can iterate the items in the array using foreach.

You can create multidimensional indexers for your classes. These are analogous to the overloaded operator [] that you could write in C++. As with arrays in C#, you can create multidimensional indexers:

    public int this[ int x, int y ]
    {
      get
      {
        return ComputeValue( x, y );
      }
    }

Adding indexer support usually means that your type contains a collection. That means you should support the IEnumerable interface. IEnumerable provides a standard mechanism for iterating all the elements in your collection:

    public interface IEnumerable
    {
      IEnumerator GetEnumerator( );
    }

The GetEnumerator method returns an object that implements the IEnumerator interface. The IEnumerator interface supports traversing a collection:

    public interface IEnumerator
    {
      object Current
      { get; }

      bool MoveNext( );

      void Reset( );
    }
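Putting the indexer and IEnumerable together, a class that models a collection could be sketched as follows. Grid is a hypothetical type for this example, and delegating GetEnumerator to the underlying array is one simple choice among several.

```csharp
using System;
using System.Collections;

// Hypothetical grid type that exposes its storage through a
// two-dimensional indexer and through IEnumerable.
public class Grid : IEnumerable
{
    private readonly int[ , ] _cells;

    public Grid( int rows, int cols )
    {
        _cells = new int[ rows, cols ];
    }

    public int this[ int x, int y ]
    {
        get { return _cells[ x, y ]; }
        set { _cells[ x, y ] = value; }
    }

    // Arrays already implement IEnumerable, so delegate to the storage.
    public IEnumerator GetEnumerator( )
    {
        return _cells.GetEnumerator( );
    }
}

public static class GridEnumerationDemo
{
    public static void Main( )
    {
        Grid grid = new Grid( 2, 2 );
        grid[ 0, 1 ] = 5;
        grid[ 1, 0 ] = 7;

        int sum = 0;
        foreach ( int cell in grid )   // works because Grid implements IEnumerable
            sum += cell;
        Console.WriteLine( sum );      // 12
    }
}
```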

In addition to the IEnumerable interface, you should consider the IList or ICollection interfaces if your type models an array. If your type models a dictionary, you should consider implementing the IDictionary interface. You could create the implementations for these large interfaces yourself, and I could spend several more pages explaining how. But there is an easier solution: Derive your class from CollectionBase or DictionaryBase when you create your own specialized collections.

Let's review what we've covered. The best collection depends on the operations you perform and the overall goals of space and speed for your application. In most situations, the Array provides the most efficient container. The addition of multidimensional arrays in C# means that it is easier to model multidimensional structures clearly without sacrificing performance. When your program needs more flexibility in adding and removing items, use one of the more robust collections. Finally, implement indexers and the IEnumerable interface whenever you create a class that models a collection.
