Collections你用对了吗？

来源：互联网发布：c语言模块化书籍编辑：程序博客网时间：2024/05/22 03:38

.Net有两类基础的集合类型：List和Dictionary。List是基于Index的，Dictionary是基于key的。集合类型一般实现了IEnumberable，ICollection或者Ilist接口。

类型

描述

使用场景

ArrayList

能够根据添加项动态调整集合大小。

适用于存储自定义类型，特别是对于数据经常进行增加和删除的。
使用TrimToSize()去掉预留元素位置，优化性能。
使用ArrayList.BinarySearch进行高效的查询。
不要用ArrayList去存储string。用StringCollection代替。
ArrayList.Sort()默认使用的是快速排序。

Hashtable

基于Key的hash值，存储key/value对的collection类型。

适用于存储大量并不经常更改的数据，数据经常改动会造成额外的性能开销（计算hash值）。
适用于经常Search而不需要排序的情况。

HybridDictionary

能够根据集合的大小，动态使用相应类型的字典类型。使用ListDictionary（小），使用Hashtable（大）。

适用于经常进行Search操作。
不要用于排序。

ListDictionary

基于Key的hash值，存储key/value对的collection类型，基于单链表实现的。

对于不大于10的key/value对的排序非常高效。

NameValueCollection

存储string类型的key/value的可排序的Collection。

string类型，可排序。可以在一个集合中包含多个相同Key的实体。
适用于经常改变数据。
适用于去缓存数据项以便于快速取回。

SortedList

基于Key排序的Key/value对的collection。可以通过key或index来提取value。

适用于存储静态数据，并且在一段时间内只有少量数据需要更新。
可以用于快速根据索引或者Key来取回数据。
避免用于存储数据有大量更新的数据。创建和排序的时间非常耗时。
避免存储string类型，因为过度的转换开销。使用StringCollection代替。

StringCollection

string类型的arraylist

适用于存储经常变化的string类型的数据，并且需要在大块中检索。
适合绑定string类型数据到DataGrid，避免了向下类型转换的开销。

StringDictionary

基于Hash table。用于存储key是string类型的Dictionary

适用于存储需要经常检索的静态string类型数据

Queue

基于ICollection的实现，先进先出

Stack

后进先出

集合类型常见问题：

Boxing issues
Thread safety
Enumeration overhead

Boxing issues

如果使用ArrayList去存储值类型时都会被执行装箱操作，当取数据时会执行拆箱操作。当要存储大量的item时会导致过度开销。

ArrayList al =new ArrayList();

for (int i = 0; i < 1000; i++)

al.Add(i); //Implicitly boxed because Add() takes an object

int f = (int)al[0];// The element is unboxed

解决方案：

考虑使用强类型的数组代替，或者使用自定义集合类型。

在.net framework 2.0之后，使用泛型可以避免装箱和拆箱的开销。

Thread Safety

通常，集合默认不是线程安全的。在多线程下对于读操作是线程安全的，对于任何修改操作是非线程安全的，会导致未知的结果。

解决方案：

使用Synchronized方法

//初始化

ArrayList arrayList = new ArrayList();

//添加对象到集合

…….

//使用Synchronized方法

Arraylist syncArrayList = ArrayList.Synchronized(arrayList);

使用Lock语句的SyncRoot属性

ArrayList myCollection = new ArrayList();

Lock(myCollection.SyncRoot){}

Enumeration Overhead

.Net Framework 1.1通过重写IEnumerable.GetEnumberator方法提供了枚举器。

但也不是很理想，理由如下：

GetEnumerator方法是虚拟的，所以无法内联。
GetEnumerator方法返回的是Ienumerator的接口而不是一个实际的类型。因此，在编译期间，无法确定实际的枚举器(Enumerator)
MoveNext方法和Current属性同样是虚拟的，因此无法内联。
IEnumerator.Current返回的的是Object类型。根据存储在集合的数据类型可能需要装箱和拆箱操作。

因此在使用foreach对集合进行操作时会面临托管堆和虚拟方法的开销。

Collection Guidelines

Analyze your requirements before choosing the collection type.（选择集合类型之前进行需求分析）
Initialize collections to the right size when you can.（当你能确定集合的大小时，初始化集合时指定合适的大小）
Consider enumerating overhead.（考虑枚举开销）
Prefer to implement IEnumerable with optimistic concurrency.（实现IEnumerable接口去实现开放式并发）
Consider boxing overhead.（考虑装箱和拆箱开销）
Consider for instead of foreach.（考虑使用for代替foreach）
Implement strongly typed collections to prevent casting overhead.（实现强类型的集合类型去防止强制转换开销）
Be efficient with data in collections.（高效利用集合中的数据）

Analyze your requirements before choosing the collection type.

Do you need to sort your collection?
- ArrayList适用于只读已排序的数据作为数据源
- SortedList适用于需要排序的静态不经常更新的数据，通常。（当构造集合时SortedList预先排序数据，创建排序列举的过程是比较昂贵的。但是任何对数据的更新都能自动且高效的对集合进行重新排序）
- NameValueCollection适用于需要排序的字符串集合。
Do you need to search your collection?
- 如果你会根据key/value随机的进行search，使用HashTable。
- 如果对字符串进行随机的search，使用StringDictionary。
- 对于集合大小小于10的进行search使用ListDictonary
Do you need to access each element by index?
- 对于基于0索引开始的集合，使用ArrayList和StringCollection
- 通过Key来取数据，使用HashTable,SortedList,ListDictionary,StringDictionary。
- NameValueCollection既可以使用索引又可以通过key取数据。
- 集合（Array）通过索引取数据比任何集合类型都高效。
Do you need a custom collection?
- 当你想通过引用类型封送集合。
- 你需要创建一个强类型集合来避免转换开销（如果创建的强类型集合是继承自CollectionBase或者Hashtable，你依旧无法避免转换开销。）。
- 你需要创建一个只读集合。
- 你需要对你的强类型集合提供序列化。
- 你需要注意枚举的开销。

Initialize collections to the right size when you can.

尽可能的指定集合的大小，可以更好的提高集合性能。

Consider enumerating overhead.

如果实现了Ienumerable.GetEnumerator,同时实现一个非虚拟的GetEnumerator方法。

class MyClass : IEnumerable

{

// non-virtual implementation for your custom collection

public MyEnumerator GetEnumerator() {

return new MyEnumerator(this); // Return nested public struct

}

// IEnumerator implementation

public IEnumerator.GetEnumerator() {

return GetEnumerator();//call the non-interface method

}

foreach调用非虚拟的GetEnumerator会比通过接口调用虚拟方法更高效。

实现IEnumerator.Current属性。

// Custom property in your class

//call this property to avoid the boxing or casting overhead

Public MyValueType Current {

MyValueType obj = new MyValueType();

// the obj fields are populated here

return obj;

}

// Explicit member implementation

Object IEnumerator.Current {

get { return Current} // Call the non-interface property to avoid casting

}

Prefer to implement IEnumerable with optimistic concurrency.

确保集合不会被修改当集合正在被枚举。
采用快照在枚举器中。

Consider boxing overhead.

当在集合中存储值类型时，装箱开销会根据集合的大小，更新或操作数据的频率影响。

如果不需要集合过多的功能，尽量使用Array.

Consider for instead of foreach.

在性能敏感的代码中尽量使用for.

Implement strongly typed collections to prevent casting overhead.

Be efficient with data in collections.

当处理大量的对象时，处理每个对象的大小会十分重要。

1 0