IDictionary Options - Performance Test - SortedList vs. SortedDictionary vs. Dictionary vs. Hashtable
Source: Internet | Editor: 程序博客网 | Time: 2024/06/08 14:30
http://blog.bodurov.com/Performance-SortedList-SortedDictionary-Dictionary-Hashtable
This is a series of tests comparing the performance of four implementations of IDictionary: the generic Dictionary, the generic SortedDictionary, the generic SortedList, and the older non-generic Hashtable.
I compared the following parameters: memory used in bytes, insertion time in ticks, item search time in ticks, and the time to loop with foreach in ticks. The test was run 8000 times across all four implementations, in random order, so each implementation was tested at least 1000 times.
I performed the tests in five stages to observe the relationship between the number of entries and the performance. In the first stage the collections had 50 items, in the second 500, in the third 5,000, and in the fourth 50,000 items.
In this test, lower memory usage or execution time means better performance. To chart each parameter visually, we therefore have to derive a performance coefficient from the raw data. I used the following code to calculate the performance coefficient:
Because the best-performing value is the lowest one, it maps to a performance coefficient of 1, and every other value maps to a fraction of that.
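The original listing did not survive in this copy. As a minimal sketch of the idea, assuming the coefficient is simply the best (lowest) raw value divided by each raw value, it could look like the following. It is written in Python for brevity rather than the C# of the original test, and the function name and sample tick counts are hypothetical, not taken from the measurements.

```python
def performance_coefficients(raw_values):
    """Map raw measurements (lower is better) to coefficients in (0, 1].

    Assumption: coefficient = best (lowest) value / this value.
    """
    best = min(raw_values.values())
    return {name: best / value for name, value in raw_values.items()}


# Hypothetical tick counts, for illustration only.
sample = {
    "Dictionary": 200,
    "Hashtable": 250,
    "SortedDictionary": 800,
    "SortedList": 1600,
}
coefficients = performance_coefficients(sample)
```

With these made-up numbers the fastest implementation gets a coefficient of 1 and the slowest gets 0.125, so taller bars in a chart mean better performance.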
This is the chart of the memory usage:
The results stay consistent as the number of items in the collections grows. SortedList has the smallest memory footprint, followed by Hashtable and SortedDictionary, and Dictionary has the highest memory usage. Even so, the differences are not significant, and unless your solution is extremely sensitive to memory usage you should treat the other two parameters as more important: the time taken for insert operations and the time taken to search for a key. Note also that this test does not take the effects of garbage collection into account, so the memory figures can only be read in a very general way.
This is the chart for the time taken for insert operations:
When the number of records is small, the differences between the four implementations are not significant, but as the number of items grows the performance of SortedList drops dramatically. SortedDictionary is better but still takes significantly more time for inserts than the other two implementations. Hashtable comes next, and the overall leader is the generic Dictionary.
This is the chart for the time taken for search operations:
The absolute leader is Hashtable, but the test does not consider the type of item being stored. That could be a possibility for a future test. The next best performer is the generic Dictionary followed by the other two implementations. The differences here between SortedList and SortedDictionary are not significant.
This is the chart for the time taken for for-each collection loop operations:
Here the leader is SortedList, then Dictionary, which is consistently better than Hashtable, and the worst performer is SortedDictionary.
Here you can see the Task Manager during the test; it shows a picture of the memory usage. Area A is the insertion phase, when more and more memory is being allocated. Area B runs from the end of the insertion until the garbage collection of the object.
This is the code I used for this test. The variable NumberInsertedKeys was changed to 50, 500, 5000, 50000, or 10000 for the different stages.
The computer used for the test has the following characteristics:
This is the raw data. Memory used is in bytes, and the insert, search and looping times are in ticks:
Thanks for these interesting measurements!
I was exactly looking for this kind of information.
People considering which collection to use, might also be interested in the MSDN page "SortedList and SortedDictionary Collection Types" (http://msdn2.microsoft.com/en-us/library/5z658b67.aspx), which tells us the upper bounds of some operations on these collections in big O notation.
Thanks for the above test!
I was also looking for this option.
I guess I will go with Hashtable, as it looks the fastest of the lot.
Thanks! I just started to think about what to use and then found your article. It's good!
You forgot to add measure units (bytes) for memory allocation chart.
Thanks Flavius, I've added that now.
I found your results useful. However, it may be important to point out that the hashtable does not support generics.
Thanks for an article with immediate practical use. It helped in finalizing on which data structure to use for my app.
As Gary has pointed out above, the generic Dictionary has the advantage of type safety so it would seem that we should also factor in the unboxing overhead for the Hashtable's value object.
In my test for retrieval time using the hashtable with unboxing vs. generic dictionary the hashtable still performed three times as fast as the generic dictionary.
--Time in ticks--
Time taken to fetch all elements of Dictionary<> was 10559
Time taken to fetch all elements of hashtable without unboxing was 1431
Time taken to fetch all elements of hashtable WITH unboxing was 2846
This article was awesome! Thanks for sharing your findings with us. It was very helpful to the code I am writing.
One thing that would have been nice to see would have been insertion time for the sorted collections in the case that the strings were already sorted, as that happens sometimes.
Thanks!
Hi,
I just did the same test but using ASP.NET 3.5 and IIS 7.0, and the results seem to indicate otherwise: Dictionary is better than SortedDictionary!
Take a look at http://jefferytay.wordpress.com/2009/04/16/performance-of-generics-sorteddictionary-and-dictionary/ for specifics
Hi Jeffery,
First of all, thank you for the excellent work you have done! I like the attitude of never trusting any source on the Internet and always checking for yourself. Undoubtedly that's the way to go.
However there is one problem that I see with your test. You always process SortedDictionary before Dictionary. If you read the description of my tests I have tried those in different order and what is more important each was performed separately. When you start performing operations on the SortedDictionary you initiate actions of the virtual machine related to large memory allocation and then when you reach the Dictionary the hard work is already done. I would encourage trying the same experiment in different order. But it is true that my test was with .NET 2.0 so it may have changed in 3.5.
I think you're missing an important variable/dimension in your tests: the number of items in the collection.
Yes, you are right. In the coming days I will revise the test, as it seems that, as Jeffery pointed out, tests done with a bigger sample show Dictionary performing better than SortedDictionary.
Hi Vladimir,
Actually, based on your results, I got my team to convert all Dictionary objects to SortedDictionary in their development, because I remembered that Dictionary does not maintain any sort order, while SortedDictionary, which uses a red-black tree, improves searching speed.
However, while I was doing some coding, I realized that for some weird reason my new code seemed to be performing much slower. That's when I decided to take our code for a test run :)
Sorry man, my data was just not big enough, and when I was running it something influenced SortedDictionary to perform better than Dictionary, and because of the small sample that wasn't normalized. I'll soon have results with a very large sample, but my preliminary data shows that you seem to be right and Dictionary performs better than SortedDictionary.
Hi Vladimir,
Thanks for your extremely detailed test which tallies with mine. Now it just makes me wonder. Since Dictionary is better than SortedDictionary, why is it still there? Any ideas?
Because if you try foreach iteration with SortedDictionary the keys will be sorted while Dictionary will not give you sorted keys.
Well, if I really did a foreach, that is a possibility, but it does not make sense for SortedDictionary to sort the key list each and every time we request it. I do not believe Microsoft developers would write such lousy code.
The interesting thing is that I cannot seem to find the source for Dictionary using Reflector. Any idea where it resides?
I will try to run a test next week on the differences between foreach and direct indexing for Dictionary and SortedDictionary. There must be some benefit to SortedDictionary.
SortedDictionary does not perform a sort each time you request a key. It simply stores the data as a binary tree, and that's why it can iterate the keys in order.
If you had to show the keys in order, how would you do it with Dictionary? What if the keys are not just 1,2,3,4 but something like 2,24,255,999 and you don't know where the gaps are? With SortedDictionary you just use foreach. The code of Dictionary is in mscorlib, in System.Collections.Generic.
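The point about ordered iteration can be sketched in a language-neutral way. The Python snippet below is only an analogy, assuming a plain hash map stands in for Dictionary: because the hash map keeps no key order, the keys have to be collected and sorted every time ordered output is needed, which is the work a tree-based SortedDictionary avoids by keeping its entries in key order. The keys come from the example above; the values are made up.

```python
# A plain hash map standing in for Dictionary (analogy only, not the .NET type).
# Entries are deliberately inserted out of key order.
entries = {255: "c", 2: "a", 999: "d", 24: "b"}

# To show the keys in order, they must be collected and sorted on each request,
# which costs O(n log n) every time.
ordered_keys = sorted(entries)
ordered_pairs = [(key, entries[key]) for key in ordered_keys]
```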
Your pictures look nice, and it looks like you tried your best, but your results are definitely not accurate. I did similar tests today, and went on the web to see what other people found. Just to help everybody, I am going to add my results.
This is for adding 2000000 items to each list and timing how long it takes to load the data and to search for 3 values in the data (one at the end, one at the beginning and one that doesn't exist), then seeing how long it takes to unload the data. Here are the results. Not your pretty pictures, but at least values that make sense.
5/5/2009 2:22:39 PM - Array
5/5/2009 2:22:40 PM - End Adding data: 00:00:00.7499952
5/5/2009 2:22:40 PM - Search complete: 00:00:00.0312498
5/5/2009 2:22:40 PM - Data Cleared: 00:00:00.0312498
5/5/2009 2:22:40 PM - Total time: 00:00:00.8124948
5/5/2009 2:22:40 PM - ArrayList
5/5/2009 2:22:41 PM - End Adding data: 00:00:00.7343703
5/5/2009 2:22:41 PM - Search complete: 00:00:00.0312498
5/5/2009 2:22:41 PM - Data Cleared: 00:00:00.0312498
5/5/2009 2:22:41 PM - Total time: 00:00:00.7968699
5/5/2009 2:22:41 PM - List<>
5/5/2009 2:22:42 PM - End Adding data: 00:00:00.7343703
5/5/2009 2:22:42 PM - Search complete: 00:00:00.0624996
5/5/2009 2:22:42 PM - Data Cleared: 00:00:00.0312498
5/5/2009 2:22:42 PM - Total time: 00:00:00.8281197
5/5/2009 2:22:42 PM - HashTable
5/5/2009 2:22:44 PM - End Adding data: 00:00:02.2812354
5/5/2009 2:22:44 PM - Search complete: 00:00:00
5/5/2009 2:22:44 PM - Data Cleared: 00:00:00.0468747
5/5/2009 2:22:44 PM - Total time: 00:00:02.3281101
5/5/2009 2:22:44 PM - Dictionary<>
5/5/2009 2:22:46 PM - End Adding data: 00:00:01.4218659
5/5/2009 2:22:46 PM - Search complete: 00:00:00
5/5/2009 2:22:46 PM - Data Cleared: 00:00:00.0312498
5/5/2009 2:22:46 PM - Total time: 00:00:01.4687406
5/5/2009 2:22:46 PM - SortedList<>
5/5/2009 3:57:03 PM - End Adding data: 01:34:17.1991614
5/5/2009 3:57:03 PM - Search complete: 00:00:00.1562490
5/5/2009 3:57:05 PM - Data Cleared: 00:00:01.7343639
5/5/2009 3:57:05 PM - Total time: 01:34:19.1678988
5/5/2009 3:57:05 PM - SortedDictionary<>
5/5/2009 3:57:16 PM - End Adding data: 00:00:11.4061770
5/5/2009 3:57:16 PM - Search complete: 00:00:00
5/5/2009 3:57:16 PM - Data Cleared: 00:00:00.0156249
5/5/2009 3:57:16 PM - Total time: 00:00:11.4218019
Thanks for posting your results, Nugpot, but I really don't understand what you mean when you say "but your results are definitely not accurate", because your results are just like my results. In particular:
Insert Performance:
1. Dictionary - best
2. HashTable - next
3. SortedDictionary - next
4. SortedList - worst
Just like my test. And your search performance is:
1. HashTable, Dictionary, SortedDictionary - you don't have precise measurement
2. SortedList
Again, this fits my test. If you have different data the numbers will be different, but the order will be the same.
It is also not clear what you mean by "unload data", as you didn't post a reference to your code. Your insert and search results are just like mine. I can't speak for the rest because I haven't seen it, but if you compare things they must be similar, not apples with screwdrivers.
What was the hardware used for these tests?
FYI your average calculation is wrong. Averaging 2,3,4 would come out as ((((2 + 3) / 2) + 4) / 2) or 3.25, instead of 3.
Also you should time how long it takes to execute all the lookups at once and take the average off of that instead of timing each individual one.
I ran some tests designed purely to test the performance of lookups, with no calls to the GC (as .NET should pick when to run it for you in the real world), although I did semi-track inserts.
Test was inserting X keys, then doing x*2 lookups (x where the key existed x where it didn't)
I let each of those tests of X keys run 10-15 times, and I came up with
Thanks for sharing your results, Sean. You are right about the avg formula. The way I was doing it would shift the result toward the latest run, but in the end the result would still average out. Still, I corrected my code to make it cleaner, and I also extended my test with a foreach loop test to show why you are getting consistently better results for Dictionary compared to Hashtable. The thing is that you are not isolating the looping performance from the rest, and because Dictionary is better than Hashtable for looping, as you can see in my latest run, you get a better result. All other tests are consistent with my previous results, and the old ones can be seen at:
http://blog.bodurov.com/images/IDictTest01_old.gif
http://blog.bodurov.com/images/IDictTest02_old.gif
http://blog.bodurov.com/images/IDictTest03_old.gif
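The averaging issue Sean raised can be illustrated with a short sketch. It assumes the earlier test averaged each new sample with the previous average, which is what the 2, 3, 4 example in his comment describes; the function names are hypothetical and the code is Python rather than the C# of the original test.

```python
def pairwise_running_average(samples):
    """Average each new sample with the previous result (the flawed approach).

    Later samples get more weight, so the result drifts toward the latest run.
    """
    avg = samples[0]
    for sample in samples[1:]:
        avg = (avg + sample) / 2
    return avg


def incremental_mean(samples):
    """Running mean that weights every sample equally."""
    avg = 0.0
    for count, sample in enumerate(samples, start=1):
        avg += (sample - avg) / count
    return avg


flawed = pairwise_running_average([2, 3, 4])   # ((2 + 3) / 2 + 4) / 2 = 3.25
correct = incremental_mean([2, 3, 4])          # (2 + 3 + 4) / 3 = 3.0
```

The incremental form is handy when samples arrive one at a time, since it needs only the current average and a counter rather than the full list of measurements.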
Hi Vladimir,
I would appreciate it if you could change the link to my post to http://jefferytay.wordpress.com/2009/04/16/performance-of-generics-sorteddictionary-and-dictionary/
I made a few slight changes to your code and came up with different results
With string keys dictionary performed searches roughly 10% faster with all combinations of keys I tried.
With int keys dictionary performed searches roughly 40% faster with all combinations of keys I tried.
1: I created another test method called TestGeneric which accepts IDictionary<String, String>. If you look in Reflector, calling Dictionary through an IDictionary reference results in validation code being called at runtime to ensure the objects passed in are actually TKey (in this case string), as opposed to at compile time when it's generic (below is a sample of what it is actually doing). This appears to have provided the biggest performance increase.
2: I changed the key to C_key{i} instead of {random_letter}_key{i} (to make #3 possible)
3: I changed your search from a single lookup to NumberInsertedKeys lookups, and just checking for each value that had inserted as the key (so not using a foreach loop to eliminate that possible conflict)
4: I commented out the randomness to determine which
Also, when I commented out the garbage collection code used to test memory usage, the Dictionary results seemed to improve more (I personally prefer to test without garbage collection, since in production my code will rely on .NET to select when to run it for me anyway). But that test does appear to be impacting the results of the other tests. (Disclaimer: I only ran it with string keys and 50000 entries.)
I switched it back to using random letters and looking up every entry and got roughly the same results. I had to make more changes to do that, though, by caching the random entries and reusing them.
Thank you this was really very helpful. Many of the constructors for these collections support an initial capacity. While I understand that it is not always possible to predict this capacity, I do quite often make a range estimate. If I recall, the default initial capacity is quite low. This would result in frequent extensions of the List style collections. The dictionary style collections would expand more slowly. It might be interesting to examine some effects of initial capacity. Thanks again for your previous work.
Excellent blog. I was looking for this type of information. I was comparing SortedDictionary and SortedList. Now I know SortedList is much better on performance.
Has anyone tried this type of thing with millions? I'm looking at putting roughly 30 million integer pairs into a hashtable for lookup purposes only. Any ideas which storage method would be best?
Another factor: string.GetHashCode() iterates through the entire string. So presumably, the longer the keys, the worse performance you'll find from Dictionary. Another point about SortedList: If the items are added in order, its insertion performance is better than SortedDictionary, because SortedDictionary rebalances the tree, and SortedList just adds the item on the end without reshuffling anything.
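The remark about insertion order can be made concrete with a small model. The sketch below assumes an array-backed sorted list, where inserting a key means shifting every element that sorts after it; it counts those shifts for keys arriving in ascending and in descending order. It is a Python model of the general technique, not the actual .NET SortedList implementation, and `count_shifts` is a hypothetical helper.

```python
import bisect


def count_shifts(keys):
    """Insert keys into a sorted array and count how many elements must move."""
    data = []
    shifts = 0
    for key in keys:
        pos = bisect.bisect_right(data, key)   # binary search for the insert slot
        shifts += len(data) - pos              # elements after pos move one slot right
        data.insert(pos, key)
    return shifts


ascending = count_shifts(range(1000))             # each key is appended at the end
descending = count_shifts(range(999, -1, -1))     # each key is inserted at the front
```

Keys that arrive already sorted cause no shifting at all, while the worst case moves every existing element on every insert, which matches the dramatic slowdown of SortedList on unsorted input reported in the tests.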
Sean,
Thanks for the change in the code. I tried it and it worked great for me. It was very helpful.
Joshua,
I have a project where I need to do this with millions also. Did you ever figure out what storage method worked best? Any tips would be great!