External Merge Sort, time complexity analysis

来源:互联网 发布:靠谱助手无法连接网络 编辑:程序博客网 时间:2024/05/16 04:39

The time complexity of Merge Sort is nlogn.  How about the external merge sort?


1. One pass external merge sort

step 1. break N data to k groups, each group has N/k data, complexity is N/klog(N/k) * k = N log(N/k)

step 2. k way merge sort.  Use min_heap, each time to push the data in the heap takes log(k) time, N data takes Nlog(k), 

so total time complexity is Nlog(k) + Nlog(N/k) = NlogN


2. Two pass external merge sort

If k is too large, in merge sort, the content read in the input buffer may be too small.  For example, if the data needs to be sorted is 500G, one only have 1G memory.  Then k= 500.  When doing the merge sort, only 2MB will be read in the input buffer(Looks OK, Ah? But if total data is TB, or the memory is much smaller, then each time it may only read KB data in the main memory).  The frequency to access disk may be too high.  So we could do two path.

First path, break N to k2 group, do merge of each group, each group has N/k2 data.  (k2 = 20, each group has 25G data, and we further break each 25G data to 25 pieces of files)

Second path, do k2 way merge sort, takes Nlogk2 time.  So total time for two pass merge sort is Nlog(N/k2) + Nlogk2 = NlogN


Reference:

http://en.wikipedia.org/wiki/External_sorting

http://www.mitbbs.com/article_t/JobHunting/32257009.html