CareerCup: How to find the median of 1 billion numbers across N distributed machines efficiently?

Source: Internet. Time: 2024/05/15 07:26

How to find the median of 1 billion numbers across N distributed machines efficiently?

----------------------------------------------------------------------------------


1) Each machine sorts its own elements.
Complexity: O(n log n), where n is the number of elements on that machine.
Time: that of the slowest machine, since all machines sort in parallel.
2) The leader machine maintains a min-heap of m elements (m being the number of machines).
Each heap node contains a number and the machine that number came from.
3) To fill the heap, the leader machine asks each machine for its smallest element.
Complexity: O(m log m) when the elements are inserted one by one.
4) The leader machine removes the smallest element from the heap (O(log m); only reading the minimum is O(1)) and asks the machine that number came from for its next smallest number.
5) Insert that next smallest number into the heap, and repeat from step 4 until the kth smallest number is found.
Total time complexity:
If h is the largest chunk of data held by any one machine: O(h log h) for sorting.
If m is the number of machines:
O(m log m) for building the heap.
If k is half of a billion, the complexity of finding the kth element is:
O(k log m)
Total messages passed:
About k (half a billion).
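
The steps above can be sketched in a single process as follows. This is only an illustration under stated assumptions: the `Machine` class and its `next_smallest()` method are hypothetical stand-ins for a remote worker and the "give me your next smallest element" message, and the leader logic is the `kth_smallest` function.

```python
import heapq


class Machine:
    """Hypothetical stand-in for a remote worker holding one chunk of data."""

    def __init__(self, data):
        # Step 1: each machine sorts its own elements, O(n log n).
        self._sorted = sorted(data)
        self._pos = 0

    def next_smallest(self):
        """Simulates one message: return the next smallest element, or None."""
        if self._pos >= len(self._sorted):
            return None
        value = self._sorted[self._pos]
        self._pos += 1
        return value


def kth_smallest(machines, k):
    """Leader logic: return the kth smallest element (1-based) over all machines."""
    # Steps 2-3: collect the smallest element of every machine into a min-heap.
    heap = []
    for machine_id, machine in enumerate(machines):
        value = machine.next_smallest()
        if value is not None:
            heap.append((value, machine_id))
    heapq.heapify(heap)

    # Steps 4-5: pop the minimum k times, refilling from the machine it came from.
    result = None
    for _ in range(k):
        if not heap:
            raise ValueError("k is larger than the total number of elements")
        result, machine_id = heapq.heappop(heap)        # O(log m)
        nxt = machines[machine_id].next_smallest()      # one message
        if nxt is not None:
            heapq.heappush(heap, (nxt, machine_id))     # O(log m)
    return result


machines = [Machine([9, 1, 5]), Machine([4, 8]), Machine([7, 2, 6, 3])]
print(kth_smallest(machines, 5))  # median of the 9 values 1..9
```

With an even total such as 1 billion, the median is usually taken as the mean of the kth and (k+1)th smallest elements, so the loop would run one step further.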

I am wondering if I could do the heap part in parallel.
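
One possible direction, which is a different technique from the heap merge above and not part of the original answer, is to binary search on the value and let every machine count in parallel how many of its elements are at most a candidate value. The sketch below assumes the numbers are integers and that each chunk is already sorted; `count_le` and `kth_smallest_by_counting` are hypothetical names, and the per-chunk counts are computed in a plain loop here although on real machines they could run concurrently.

```python
from bisect import bisect_right


def count_le(sorted_chunk, x):
    """Work done on one machine: how many local elements are <= x (O(log n))."""
    return bisect_right(sorted_chunk, x)


def kth_smallest_by_counting(sorted_chunks, k):
    """Leader logic: binary search on the value range for the kth smallest (1-based).

    Assumes integer values, non-empty input overall, and 1 <= k <= total count.
    """
    lo = min(chunk[0] for chunk in sorted_chunks if chunk)
    hi = max(chunk[-1] for chunk in sorted_chunks if chunk)
    while lo < hi:
        mid = (lo + hi) // 2
        # One round: every machine answers independently, so this sum
        # is the part that could be executed in parallel.
        total = sum(count_le(chunk, mid) for chunk in sorted_chunks)
        if total >= k:
            hi = mid          # the kth smallest is <= mid
        else:
            lo = mid + 1      # the kth smallest is > mid
    return lo


chunks = [[1, 5, 9], [4, 8], [2, 3, 6, 7]]
print(kth_smallest_by_counting(chunks, 5))
```

Under these assumptions the leader exchanges m messages per round and needs roughly log2(max - min) rounds, instead of about k sequential messages.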




