Keep track of the median?

Source: Internet | Published by: Taobao bookstore agent (淘宝书店代理) | Editor: Programmer Blog Network (程序博客网) | Time: 2024/05/16 10:02

From: CareerCup 150.

1. Question: Numbers are randomly generated and stored in an (expanding) array. How would you keep track of the median?

Answer: 

Heap? A heap is really good at basic ordering and at keeping track of the maximum or minimum. This is actually interesting: with two heaps, you can keep track of the bigger half and the smaller half of the elements. The bigger half is kept in a min-heap, so that the smallest element of the bigger half is at the root. The smaller half is kept in a max-heap, so that the biggest element of the smaller half is at the root. With these data structures, the candidate median elements sit at the two roots: if the heaps have the same size, the median is the average of the two roots; otherwise it is the root of the larger heap. If the sizes ever differ by more than one, you can quickly “rebalance” the heaps by popping the root off the larger heap and pushing it onto the smaller one.
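A minimal Python sketch of the two-heap idea, assuming the standard `heapq` module. `heapq` only provides a min-heap, so the smaller half is stored as negated values to behave like a max-heap. The class name `RunningMedian` is hypothetical, not from the book. Each `add` costs O(log n) and `median` costs O(1).

```python
import heapq


class RunningMedian:
    """Two-heap running median: a max-heap for the smaller half, a min-heap for the bigger half."""

    def __init__(self):
        self.low = []   # smaller half as a max-heap (values stored negated)
        self.high = []  # bigger half as a min-heap

    def add(self, x):
        # Route x to the smaller half unless it exceeds that half's maximum.
        if not self.low or x <= -self.low[0]:
            heapq.heappush(self.low, -x)
        else:
            heapq.heappush(self.high, x)
        # Rebalance so that len(low) == len(high) or len(low) == len(high) + 1.
        if len(self.low) > len(self.high) + 1:
            heapq.heappush(self.high, -heapq.heappop(self.low))
        elif len(self.high) > len(self.low):
            heapq.heappush(self.low, -heapq.heappop(self.high))

    def median(self):
        if not self.low:
            raise ValueError("median of empty sequence")
        if len(self.low) > len(self.high):
            return -self.low[0]                   # odd count: root of the larger heap
        return (-self.low[0] + self.high[0]) / 2  # even count: average of both roots
```

For example, adding 5, 15, 1 and 3 in turn gives medians 5, 10.0, 5 and 4.0. Letting the smaller half hold the one extra element is an arbitrary choice; the mirror image works equally well.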

2. For reference, see http://yaronspace.cn/blog/archives/1306 and http://www.cppblog.com/820986942/archive/2011/05/23/146991.html

Problem description:

The input is a continuous stream of numbers. Display, in real time, the median of the numbers received so far.

Solution:

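There are many ways to find a median. For large data volumes the classic approach is bucket counting, but it does not fit this problem because the data keeps changing.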

This problem can be solved with a max-heap and a min-heap:

1. Let the current median be m. The max-heap holds the numbers <= m and the min-heap holds the numbers >= m; neither heap contains m itself.

2. When a new number a arrives, compare it with m: if a <= m, add it to the max-heap; otherwise add it to the min-heap.

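3. If the sizes of the min-heap and the max-heap now differ by 2 or more, push m onto the heap with fewer elements, move the root of the heap with more elements into m (removing it from that heap), and finally re-heapify both heaps. Then return to step 2.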
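The three steps can be sketched in Python as follows, keeping m outside both heaps as step 1 requires and again using `heapq` with negated values for the max-heap. The class name `MedianKeeper` is hypothetical. Two details are filled in here as assumptions, since the steps do not spell them out: the very first number simply becomes m, and the median is reported as m when the heaps are equal in size, otherwise as the average of m and the larger heap's root. The "re-heapify" in step 3 happens implicitly, because `heappush` and `heappop` preserve the heap property.

```python
import heapq


class MedianKeeper:
    """Streaming median with the current median m held outside both heaps."""

    def __init__(self):
        self.m = None       # current median, stored in neither heap
        self.max_heap = []  # numbers <= m, negated so heapq acts as a max-heap
        self.min_heap = []  # numbers >= m

    def add(self, a):
        if self.m is None:  # assumption: the first number becomes m
            self.m = a
            return
        # Step 2: route a by comparing it with m.
        if a <= self.m:
            heapq.heappush(self.max_heap, -a)
        else:
            heapq.heappush(self.min_heap, a)
        # Step 3: if the sizes differ by 2, push m onto the smaller heap
        # and pop the larger heap's root into m.
        if len(self.max_heap) - len(self.min_heap) >= 2:
            heapq.heappush(self.min_heap, self.m)
            self.m = -heapq.heappop(self.max_heap)
        elif len(self.min_heap) - len(self.max_heap) >= 2:
            heapq.heappush(self.max_heap, -self.m)
            self.m = heapq.heappop(self.min_heap)

    def median(self):
        if self.m is None:
            raise ValueError("median of empty sequence")
        if len(self.max_heap) == len(self.min_heap):
            return self.m  # odd count: m is the middle element
        # Even count: average m with the root of the larger heap.
        if len(self.max_heap) > len(self.min_heap):
            return (self.m - self.max_heap[0]) / 2
        return (self.m + self.min_heap[0]) / 2
```

After each `add`, the sizes differ by at most one, so the median is always available in O(1).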


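Going further: what if numbers can be removed from the array as well as added? Which data structure should we use then? The heaps above no longer fit, because finding an arbitrary element in a heap takes O(n). Consider, for example, the following problem: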

https://www.interviewstreet.com/challenges/dashboard/#problem/4fcf919f11817

The median of M numbers is defined as the middle number after sorting them in order if M is odd, or the average of the middle 2 numbers (again after sorting) if M is even. You start with an empty number list. Then you can add numbers to or remove numbers from the list. For each add or remove operation, output the median of the numbers in the list.
 
Example: For a set of M = 5 numbers, { 9, 2, 8, 4, 1 }, the median is the third number in the sorted set { 1, 2, 4, 8, 9 }, which is 4. Similarly, for a set of M = 4 numbers, { 5, 2, 10, 4 }, the median is the average of the second and third elements in the sorted set { 2, 4, 5, 10 }, which is (4+5)/2 = 4.5.
 
Input:
 
The first line is an integer n indicating the number of operations. Each of the next n lines is either "a x" or "r x", meaning add x or remove x, respectively.
 
Output:
 
For each operation: if the operation is add, output the median after adding x on a single line. If the operation is remove and the number x is not in the list, output "Wrong!" on a single line. If the operation is remove and x is in the list, output the median after deleting x on a single line. (If the result is an integer, DO NOT output a decimal point. If the result is not an integer, DO NOT output trailing 0s.)
 
Constraints:
 
0 < n <= 100,000
 
For each "a x" or "r x", x fits in a 32-bit integer.
 
Sample Input:
 
7
r 1
a 1
a 2
a 1
r 1
r 2
r 1
 
Sample Output:
Wrong!
1
1.5
1
1.5
1
Wrong!
 
Note: As the last line of the sample shows, if the list becomes empty after a remove operation, you have to print "Wrong!" (quotes are for clarity).
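The post stops at the question of which data structure to use. Its hint (a heap needs O(n) to find an arbitrary element) points toward an ordered structure that supports both deletion and k-th-element queries, such as a balanced binary search tree. Python has no built-in balanced BST, so this sketch swaps in a different technique: a Fenwick tree (binary indexed tree) of counts over coordinate-compressed values, with the k-th smallest value found by binary lifting. Compression needs every x in advance, so the sketch is offline: it reads all n operations first, which the input format allows. The names `solve` and `format_median` are hypothetical. Each operation costs O(log n).

```python
import sys
from bisect import bisect_left
from collections import Counter


def format_median(twice):
    """Format twice/2 as required: no decimal point for integers, no trailing zeros."""
    if twice % 2 == 0:
        return str(twice // 2)
    sign = "-" if twice < 0 else ""
    return f"{sign}{abs(twice) // 2}.5"  # a half-integer always ends in .5


def solve(ops):
    """ops: list of (op, x) with op in {"a", "r"}; returns the output lines."""
    values = sorted({x for _, x in ops})  # coordinate compression (needs all ops up front)
    size = len(values)
    tree = [0] * (size + 1)               # 1-indexed Fenwick tree of value counts
    top = size.bit_length()

    def update(i, delta):
        # Add delta to the count at compressed index i (1-based).
        while i <= size:
            tree[i] += delta
            i += i & -i

    def kth(k):
        # Return the k-th smallest value currently stored (1-based k) by binary lifting.
        pos = 0
        for b in range(top - 1, -1, -1):
            nxt = pos + (1 << b)
            if nxt <= size and tree[nxt] < k:
                pos = nxt
                k -= tree[nxt]
        return values[pos]  # 1-based index pos + 1 maps to values[pos]

    count = Counter()
    total = 0
    out = []
    for op, x in ops:
        i = bisect_left(values, x) + 1
        if op == "a":
            count[x] += 1
            total += 1
            update(i, 1)
        elif count[x] == 0:       # removing a number that is not in the list
            out.append("Wrong!")
            continue
        else:
            count[x] -= 1
            total -= 1
            update(i, -1)
        if total == 0:            # list became empty after a removal
            out.append("Wrong!")
        elif total % 2 == 1:      # odd: the single middle element, doubled
            out.append(format_median(2 * kth(total // 2 + 1)))
        else:                     # even: sum of the two middle elements
            out.append(format_median(kth(total // 2) + kth(total // 2 + 1)))
    return out


def main():
    data = sys.stdin.read().split()
    if not data:
        return
    n = int(data[0])
    ops = [(data[1 + 2 * j], int(data[2 + 2 * j])) for j in range(n)]
    print("\n".join(solve(ops)))


if __name__ == "__main__":
    main()
```

On the sample input this prints the sample output. Passing twice the median to `format_median` keeps everything in integers, so the no-decimal-point and no-trailing-zeros rules fall out without any floating-point formatting. Python integers cannot overflow; a C or C++ version would need a 64-bit type for the sum of two 32-bit values.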