Finding the M Largest of N Data Items with Heap Sort: A Java Implementation

Source: Internet. Editor: 程序博客网. Time: 2024/04/29 21:51

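Algorithm idea:

When there are N data items and N is very large (for example, tens of millions or hundreds of millions), the following strategy can be used to find the M top-ranked items among them.

First, build a min-heap from the first M elements. Then traverse the remaining data: if the i-th element, key, is greater than the root of the min-heap, remove the root, place key at the root, and adjust the heap so that it is a min-heap again; then continue with the remaining data. The root is always the smallest of the M elements kept so far, so an element that is not greater than the root cannot belong to the M largest. At the end, the elements in the min-heap are the M largest elements.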
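The strategy above can be sketched in a few lines with the JDK's `java.util.PriorityQueue`, which is a min-heap under natural ordering. This is only an illustration of the idea, not the implementation given in this post: the class name `TopMSketch` and the method `largest` are hypothetical, and the sketch assumes `int` data for brevity. It runs in O(N log M) time and keeps only M values in the heap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper class for illustration; not part of the original post.
final class TopMSketch {

    // Returns the m largest values of data, largest first.
    static List<Integer> largest(int[] data, int m) {
        if (data == null || m <= 0) {
            return new ArrayList<>();
        }
        // Min-heap: peek() is the smallest of the values kept so far.
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int key : data) {
            if (heap.size() < m) {
                heap.offer(key);           // still filling the first m slots
            } else if (key > heap.peek()) {
                heap.poll();               // drop the smallest kept value
                heap.offer(key);           // and keep the larger key instead
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 9, 3, 7, 8, 2};
        System.out.println(largest(data, 3)); // prints [9, 8, 7]
    }
}
```

If M is larger than N, this sketch simply returns all N values in descending order.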


The code is as follows:

HeapSort Class

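import java.util.List;

public abstract class HeapSort<E> {

    // Returns true if value1 is less than value2.
    public abstract boolean compare(E value1, E value2);

    // Sorts the whole list.
    public boolean heapSort(List<E> list) {
        if (null == list) {
            return false;
        }
        return heapSort(list, list.size());
    }

    // Sorts the first n elements. Because the heap is a min-heap,
    // the result is in descending order.
    public boolean heapSort(List<E> list, int n) {
        if (null == list || 0 == list.size()) {
            return false;
        }
        if (!heapCreate(list, n)) {
            return false;
        }
        for (int i = n; i > 0; --i) {
            swap(list, 0, i - 1);
            heapAdjust(list, 0, i - 1);
        }
        return true;
    }

    // Builds a min-heap over the first length elements.
    public boolean heapCreate(List<E> list, int length) {
        if (null == list || 0 == list.size() || length > list.size()) {
            return false;
        }
        for (int i = (length / 2 - 1); i >= 0; --i) {
            if (!heapAdjust(list, i, length)) {
                return false;
            }
        }
        return true;
    }

    // Sifts the element at index middle down so that the first length
    // elements satisfy the min-heap property again.
    public boolean heapAdjust(List<E> list, int middle, int length) {
        if (null == list || 0 == list.size()) {
            return false;
        }
        E temp = list.get(middle);
        // With 0-based indices, the children of index i are 2 * i + 1 and 2 * i + 2.
        for (int i = (2 * middle + 1); i < length; i = 2 * i + 1) {
            // Move to the right child if it is not larger than the left one.
            if (i < (length - 1) && !this.compare(list.get(i), list.get(i + 1))) {
                ++i;
            }
            // Stop once temp is smaller than the smaller child.
            if (this.compare(temp, list.get(i))) {
                break;
            }
            list.set(middle, list.get(i));
            middle = i;
        }
        list.set(middle, temp);
        return true;
    }

    // Swaps the elements at indices i and j.
    public void swap(List<E> list, int i, int j) {
        E temp = list.get(i);
        list.set(i, list.get(j));
        list.set(j, temp);
    }
}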

FindFirstNData Class

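import java.util.List;

public abstract class FindFirstNData<E> extends HeapSort<E> {

    // Returns true if value1 is less than value2.
    public abstract boolean compare(E value1, E value2);

    // Moves the n largest elements of list into its first n positions,
    // sorted in descending order.
    public boolean findFirstNData(List<E> list, int n) {
        // Build a min-heap from the first n elements.
        if (!this.heapCreate(list, n)) {
            return false;
        }
        for (int i = n; i < list.size(); ++i) {
            // Skip elements that are not greater than the heap root.
            if (!this.compare(list.get(0), list.get(i))) {
                continue;
            }
            // Replace the root with the larger element and restore the heap.
            this.swap(list, 0, i);
            if (!this.heapAdjust(list, 0, n)) {
                return false;
            }
        }
        // Sort the n retained elements.
        return this.heapSort(list, n);
    }
}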

Test data: given 10,000,000 random values of type Integer, the time taken to find the 3 largest is shown in the figure below (unit: milliseconds).
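A measurement of this kind could be reproduced with a small harness such as the sketch below. It is an assumption-laden illustration, not the author's test program: the class `TopMTiming` and its methods are hypothetical, it works on a primitive `int[]` instead of a `List<Integer>`, it uses a fixed seed, and it does no JVM warm-up, so the printed time will vary with machine and JVM and should not be compared directly with the figure.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical timing harness; names are illustrative only.
final class TopMTiming {

    // Restores the min-heap property of heap[0..length) starting at index i.
    static void siftDown(int[] heap, int i, int length) {
        int temp = heap[i];
        for (int child = 2 * i + 1; child < length; child = 2 * child + 1) {
            if (child + 1 < length && heap[child + 1] < heap[child]) {
                ++child;                    // pick the smaller child
            }
            if (temp <= heap[child]) {
                break;                      // temp already fits here
            }
            heap[i] = heap[child];
            i = child;
        }
        heap[i] = temp;
    }

    // Returns the m largest values of data in descending order.
    static int[] largest(int[] data, int m) {
        if (m <= 0 || m > data.length) {
            throw new IllegalArgumentException("require 0 < m <= data.length");
        }
        int[] heap = Arrays.copyOf(data, m);
        for (int i = m / 2 - 1; i >= 0; --i) {
            siftDown(heap, i, m);           // build the min-heap
        }
        for (int i = m; i < data.length; ++i) {
            if (data[i] > heap[0]) {        // larger than the smallest kept value
                heap[0] = data[i];
                siftDown(heap, 0, m);
            }
        }
        // Heap sort on a min-heap leaves the array in descending order.
        for (int end = m - 1; end > 0; --end) {
            int t = heap[0];
            heap[0] = heap[end];
            heap[end] = t;
            siftDown(heap, 0, end);
        }
        return heap;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        int m = 3;
        Random random = new Random(42);     // fixed seed, an arbitrary choice
        int[] data = new int[n];
        for (int i = 0; i < n; ++i) {
            data[i] = random.nextInt();
        }
        long start = System.nanoTime();
        int[] top = largest(data, m);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(Arrays.toString(top) + " found in " + elapsedMs + " ms");
    }
}
```

Only the call to `largest` is timed; generating the random data is deliberately kept outside the measured interval.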

