深入jdk——追踪Collections.sort 引发的bug(2)TimSort思路
来源:互联网 发布:并查集算法题 编辑:程序博客网 时间:2024/06/11 23:03
上片博客给大家留了疑问,就是错误发生在TimSort这一个方法内,那么,是怎么发生的呢,咱们先了解下TimSort的思路:
1. TimSort在Java 7中的实现
那么为什么Java7会将TimSort作为排序的默认实现,甚至在某种程度上牺牲它的兼容性(在stackoverflow上有大量的问题是关于这个新异常的)呢?接下来我们不妨来看一看它的实现。
首先建议大家先读一下这篇文章以简要理解TimSort的思想。
1.1) 如果传入的Comparator为空,则使用ComparableTimSort的sort实现。
if (c == null) { Arrays.sort(a, lo, hi); return; }
1.2) 传入的待排序数组若小于MIN_MERGE(Java实现中为32,Python实现中为64),则
a) 从数组开始处找到一组连接升序或严格降序(找到后翻转)的数
b) BinarySort:使用二分查找的方法将后续的数插入之前的已排序数组
if (nRemaining < MIN_MERGE) { int initRunLen = countRunAndMakeAscending(a, lo, hi, c); binarySort(a, lo, hi, lo + initRunLen, c); return; }
private static <T> void binarySort(T[] a, int lo, int hi, int start, Comparator<? super T> c) { assert lo <= start && start <= hi; if (start == lo) start++; for ( ; start < hi; start++) { T pivot = a[start]; // Set left (and right) to the index where a[start] (pivot) belongs int left = lo; int right = start; assert left <= right; /* * Invariants: * pivot >= all in [lo, left). * pivot < all in [right, start). */ while (left < right) { int mid = (left + right) >>> 1; if (c.compare(pivot, a[mid]) < 0) right = mid; else left = mid + 1; } assert left == right; /* * The invariants still hold: pivot >= all in [lo, left) and * pivot < all in [left, start), so pivot belongs at left. Note * that if there are elements equal to pivot, left points to the * first slot after them -- that's why this sort is stable. * Slide elements over to make room for pivot. */ int n = start - left; // The number of elements to move // Switch is just an optimization for arraycopy in default case switch (n) { case 2: a[left + 2] = a[left + 1]; case 1: a[left + 1] = a[left]; break; default: System.arraycopy(a, left, a, left + 1, n); } a[left] = pivot; } }
1.3) 开始真正的TimSort过程:
1.3.1) 选取minRun大小,之后待排序数组将被分成以minRun大小为区块的一块块子数组
a) 如果数组大小为2的N次幂,则返回16(MIN_MERGE / 2)
b) 其他情况下,逐位向右位移(即除以2),直到找到介于16和32间的一个数
int minRun = minRunLength(nRemaining);
private static int minRunLength(int n) { assert n >= 0; int r = 0; // Becomes 1 if any 1 bits are shifted off while (n >= MIN_MERGE) { r |= (n & 1); n >>= 1; } return n + r; }
1.3.2) 类似于4.2.a找到初始的一组升序数列
1.3.3) 若这组区块大小小于minRun,则将后续的数补足(采用binary sort插入这个数组)
1.3.4) 为后续merge各区块作准备:记录当前已排序的各区块的大小
1.3.5) 对当前的各区块进行merge,merge会满足以下原则(假设X,Y,Z为相邻的三个区块):
a) 只对相邻的区块merge
b) 若当前区块数仅为2,If X<=Y,将X和Y merge
b) 若当前区块数>=3,If X<=Y+Z,将X和Y merge,直到同时满足X>Y+Z和Y>Z
do { // Identify next run int runLen = countRunAndMakeAscending(a, lo, hi, c); // If run is short, extend to min(minRun, nRemaining) if (runLen < minRun) { int force = nRemaining <= minRun ? nRemaining : minRun; binarySort(a, lo, lo + force, lo + runLen, c); runLen = force; } // Push run onto pending-run stack, and maybe merge ts.pushRun(lo, runLen); ts.mergeCollapse(); // Advance to find next run lo += runLen; nRemaining -= runLen; } while (nRemaining != 0);
private static <T> int countRunAndMakeAscending(T[] a, int lo, int hi, Comparator<? super T> c) { assert lo < hi; int runHi = lo + 1; if (runHi == hi) return 1; // Find end of run, and reverse range if descending if (c.compare(a[runHi++], a[lo]) < 0) { // Descending while (runHi < hi && c.compare(a[runHi], a[runHi - 1]) < 0) runHi++; reverseRange(a, lo, runHi); } else { // Ascending while (runHi < hi && c.compare(a[runHi], a[runHi - 1]) >= 0) runHi++; } return runHi - lo; }
1.3.6) 重复4.3.2 ~ 4.3.5,直到将待排序数组排序完
1.3.7) Final Merge:如果此时还有区块未merge,则合并它们
assert lo == hi; ts.mergeForceCollapse(); assert ts.stackSize == 1;
private void mergeForceCollapse() { while (stackSize > 1) { int n = stackSize - 2; if (n > 0 && runLen[n - 1] < runLen[n + 1]) n--; mergeAt(n); } }
private void mergeAt(int i) { assert stackSize >= 2; assert i >= 0; assert i == stackSize - 2 || i == stackSize - 3; int base1 = runBase[i]; int len1 = runLen[i]; int base2 = runBase[i + 1]; int len2 = runLen[i + 1]; assert len1 > 0 && len2 > 0; assert base1 + len1 == base2; /* * Record the length of the combined runs; if i is the 3rd-last * run now, also slide over the last run (which isn't involved * in this merge). The current run (i+1) goes away in any case. */ runLen[i] = len1 + len2; if (i == stackSize - 3) { runBase[i + 1] = runBase[i + 2]; runLen[i + 1] = runLen[i + 2]; } stackSize--; /* * Find where the first element of run2 goes in run1. Prior elements * in run1 can be ignored (because they're already in place). */ int k = gallopRight(a[base2], a, base1, len1, 0, c); assert k >= 0; base1 += k; len1 -= k; if (len1 == 0) return; /* * Find where the last element of run1 goes in run2. Subsequent elements * in run2 can be ignored (because they're already in place). */ len2 = gallopLeft(a[base1 + len1 - 1], a, base2, len2, len2 - 1, c); assert len2 >= 0; if (len2 == 0) return; // Merge remaining runs, using tmp array with min(len1, len2) elements if (len1 <= len2) mergeLo(base1, len1, base2, len2); else mergeHi(base1, len1, base2, len2); }
2. Demo
这一节用一个具体的例子来演示整个算法的演进过程:
*注意*:为了演示方便,我将TimSort中的minRun直接设置为2,否则我不能用很小的数组演示。。。同时把MIN_MERGE也改成2(默认为32),这样避免直接进入binarysort。
初始数组为[7,5,1,2,6,8,10,12,4,3,9,11,13,15,16,14]
=> 寻找连续的降序或升序序列 (4.3.2)
[1,5,7][2,6,8,10,12,4,3,9,11,13,15,16,14]
=> 入栈 (4.3.4)
当前的栈区块为[3]
=> 进入merge循环 (4.3.5)
do notmerge因为栈大小仅为1
=> 寻找连续的降序或升序序列 (4.3.2)
[1,5,7] [2,6,8,10,12][4,3,9,11,13,15,16,14]
=> 入栈 (4.3.4)
当前的栈区块为[3, 5]
=> 进入merge循环 (4.3.5)
merge因为runLen[0]<=runLen[1]
1)gallopRight:寻找run1的第一个元素应当插入run0中哪个位置(”2”应当插入”1”之后),然后就可以忽略之前run0的元素(都比run1的第一个元素小)
2)gallopLeft:寻找run0的最后一个元素应当插入run1中哪个位置(”7”应当插入”8”之前),然后就可以忽略之后run1的元素(都比run0的最后一个元素大)
这样需要排序的元素就仅剩下[5,7] [2,6],然后进行mergeLow
完成之后的结果:
[1,2,5,6,7,8,10,12][4,3,9,11,13,15,16,14]
=> 入栈 (4.3.4)
当前的栈区块为[8]
退出当前merge循环因为栈中的区块仅为1
=> 寻找连续的降序或升序序列 (4.3.2)
[1,2,5,6,7,8,10,12] [3,4][9,11,13,15,16,14]
=> 入栈 (4.3.4)
当前的栈区块大小为[8,2]
=> 进入merge循环 (4.3.5)
do not merge因为runLen[0]>runLen[1]
=> 寻找连续的降序或升序序列 (4.3.2)
[1,2,5,6,7,8,10,12] [3,4][9,11,13,15,16] [14]
=> 入栈 (4.3.4)
当前的栈区块为[8,2,5]
=>
do not merege run1与run2因为不满足runLen[0]<=runLen[1]+runLen[2]
merge run2与run3因为runLen[1]<=runLen[2]
1) gallopRight:发现run1和run2就已经排好序
完成之后的结果:
[1,2,5,6,7,8,10,12][3,4,9,11,13,15,16] [14]
=> 入栈 (4.3.4)
当前入栈的区块大小为[8,7]
退出merge循环因为runLen[0]>runLen[1]
=> 寻找连续的降序或升序序列 (4.3.2)
最后只剩下[14]这个元素:[1,2,5,6,7,8,10,12][3,4,9,11,13,15,16] [14]
=> 入栈 (4.3.4)
当前入栈的区块大小为[8,7,1]
=> 进入merge循环 (4.3.5)
merge因为runLen[0]<=runLen[1]+runLen[2]
因为runLen[0]>runLen[2],所以将run1和run2先合并。(否则将run0和run1先合并)
1) gallopRight & 2) gallopLeft
这样需要排序的元素剩下[13,15] [14],然后进行mergeHigh
完成之后的结果:
[1,2,5,6,7,8,10,12][3,4,9,11,13,14,15,16] 当前入栈的区块为[8,8]
=>
继续merge因为runLen[0]<=runLen[1]
1) gallopRight & 2) gallopLeft
需要排序的元素剩下[5,6,7,8,10,12] [3,4,9,11],然后进行mergeHigh
完成之后的结果:
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]当前入栈的区块大小为[16]
=>
不需要final merge因为当前栈大小为1
=>
结束
3,总结
通过分析我们发现,我们自己写的比较方法违背了原则
这就违背了a)原则:假设X的value为1,Y的value也为1;那么compare(X, Y) ≠ –compare(Y,X)
PS: TimSort不仅内置在各种JDK 7的版本,也存在于Android SDK中(尽管其并没有使用JDK 7)。
- 深入jdk——追踪Collections.sort 引发的bug(2)TimSort思路
- 深入jdk——追踪Collections.sort 引发的bug(3)TimSort源码解读
- 深入jdk——追踪Collections.sort 引发的bug(1)mergeSort
- JDK源码分析——TimSort
- 深入了解Arras.sort()与Collections.sort()的区别
- JDK不同版本的Collections.Sort方法实现
- JDK中 java.util.Collections类的sort方法
- 深入分析集合List的排序Collections.sort
- Java记录 -67- 深入剖析Collections的sort方法
- java jdk TimSort
- 深入理解Spark 2.1 Core (十二):TimSort 的原理与源码分析
- 深入理解Spark 2.1 Core (十二):TimSort 的原理与源码分析
- java基础—— Collections.sort的两种用法
- java的Collections.sort()方法使用
- Collections.sort()的分析
- Collections的sort用法
- [Collections.sort方法]——逻辑梳理
- java基础(12)-- 深入理解Collections.sort()
- 更改eclipse的Package Explorer的字体
- java中静态代码块的用法 static用法详解
- 二叉树 --- 树的构造和遍历
- 分割字符串的类StringTokenizer的使用
- Android之ContentProvider总结
- 深入jdk——追踪Collections.sort 引发的bug(2)TimSort思路
- 二叉树总结
- 上传本地项目到github
- 消息模式Toast.makeText的几种常见用法
- Android四大组件之ContentProvider
- BlockingQueue
- servlet使用声明式异常处理指定错误跳转页面,ie下无法正常显示
- unit11 practice
- CursorAdapter中getView newView bindView异同