A Brief Look at NumPy's argpartition Function


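While implementing a KD tree for the k-nearest neighbors algorithm in Python, I noticed something odd: I could not find a single Chinese-language article about NumPy's argpartition. Maybe my search skills are to blame. Either way, I decided to write a short article about this interesting function.
What is this function good for? A few examples should make its purpose clear.
Example 1: find the maximum and minimum of an array. This is not really the right job for it; it is overkill.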

>>> import numpy as np
>>> arr = np.array([46, 57, 23, 39, 1, 10, 0, 120])
>>> np.argpartition(arr, len(arr)-1)
array([5, 3, 2, 6, 4, 0, 1, 7], dtype=int32)
>>> arr[np.argpartition(arr, len(arr)-1)[len(arr)-1]]
120
>>> arr[np.argpartition(arr, 0)[0]]
0

Example 2: find the 3rd smallest value (index=2) and the 2nd largest value (index=-2). This is a much better fit.

>>> arr[np.argpartition(arr, 2)[2]]
10
>>> arr[np.argpartition(arr, -2)[-2]]
57

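Example 3: find the 3rd and 4th smallest values at the same time. Note that passing [2,3] puts both the 3rd and the 4th smallest elements into their sorted positions in a single call, so they can then be read at positions [2] and [3].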

>>> arr[np.argpartition(arr, [2,3])[2]]
10
>>> arr[np.argpartition(arr, [2,3])[3]]
23
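One way to check that the requested positions really hold the sorted values is to compare them against a full sort. The snippet below is only a sanity check on the same array; the variable names kth, picked and fully_sorted are introduced here for illustration.

```python
import numpy as np

arr = np.array([46, 57, 23, 39, 1, 10, 0, 120])
kth = [2, 3]  # the positions we want to be correct

idx = np.argpartition(arr, kth)   # indices into arr, partitioned around both positions
fully_sorted = np.sort(arr)       # reference result from a full sort

# Read the values sitting at the requested positions of the partitioned view.
picked = arr[idx][kth]
print(picked)                                      # [10 23]
print(np.array_equal(picked, fully_sorted[kth]))   # True
```

Only positions 2 and 3 are compared here, because those are the only positions argpartition was asked to fix.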

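You may well ask: why not just sort arr once? What is argpartition for? That is a good question. I had the same doubt at first, then ran an experiment and immediately saw the benefit.

If you are interested, bear with me a little longer. The experiment goes like this.

I created an array and, to make the effect stand out, deliberately put it in descending order, which is the textbook worst case for a naive sort (NumPy's own sort does not necessarily degrade on such input).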

>>> arr = np.array([8,7,6,5,4,3,2,1])
>>> np.argpartition(arr, 0)
array([7, 1, 2, 3, 4, 5, 6, 0], dtype=int32)
>>> np.argpartition(arr, 1)
array([7, 6, 2, 3, 4, 5, 1, 0], dtype=int32)
>>> np.argpartition(arr, 2)
array([7, 6, 5, 3, 4, 2, 1, 0], dtype=int32)
>>> np.argpartition(arr, 3)
array([6, 7, 5, 4, 3, 1, 2, 0], dtype=int32)
>>> np.argpartition(arr, 4)
array([4, 7, 6, 5, 3, 1, 2, 0], dtype=int32)
>>> np.argpartition(arr, 5)
array([4, 7, 6, 5, 3, 2, 1, 0], dtype=int32)
>>> np.argpartition(arr, 6)
array([4, 7, 6, 5, 3, 2, 1, 0], dtype=int32)
>>> np.argpartition(arr, 7)
array([4, 7, 6, 5, 3, 2, 1, 0], dtype=int32)

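In the first call I passed 0 as the second argument, meaning I want the index of the minimum. The return value is array([7, 1, 2, 3, 4, 5, 6, 0], dtype=int32). The only entry I care about is element 0 (7): it is an index into the original array arr, and arr[7] is the minimum I was looking for. Now look at the other entries of the result: compared with the original index order 0 to 7, they have barely moved. The next few calls behave much the same way, which shows that argpartition does little work on the data it was not asked about.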
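The guarantee behind these results can be written down as a small check: at position k sits the k-th smallest value, nothing larger comes before it, and nothing smaller comes after it. The helper is_partitioned below is a hypothetical name made up for this sketch, not part of NumPy.

```python
import numpy as np

def is_partitioned(values, idx, k):
    """True if idx puts a pivot at position k with no larger value
    before it and no smaller value after it."""
    pivot = values[idx[k]]
    left = values[idx[:k]]        # everything placed before position k
    right = values[idx[k + 1:]]   # everything placed after position k
    return bool(np.all(left <= pivot) and np.all(pivot <= right))

arr = np.array([8, 7, 6, 5, 4, 3, 2, 1])
reference = np.sort(arr)

# Repeat the experiment for every k and verify both the invariant
# and that position k matches a full sort.
all_ok = all(
    is_partitioned(arr, np.argpartition(arr, k), k)
    and arr[np.argpartition(arr, k)[k]] == reference[k]
    for k in range(len(arr))
)
print(all_ok)  # True
```

The exact index arrays printed in the experiment can differ between NumPy versions and platforms; the check relies only on the documented guarantee, not on one particular ordering.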

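argpartition only guarantees the position(s) named by the second argument; the values at all other positions are not guaranteed to be in sorted order. Because a single call, that is, one selection pass rather than a full sort, is enough to fix the rank of the requested positions, argpartition should be quite efficient (that part is my guess, not a measurement). It is reportedly O(n): NumPy's documentation for partition lists the introselect algorithm with O(n) worst-case complexity.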
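For a k-nearest-neighbors search, what is usually needed is the indices of the k smallest distances. The following is a minimal sketch of that pattern; it is not taken from the KD-tree code linked in this article, and the function name k_smallest_indices and the sample distances are assumptions made for illustration.

```python
import numpy as np

def k_smallest_indices(values, k):
    """Indices of the k smallest entries of a 1-D array,
    ordered by ascending value."""
    values = np.asarray(values)
    if not 0 < k <= values.size:
        raise ValueError("k must be between 1 and len(values)")
    # After partitioning at k-1, the first k positions hold the k smallest
    # entries, in no particular order.
    candidates = np.argpartition(values, k - 1)[:k]
    # Sort only those k candidates instead of the whole array.
    return candidates[np.argsort(values[candidates])]

distances = np.array([46, 57, 23, 39, 1, 10, 0, 120])
print(k_smallest_indices(distances, 3))  # [6 4 5]
```

The full array is only partitioned, and the sort runs over k elements, which is where the saving over sorting everything is expected to come from when k is much smaller than the array length.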

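Below is the function's English documentation. In my KD-tree code I only used the first two parameters, and only on one-dimensional arrays, which was enough. The other parameters handle more complex data; I have not used or tested them, and I will add to this article when I do.
My KD-tree code: http://blog.csdn.net/weixin_37722024/article/details/62226957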

numpy.argpartition
numpy.argpartition(a, kth, axis=-1, kind='introselect', order=None)
Perform an indirect partition along the given axis using the algorithm specified by the kind keyword. It returns an array of indices of the same shape as a that index data along the given axis in partitioned order.

New in version 1.8.0.

Parameters:
a : array_like
Array to sort.
kth : int or sequence of ints
Element index to partition by. The k-th element will be in its final sorted position and all smaller elements will be moved before it and all larger elements behind it. The order of all elements in the partitions is undefined. If provided with a sequence of k-th it will partition all of them into their sorted position at once.
axis : int or None, optional
Axis along which to sort. The default is -1 (the last axis). If None, the flattened array is used.
kind : {'introselect'}, optional
Selection algorithm. Default is 'introselect'.
order : str or list of str, optional
When a is an array with fields defined, this argument specifies which fields to compare first, second, etc. A single field can be specified as a string, and not all fields need be specified, but unspecified fields will still be used, in the order in which they come up in the dtype, to break ties.
Returns:
index_array : ndarray, int
Array of indices that partition a along the specified axis. In other words, a[index_array] yields a partitioned a.
See also
partition
Describes partition algorithms used.
ndarray.partition
Inplace partition.
argsort
Full indirect sort
Notes

See partition for notes on the different selection algorithms.

Examples

One dimensional array:

>>> x = np.array([3, 4, 2, 1])
>>> x[np.argpartition(x, 3)]
array([2, 1, 3, 4])
>>> x[np.argpartition(x, (1, 3))]
array([1, 2, 3, 4])

>>> x = [3, 4, 2, 1]
>>> np.array(x)[np.argpartition(x, 3)]
array([2, 1, 3, 4])
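The axis parameter is not exercised anywhere in this article, so here is a minimal sketch of how it might be used on a 2-D array. The matrix m is made up for illustration. For arrays with more than one dimension, plain indexing such as m[idx] does not gather along the axis, so this sketch uses np.take_along_axis instead.

```python
import numpy as np

m = np.array([[9, 4, 7, 1],
              [3, 8, 2, 6]])

# Partition every row independently: after this, column 1 of each row
# refers to that row's 2nd smallest value.
idx = np.argpartition(m, 1, axis=1)

# Gather the values row by row using the per-row index arrays.
partitioned = np.take_along_axis(m, idx, axis=1)
print(partitioned[:, 1])  # [4 3]

# With axis=None the array is flattened first, so the indices refer to m.ravel().
flat_idx = np.argpartition(m, 0, axis=None)
print(m.ravel()[flat_idx[0]])  # 1
```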

0 0