Chapter12_Advanced_Numpy
Source: Internet  Editor: 程序博客网  Time: 2024/06/07 05:01
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
Setting Array Values by Broadcasting
The same broadcasting rule governing arithmetic operations also applies to setting values via array indexing.
from pandas import DataFrame,Series
arr = np.zeros((4,3))
arr[:] = 5
arr
array([[ 5., 5., 5.],
       [ 5., 5., 5.],
       [ 5., 5., 5.],
       [ 5., 5., 5.]])
However, if we had a one-dimensional array of values we wanted to set into the columns of the array, we can do that as long as the shape is compatible.
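A minimal sketch of that column-wise assignment. The array name col and its values are illustrative assumptions, not taken from the original notebook. Adding a new axis turns the shape (4,) into (4, 1), which broadcasts across the three columns of arr:

```python
import numpy as np

arr = np.zeros((4, 3))

# Hypothetical values, one per row of arr
col = np.array([1.28, -0.42, 0.44, 1.6])

# col has shape (4,); col[:, np.newaxis] has shape (4, 1),
# which broadcasts against arr's shape (4, 3)
arr[:] = col[:, np.newaxis]

# Broadcasting also applies when assigning to a slice:
# each of the first two rows receives one value across all columns
arr[:2] = [[-1.37], [0.509]]
```

Assigning col directly (arr[:] = col) would fail here, because a trailing dimension of 4 does not match the 3 columns of arr.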
Advanced ufunc Usage
While many NumPy users will only make use of the fast element-wise operations provided by the universal functions, there are a number of additional features that occasionally can help you write more concise code without loops.
ufunc Instance Methods
Each of NumPy’s binary ufuncs has special methods for performing certain kinds of special vectorized operations.
reduce
reduce takes a single array and aggregates its values, optionally along an axis, by performing a sequence of binary operations.
For example, an alternate way to sum elements in an array is to use np.add.reduce:
arr = np.arange(10)
np.add.reduce(arr)
45
np.sum(arr)
45
A less trivial example is to use np.logical_and to check whether the values in each row of an array are sorted:
arr = np.random.randn(5, 5)
arr[::2].sort(1)
arr
array([[ -1.58645456e+00, -1.36502511e+00, -5.41627950e-04, 4.86430078e-01, 1.17172633e+00],
       [ -7.44597469e-01, -4.24979151e-01, -2.24296763e+00, 3.15071902e-01, 9.84530055e-01],
       [ -1.54861818e+00, -3.99856553e-01, -1.67025551e-01, 8.91978379e-02, 1.31957741e+00],
       [ 4.00996840e-01, -2.38865213e-03, 1.70717270e-01, -9.66816316e-01, 3.94653542e-01],
       [ -1.46761823e+00, -8.56722218e-01, -5.79742413e-01, 2.53720291e-01, 2.97676083e+00]])
arr[:,:-1]
array([[ -1.58645456e+00, -1.36502511e+00, -5.41627950e-04, 4.86430078e-01],
       [ -7.44597469e-01, -4.24979151e-01, -2.24296763e+00, 3.15071902e-01],
       [ -1.54861818e+00, -3.99856553e-01, -1.67025551e-01, 8.91978379e-02],
       [ 4.00996840e-01, -2.38865213e-03, 1.70717270e-01, -9.66816316e-01],
       [ -1.46761823e+00, -8.56722218e-01, -5.79742413e-01, 2.53720291e-01]])
arr[:,1:]
array([[ -1.36502511e+00, -5.41627950e-04, 4.86430078e-01, 1.17172633e+00],
       [ -4.24979151e-01, -2.24296763e+00, 3.15071902e-01, 9.84530055e-01],
       [ -3.99856553e-01, -1.67025551e-01, 8.91978379e-02, 1.31957741e+00],
       [ -2.38865213e-03, 1.70717270e-01, -9.66816316e-01, 3.94653542e-01],
       [ -8.56722218e-01, -5.79742413e-01, 2.53720291e-01, 2.97676083e+00]])
arr[:,:-1] < arr[:,1:]
array([[ True, True, True, True],
       [ True, False, True, True],
       [ True, True, True, True],
       [False, True, False, True],
       [ True, True, True, True]], dtype=bool)
np.logical_and.reduce(arr[:,:-1] < arr[:,1:], axis=1)
array([ True, False, True, False, True], dtype=bool)
arr[::2]
array([[ -1.58645456e+00, -1.36502511e+00, -5.41627950e-04, 4.86430078e-01, 1.17172633e+00],
       [ -1.54861818e+00, -3.99856553e-01, -1.67025551e-01, 8.91978379e-02, 1.31957741e+00],
       [ -1.46761823e+00, -8.56722218e-01, -5.79742413e-01, 2.53720291e-01, 2.97676083e+00]])
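As a cross-check, np.logical_and.reduce along an axis should give the same answer as the all method. A short sketch, using a small hand-written array in place of the random one above (the values are illustrative only):

```python
import numpy as np

# Illustrative data: rows 0 and 2 are sorted, row 1 is not
arr = np.array([[1, 2, 3, 4],
                [4, 1, 3, 2],
                [0, 5, 6, 9]])

# Compare each element with its right-hand neighbour
pairwise_ok = arr[:, :-1] < arr[:, 1:]

# A row is sorted only if every neighbouring pair is in order
rows_sorted = np.logical_and.reduce(pairwise_ok, axis=1)

# The all method performs the same reduction
rows_sorted_all = pairwise_ok.all(axis=1)
```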
accumulate
accumulate is related to reduce in the same way that cumsum is related to sum. It produces an array of the same size with the intermediate “accumulated” values:
arr = np.arange(15).reshape((3,5))
np.add.accumulate(arr, axis=1)
array([[ 0, 1, 3, 6, 10],
       [ 5, 11, 18, 26, 35],
       [10, 21, 33, 46, 60]])
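To make the cumsum analogy concrete, here is a small sketch comparing the two. It also applies accumulate to a different binary ufunc, np.maximum; the values array is made up for illustration:

```python
import numpy as np

arr = np.arange(15).reshape((3, 5))

# Running sum along each row, computed two ways
running_sum = np.add.accumulate(arr, axis=1)
same_as_cumsum = np.cumsum(arr, axis=1)

# accumulate works with other binary ufuncs too, e.g. a running maximum
values = np.array([3, 1, 4, 1, 5, 9, 2, 6])
running_max = np.maximum.accumulate(values)
```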
outer
outer performs a pairwise cross-product between two arrays:
arr = np.arange(3).repeat([1,2,3])
arr
array([0, 1, 1, 2, 2, 2])
np.multiply.outer(arr, np.arange(5))
array([[0, 0, 0, 0, 0],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 2, 4, 6, 8],
       [0, 2, 4, 6, 8],
       [0, 2, 4, 6, 8]])
The docstring of np.multiply.outer describes the operation: outer(A, B) applies the ufunc op to all pairs (a, b) with a in A and b in B.
result = np.subtract.outer(np.random.randn(3,4),np.random.randn(5))
result
array([[[-1.54216517, -0.52986349, -1.34379847, -2.32636464, -2.53081338],
        [-0.32168391, 0.69061777, -0.12331721, -1.10588338, -1.31033212],
        [ 1.00529622, 2.0175979 , 1.20366292, 0.22109675, 0.01664801],
        [ 1.02270583, 2.0350075 , 1.22107253, 0.23850636, 0.03405762]],
       [[ 2.18387989, 3.19618157, 2.38224659, 1.39968042, 1.19523168],
        [-0.36959033, 0.64271135, -0.17122363, -1.1537898 , -1.35823854],
        [ 0.49572124, 1.50802291, 0.69408793, -0.28847823, -0.49292697],
        [-3.38464023, -2.37233855, -3.18627353, -4.1688397 , -4.37328844]],
       [[-0.28755193, 0.72474974, -0.08918523, -1.0717514 , -1.27620014],
        [-0.99607282, 0.01622886, -0.79770612, -1.78027229, -1.98472103],
        [ 0.57069522, 1.5829969 , 0.76906192, -0.21350425, -0.41795299],
        [ 0.70100281, 1.71330448, 0.8993695 , -0.08319666, -0.2876454 ]]])
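The output of outer has as many dimensions as the two inputs combined. A quick check with the same input shapes as above (the names a, b and same are only for illustration):

```python
import numpy as np

a = np.random.randn(3, 4)
b = np.random.randn(5)

result = np.subtract.outer(a, b)

# result[i, j, k] == a[i, j] - b[k], so the shape is a.shape + b.shape
print(result.shape)  # (3, 4, 5)

# The same values via explicit broadcasting
same = a[:, :, np.newaxis] - b
```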
reduceat
The last method, reduceat, performs a “local reduce”, in essence an array groupby operation in which slices of the array are aggregated together. While it’s less flexible than the GroupBy capabilities in pandas, it can be very fast and powerful in the right circumstances. It accepts a sequence of “bin edges” which indicate how to split and aggregate the values:
arr = np.arange(8)
arr[::2]
array([0, 2, 4, 6])
np.add.reduceat(arr, [1,5,6,2])
array([10, 5, 6, 27])
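The result above can be read bin by bin. A sketch of the equivalent slices, assuming the same arr and bin edges (the names indices and by_hand are illustrative). Note that when an edge is not smaller than the next one, reduceat returns just the single element at that edge:

```python
import numpy as np

arr = np.arange(8)
indices = [1, 5, 6, 2]

result = np.add.reduceat(arr, indices)

# Equivalent computation written out by hand
by_hand = np.array([
    arr[1:5].sum(),  # 1 + 2 + 3 + 4 = 10
    arr[5:6].sum(),  # 5
    arr[6],          # 6 >= 2, so only arr[6] is taken
    arr[2:].sum(),   # the last edge runs to the end: 2 + ... + 7 = 27
])
```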
arr = np.multiply.outer(np.arange(4), np.arange(5))
arr
array([[ 0, 0, 0, 0, 0],
       [ 0, 1, 2, 3, 4],
       [ 0, 2, 4, 6, 8],
       [ 0, 3, 6, 9, 12]])
reduceat can also aggregate along the columns by passing an axis argument:
np.add.reduceat(arr, [0, 2, 4], axis=1)
array([[ 0, 0, 0],
       [ 1, 5, 4],
       [ 2, 10, 8],
       [ 3, 15, 12]])
User Function
np.frompyfunc
Docstring:
frompyfunc(func, nin, nout)
Takes an arbitrary Python function and returns a Numpy ufunc.
Can be used, for example, to add broadcasting to a built-in Python function (see Examples section).
Parameters
func : Python function object
An arbitrary Python function.
nin : int
The number of input arguments.
nout : int
The number of objects returned by func.
Returns
out : ufunc
Returns a Numpy universal function (ufunc) object.
Notes
The returned ufunc always returns PyObject arrays.
Examples
Use frompyfunc to add broadcasting to the Python function oct:
>>> oct_array = np.frompyfunc(oct, 1, 1)
>>> oct_array(np.array((10, 30, 100)))
array([012, 036, 0144], dtype=object)
>>> np.array((oct(10), oct(30), oct(100))) # for comparison
array(['012', '036', '0144'], dtype='|S4')
Type: builtin_function_or_method
def user_function(a, b): return a**b
ufunc = np.frompyfunc(user_function, 2, 1)
ufunc(np.arange(10), np.arange(10))
array([1, 1, 4, 27, 256, 3125, 46656, 823543, 16777216, 387420489], dtype=object)
These functions provide a way to create ufunc-like functions, but they are very slow because they require a Python function call to compute each element, which is a lot slower than NumPy’s C-based ufunc loops.
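A rough sketch of the trade-off, reusing user_function from above. The comparison against np.power, the np.vectorize variant, and the timing setup are illustrative assumptions; absolute timings depend on the machine, so none are quoted here:

```python
import timeit
import numpy as np

def user_function(a, b):
    return a ** b

ufunc = np.frompyfunc(user_function, 2, 1)
x = np.arange(10)

# frompyfunc always returns an object array; cast it to get a numeric dtype
via_python = ufunc(x, x).astype(np.int64)

# np.vectorize lets you declare the output dtype up front,
# but it still calls the Python function once per element
vec = np.vectorize(user_function, otypes=[np.int64])
via_vectorize = vec(x, x)

# The built-in ufunc computes the same values in a compiled loop
via_numpy = np.power(x, x)

# Rough timing on a larger input
big = np.arange(1, 10001, dtype=np.float64)
t_python = timeit.timeit(lambda: ufunc(big, 2.0), number=20)
t_numpy = timeit.timeit(lambda: np.power(big, 2.0), number=20)
print(t_python, t_numpy)
```

When a built-in ufunc such as np.power already expresses the operation, it is usually the better choice; frompyfunc and np.vectorize are mainly a convenience for functions that have no vectorized equivalent.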
Structured and Record Arrays
You may have noticed up until now that ndarray is a homogeneous data container; that is, it represents a block of memory in which each element takes up the same number of bytes, determined by the dtype. On the surface, this would appear to not allow you to represent heterogeneous or tabular-like data. A structured array is an ndarray in which each element can be thought of as representing a struct in C (hence the “structured” name) or a row in a SQL table with multiple named fields:
dtype = [('x', np.float64), ('y', np.float128)]  # the np.float alias was removed in NumPy 1.24; np.float128 exists only on some platforms
sarr = np.array([(1.5, 6),(np.pi, -2)],dtype=dtype)
sarr
array([(1.5, 6.0), (3.141592653589793, -2.0)], dtype=[('x', '<f8'), ('y', '<f16')])
sarr.dtype.names
('x', 'y')
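Once the fields are named, they can be used to index both single records and whole columns. A minimal sketch; it uses np.int32 for the y field instead of the extended-precision float above, purely so that it runs on any platform:

```python
import numpy as np

# Assumption: np.int32 for 'y' is chosen here for portability
dtype = [('x', np.float64), ('y', np.int32)]
sarr = np.array([(1.5, 6), (np.pi, -2)], dtype=dtype)

first = sarr[0]            # one record, with fields accessible by name
y_of_first = sarr[0]['y']  # a single field of a single record

xs = sarr['x']             # a view on the 'x' field of every record
xs[0] = 10.0               # writes through to sarr
```

Because sarr['x'] is a view rather than a copy, the assignment to xs[0] changes the underlying structured array as well.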
structured data type
Every element of a NumPy array has the same dtype, so a structured dtype is how you describe elements that hold several fields. There are three common ways to specify one:
- use a list of tuples such as [(field name, type)]
- to give a field more than one dimension, add an int or a tuple as a third item: [(field name, type, shape)]
- to nest dtypes, use a list of tuples as the type of a field: [(field name, [(field name, type), (field name, type)])]
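The three forms listed above can be sketched as follows. The field names and sample values are illustrative assumptions:

```python
import numpy as np

# 1. Plain list of (field name, type) tuples
simple = np.zeros(2, dtype=[('x', np.float64), ('y', np.int32)])

# 2. A third item gives the field a shape: each record holds 3 int64 values
shaped = np.zeros(4, dtype=[('x', np.int64, 3), ('y', np.int32)])
print(shaped['x'].shape)  # (4, 3)

# 3. A field whose type is itself a list of fields (nested dtype)
nested_dtype = [('x', [('a', 'f8'), ('b', 'f4')]), ('y', np.int32)]
data = np.array([((1, 2), 5), ((3, 4), 6)], dtype=nested_dtype)

# Nested fields are reached by chaining the field names
a_values = data['x']['a']
```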
Sorting
arr.sort() sorts in place, which means it changes the order of the elements in the original array
- if you sort a view of an array, the original array is modified too
- numpy.sort creates a new, sorted copy and leaves the input unchanged
- sort always produces ascending order; arr[::-1] on a sorted array gives the descending order
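A small sketch of the points above, using made-up values:

```python
import numpy as np

arr = np.array([3, 1, 4, 1, 5, 9, 2, 6])

sorted_copy = np.sort(arr)     # new array; arr itself is not modified
unchanged = arr.copy()         # snapshot showing arr is still in its original order

view = arr[:4]                 # a view on the first four elements
view.sort()                    # in-place sort of the view changes arr too
after_view_sort = arr.copy()

arr.sort()                     # in-place, ascending
descending = arr[::-1]         # reversed view of the sorted array
```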
arr = np.array([0, 1, 2, 3, 5])
arr.argsort()
array([0, 1, 2, 3, 4])
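Because the array above is already sorted, argsort simply returns 0 through 4. A sketch with unsorted, made-up values shows more clearly what the indexer does, including reordering a two-dimensional array by one of its rows:

```python
import numpy as np

values = np.array([5, 0, 1, 3, 2])

indexer = values.argsort()      # indices that would sort values
sorted_values = values[indexer]

# Reorder the columns of a 2-D array by the values in its first row
arr2d = np.array([[5, 0, 1, 3, 2],
                  [10, 20, 30, 40, 50]])
reordered = arr2d[:, arr2d[0].argsort()]
```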
first_name = np.array(['Bob', 'Jane', 'Steve', 'Bill', 'Barbara'])
last_name = np.array(['Jones', 'Arnold', 'Arnold', 'Jones', 'Walters'])
np.lexsort((first_name, last_name))
array([1, 2, 3, 0, 4])
sort = np.lexsort((first_name, last_name))
list(zip(last_name[sort], first_name[sort]))
[('Arnold', 'Jane'), ('Arnold', 'Steve'), ('Jones', 'Bill'), ('Jones', 'Bob'), ('Walters', 'Barbara')]
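In the call above, last_name is passed last and acts as the primary key. The same ordering rule is sketched below with numeric keys; the department and age arrays are hypothetical:

```python
import numpy as np

# Hypothetical keys: department is the primary key, age the secondary key
department = np.array([2, 1, 2, 1, 1])
age = np.array([30, 45, 25, 45, 22])

# The LAST key passed to lexsort is the primary sort key
order = np.lexsort((age, department))

# Rows are grouped by department first, then ordered by age within each group
pairs = list(zip(department[order].tolist(), age[order].tolist()))
```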
numpy.lexsort
> lexsort can sort hierarchical data with np.lexsort((secondary order list, primary order list)); the last key in the sequence is the primary sort key