[Python for Data Analysis] CH04 NumPy Basics -- Arrays and Vectorized Computation
Source: Internet | Published by: 人力资源分析软件 | Editor: 程序博客网 | Time: 2024/05/21 19:28
NumPy Basics: Arrays and Vectorized Computation
NumPy, short for Numerical Python, is the fundamental package required for high-performance scientific computing and data analysis. It provides:
- `ndarray`, a fast multidimensional array object
- Mathematical functions for fast operations on entire arrays of data without having to write loops
- Tools for reading data from disk
- Linear algebra, random number generation, and Fourier transform capabilities
- Tools for integrating code written in C, C++, and Fortran
Basic setup
```python
%matplotlib inline
# IPython/Jupyter magic; only needed for inline plots
from __future__ import division  # only matters on Python 2
from numpy.random import randn
import numpy as np
np.set_printoptions(precision=4, suppress=True)
```
NumPy ndarray: A Multidimensional Array Object
Basic usage
```python
data = randn(2, 3)
data * 10  # element-wise multiplication by a scalar
data + data
data.shape  # (2, 3)
data.dtype  # float64
```
Creating ndarrays
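array
`np.array` accepts any sequence-like object and creates a NumPy array containing the passed data.
zeros and ones
`np.zeros` and `np.ones` create an array of the given shape filled with 0s or 1s, respectively.
empty
`np.empty` creates an array without initializing its values to any particular value, so its contents are arbitrary.
arange
`np.arange` is an array-valued version of the built-in `range`.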
```python
# array
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
# nested sequences produce a 2-D array
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
# zeros, ones
a1 = np.zeros(10)
a2 = np.ones((2, 3))
# empty: uninitialized contents
np.empty(10)
# arange
np.arange(15)
```
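Here is a quick check of what these constructors produce. It is a small sketch. The inferred integer dtype depends on the platform, so only its kind is checked.

```python
import numpy as np

# A flat list containing a float -> 1-D float64 array
arr1 = np.array([6, 7.5, 8, 0, 1])
# Nested lists of equal length -> 2-D array
arr2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr1.ndim, arr1.shape, arr1.dtype)  # 1 (5,) float64
print(arr2.ndim, arr2.shape)              # 2 (2, 4)

# zeros/ones take a shape tuple; the default dtype is float64
a2 = np.ones((2, 3))
print(a2.shape, a2.dtype)                 # (2, 3) float64

# arange mirrors range
r = np.arange(5)
print(r.tolist())                         # [0, 1, 2, 3, 4]
```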
Data Types for ndarrays
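The dtype tells NumPy how to interpret a chunk of memory, which is what determines the memory size of an array. The trailing number is the number of bits per element: a double-precision float takes 8 bytes, hence `float64`.
```python
arr1 = np.array([1, 2, 3], dtype=np.float64)
arr2 = np.array([1, 2, 3], dtype=np.int32)
arr1.dtype
arr2.dtype
```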
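Casting between dtypes
A dtype can be assigned in three ways:
1. Inferred by default at creation
2. Specified explicitly at creation
3. Converted with `arr.astype(dtype)`, where the dtype is given directly or taken from another array, e.g. `arr2.dtype`
`astype` always creates a new array (a copy of the data), even if the new dtype is the same as the old one.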
```python
# 1. dtype inferred at creation
arr = np.arange(1, 6)
# 2. dtype given explicitly at creation (np.bytes_, formerly np.string_)
numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)
# 3. convert with astype
float_arr = arr.astype(np.float64)  # cast int64 to float64
numeric_strings.astype(float)  # if the cast fails, a ValueError is raised
# NumPy is smart enough to alias Python types such as float to equivalent dtypes
# use another array's dtype
arr1 = np.arange(10)
arr2 = randn(2, 3)
arr1.astype(arr2.dtype), arr1.dtype  # arr1 itself keeps its integer dtype
```
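The sketch below shows three `astype` behaviors on small made-up arrays: a float-to-int cast truncates, a same-dtype cast still copies, and a bad string cast fails. The array values are only illustrative.

```python
import numpy as np

floats = np.array([3.7, -1.2, -2.6, 0.5])
# float -> int casting truncates the decimal part (rounds toward zero)
ints = floats.astype(np.int32)
print(ints.tolist())  # [3, -1, -2, 0]

# astype returns a new array even when the dtype is unchanged
same = floats.astype(floats.dtype)
print(same is floats, np.shares_memory(same, floats))  # False False

# byte strings that look like numbers can be parsed into floats
numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)
parsed = numeric_strings.astype(float)
print(parsed.tolist())  # [1.25, -9.6, 42.0]

# a string that is not a number makes the cast fail
cast_failed = False
try:
    np.array(['1.25', 'abc']).astype(float)
except ValueError:
    cast_failed = True
print(cast_failed)  # True
```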
Operations between Arrays and Scalars
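As in R and MATLAB, arithmetic operators such as `*`, `+`, `-`, and `/` between equal-size arrays apply element-wise. Arithmetic with a scalar applies the scalar to every element.
```python
arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr
# element-wise operations between equal-size arrays
arr + arr
arr - arr
arr * arr
arr / arr
```
```python
# operations with scalars
1 / arr
arr ** 0.5
```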
Basic Indexing and Slicing
One dimension
Array slices are views on the original array: the data is not copied, and any modification to the view is reflected in the source array.
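```python
arr = np.arange(10)
arr
arr[5]
arr[5:8]
arr[5:8] = 12  # a scalar assigned to a slice is broadcast to the whole selection
arr
```
```python
arr_slice = arr[5:8]
arr_slice[1] = 12345  # changes arr as well, since arr_slice is a view
arr
arr_slice[:] = 64  # the bare slice [:] assigns to every value in the view
arr
```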
To get a copy of a slice instead of a view, copy it explicitly:
```python
arr[5:8].copy()
arr_slice_copy = arr[5:8].copy()
arr_slice_copy[1] = 1
arr_slice_copy
arr  # unchanged
```
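You can check the view/copy distinction directly with `np.shares_memory`. The sketch uses a fresh `np.arange(10)`.

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]          # basic slice -> view
copy = arr[5:8].copy()   # explicit copy

view[:] = 64             # writes through to arr
copy[:] = -1             # arr is unaffected

print(arr.tolist())                  # [0, 1, 2, 3, 4, 64, 64, 64, 8, 9]
print(np.shares_memory(arr, view))   # True
print(np.shares_memory(arr, copy))   # False
print(view.base is arr)              # True: the view points back to arr
```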
Higher dimensions
In a multidimensional array, indexing with a single index no longer returns a scalar but a lower-dimensional array (e.g. a row of a 2-D array).
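```python
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d[2]  # a 1-D row
arr2d[0][2], arr2d[0, 2]  # two equivalent ways to get one element
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
arr3d
arr3d.shape  # (2, 2, 3)
arr3d[0]  # a 2x3 array
arr3d[0] = 42  # the scalar is assigned to every element of arr3d[0]
arr3d[1, 0]  # array([7, 8, 9])
```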
Indexing with slices
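Slices are again views of the original array.
```python
arr[1:6]
arr2d
# a single slice selects along the first axis (rows)
arr2d[:2]
# two slices select rows and columns, respectively
arr2d[:2, 1:]
arr2d[1, :2]
```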
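Mixing integer indexes and slices changes the dimension of the result. This sketch checks the shapes on the same 3x3 `arr2d`.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Slices keep a dimension; an integer index drops it
print(arr2d[:2, 1:].shape)   # (2, 2)
print(arr2d[1, :2].shape)    # (2,)   integer row index -> 1-D result
print(arr2d[1:2, :2].shape)  # (1, 2) a length-1 slice keeps 2-D
print(arr2d[:, :1].shape)    # (3, 1) a single column, still 2-D

# Assigning to a slice expression modifies the whole selection in arr2d
arr2d[:2, 1:] = 0
print(arr2d.tolist())        # [[1, 0, 0], [4, 0, 0], [7, 8, 9]]
```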
Boolean Indexing
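```python
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = randn(7, 4)
names
data
```
```python
names == 'Bob'
data[names == 'Bob']  # rows where names == 'Bob'
data[names == 'Bob', 2:]
data[names == 'Bob', 3]
mask = (names == 'Bob') | (names == 'Will')  # use | and &; the keywords and/or do not work here
mask
data[mask]  # selecting with a boolean mask always creates a copy
data[data < 0] = 0  # set all negative values to 0
data
data[names != 'Joe'] = 7
data
```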
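Boolean selection returns a copy, but assignment through a mask writes into the original. The sketch uses `np.arange(28).reshape((7, 4))` as a deterministic stand-in for `randn(7, 4)`. It also shows `~` for negating a mask.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
data = np.arange(28).reshape((7, 4))   # deterministic stand-in for randn(7, 4)

mask = names == 'Bob'
selected = data[mask]          # rows 0 and 3, as a new array
selected[:] = -1               # does not touch data
print(data[0].tolist())        # [0, 1, 2, 3]
print(np.shares_memory(data, selected))  # False

# ~ negates a boolean array; names != 'Bob' gives the same mask
print((data[~mask] == data[names != 'Bob']).all())  # True

# Assignment through a boolean mask does modify the original
data[mask] = 0
print(data[3].tolist())        # [0, 0, 0, 0]
```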
Fancy Indexing
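Fancy indexing means indexing with integer arrays.
```python
arr = np.empty((8, 4))
for i in range(8):
    arr[i] = i
arr
```
```python
arr[[4, 3, 0, 6]]  # rows in the given order
arr[[-3, -5, -7]]  # negative indices count from the end
```
```python
arr = np.arange(32).reshape((8, 4))
arr
arr[[1, 5, 7, 2], [0, 3, 1, 2]]  # elements (1, 0), (5, 3), (7, 1), (2, 2)
arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]  # rectangular region: rows, then columns reordered
arr[np.ix_([1, 5, 7, 2], [0, 3, 1, 2])]  # the same rectangular region
```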
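The values below make the difference between paired index arrays and `np.ix_` concrete. They come from the same `np.arange(32).reshape((8, 4))` array. The last lines show that fancy indexing copies the data.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

# Paired integer arrays select one element per (row, col) pair -> 1-D
pairs = arr[rows, cols]
print(pairs.tolist())   # [4, 23, 29, 10]

# np.ix_ builds an open mesh, selecting the full rows x cols rectangle
block = arr[np.ix_(rows, cols)]
print(block.tolist())
# [[4, 7, 5, 6], [20, 23, 21, 22], [28, 31, 29, 30], [8, 11, 9, 10]]

# Same rectangle via two fancy-indexing steps
print(np.array_equal(block, arr[rows][:, cols]))  # True

# Fancy indexing copies the data, so arr is unaffected
block[:] = 0
print(arr[1, 0])  # 4
```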
Transposing arrays and swapping axes
```python
arr = np.arange(15).reshape((3, 5))
arr
arr.T  # transpose, returned as a view
```
```python
arr = np.random.randn(6, 3)
np.dot(arr.T, arr)  # inner matrix product X^T X
```
`transpose()` and `swapaxes()` for higher-dimensional arrays are not needed for now.
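For later reference, here is a short sketch of both methods on a 2x3x4 array. The index checks show which axes were permuted.

```python
import numpy as np

arr = np.arange(24).reshape((2, 3, 4))

# transpose takes a permutation of the axes: t[i, j, k] == arr[j, i, k]
t = arr.transpose((1, 0, 2))
print(t.shape)                     # (3, 2, 4)
print(t[2, 1, 3] == arr[1, 2, 3])  # True

# swapaxes exchanges just two axes: s[i, j, k] == arr[i, k, j]
s = arr.swapaxes(1, 2)
print(s.shape)                     # (2, 4, 3)
print(s[1, 3, 2] == arr[1, 2, 3])  # True

# Both return views, not copies
print(np.shares_memory(arr, t), np.shares_memory(arr, s))  # True True
```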
Universal Functions: Element-wise Array Functions
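Universal functions (ufuncs) perform fast element-wise operations on the data in ndarrays.
```python
arr = np.arange(10)
np.sqrt(arr)
np.exp(arr)
```
Binary ufuncs take two arrays as arguments:
```python
x = randn(8)
y = randn(8)
x
y
np.maximum(x, y)  # element-wise maximum
```
Some ufuncs return more than one array. `np.modf` returns the fractional and integral parts of a floating-point array:
```python
arr = randn(7) * 5
np.modf(arr)
```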
Unary functions
Binary functions
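A few commonly used ufuncs from each group, applied to small example arrays. The selection is illustrative, not a complete list.

```python
import numpy as np

x = np.array([-1.5, 0.0, 2.25, np.nan])
y = np.array([1.0, 2.0, 3.0, 4.0])

# Unary ufuncs: one input array
print(np.abs(x))                # absolute value; nan stays nan
print(np.square(y).tolist())    # [1.0, 4.0, 9.0, 16.0]
print(np.ceil(x), np.floor(x))  # round up / round down
print(np.isnan(x).tolist())     # [False, False, False, True]

# Binary ufuncs: two input arrays (or an array and a scalar)
print(np.add(x, y))             # same as x + y
print(np.power(y, 2))           # same as y ** 2
print(np.greater(y, 2.5).tolist())  # [False, False, True, True]
```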
Data processing using arrays
Vectorization replaces explicit loops with array expressions, which are usually much faster than the equivalent pure Python code.
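A classic illustration evaluates `sqrt(x^2 + y^2)` over a grid of points built with `np.meshgrid`. The hypothetical `loop_version` helper is the element-by-element loop that vectorization replaces. It only runs on a 50x50 corner to keep it quick.

```python
import numpy as np

points = np.arange(-5, 5, 0.01)       # 1000 equally spaced points
xs, ys = np.meshgrid(points, points)  # two 1000x1000 coordinate grids

# Vectorized: one array expression, no Python loop
z = np.sqrt(xs ** 2 + ys ** 2)

def loop_version(xs, ys, n):
    """Hypothetical pure-Python loop over the top-left n x n corner."""
    out = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            out[i, j] = (xs[i, j] ** 2 + ys[i, j] ** 2) ** 0.5
    return out

print(z.shape)                                             # (1000, 1000)
print(np.allclose(z[:50, :50], loop_version(xs, ys, 50)))  # True
```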
Expressing conditional logic as array operations
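Pure Python (`xarr`, `yarr`, and `cond` are small example arrays):
```python
xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])
# take x where cond is True, otherwise y
result = [(x if c else y) for x, y, c in zip(xarr, yarr, cond)]
```
NumPy:
```python
result = np.where(cond, xarr, yarr)
arr = randn(4, 4)
arr
np.where(arr > 0, 2, -2)
np.where(arr > 0, 2, arr)  # set only positive values to 2
```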
Mathematical and statistical methods
mean
```python
arr = np.random.randn(5, 4)  # normally distributed data
arr.mean()
np.mean(arr)
arr.sum()
```
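These methods take an optional `axis` argument: `axis=0` aggregates down the rows (one result per column), `axis=1` aggregates across the columns (one result per row).
```python
arr.mean(axis=1)  # mean of each row
arr.sum(0)  # sum of each column
```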
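A tiny integer array makes the axis convention easy to verify by hand:

```python
import numpy as np

arr = np.array([[0, 1, 2],
                [3, 4, 5]])

# axis=0 aggregates down the rows, giving one result per column
print(arr.sum(axis=0).tolist())   # [3, 5, 7]
# axis=1 aggregates across the columns, giving one result per row
print(arr.sum(axis=1).tolist())   # [3, 12]
print(arr.mean(axis=1).tolist())  # [1.0, 4.0]
```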
cumsum, cumprod
```python
arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
arr.cumsum(0)  # cumulative sum down each column
arr.cumprod(1)  # cumulative product along each row
```
Methods for boolean arrays
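Counting positive values (`True` counts as 1 and `False` as 0 when summed):
```python
arr = randn(100)
(arr > 0).sum()  # number of positive values
```
`any` tests whether at least one value is `True`; `all` tests whether every value is `True`:
```python
bools = np.array([False, False, True, False])
bools.any()  # True
bools.all()  # False
```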
Sorting
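arr.sort()
The `sort` method sorts an array in place:
```python
arr = randn(8)
arr
arr.sort()
arr
```
arr.sort(1)
Multidimensional arrays can be sorted in place along one axis by passing the axis number:
```python
arr = randn(5, 3)
arr.sort(1)  # sort each row
arr
```
np.sort()
The top-level `np.sort` returns a sorted copy instead of sorting in place:
```python
np.sort(arr)
```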
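The sketch below contrasts the two forms on a small example array. `np.sort` leaves its input alone. The method sorts in place and returns `None`.

```python
import numpy as np

arr = np.array([[3, 1, 2],
                [9, 7, 8]])

# np.sort returns a sorted copy; the original is untouched
sorted_rows = np.sort(arr, axis=1)
print(sorted_rows.tolist())  # [[1, 2, 3], [7, 8, 9]]
print(arr.tolist())          # [[3, 1, 2], [9, 7, 8]]

# The method sorts in place and returns None
result = arr.sort(axis=1)
print(result)                # None
print(arr.tolist())          # [[1, 2, 3], [7, 8, 9]]
```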
Unique and other set logic
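np.unique(arr)
`np.unique` returns the sorted unique values in an array:
```python
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
np.unique(names)
ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])
np.unique(ints)
```
np.in1d(arr1, arr2)
`np.in1d` tests whether each value of one array is contained in another and returns a boolean array. It is deprecated as of NumPy 2.0 in favor of `np.isin`, which gives the same result for 1-D input:
```python
values = np.array([6, 0, 0, 3, 2, 5, 6])
np.isin(values, [2, 3, 6])  # older code: np.in1d(values, [2, 3, 6])
```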
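`np.unique` can also count occurrences via `return_counts=True`. Its distinct values match the pure-Python `sorted(set(...))`. A short sketch reusing the `names` and `values` arrays from above:

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

# Sorted distinct values plus how often each one occurs
uniq, counts = np.unique(names, return_counts=True)
print(dict(zip(uniq.tolist(), counts.tolist())))  # {'Bob': 2, 'Joe': 3, 'Will': 2}

# Pure-Python equivalent of the distinct values
print(sorted(set(names.tolist())) == uniq.tolist())  # True

# Membership test: which elements of values appear in [2, 3, 6]
values = np.array([6, 0, 0, 3, 2, 5, 6])
member = np.isin(values, [2, 3, 6])
print(member.tolist())  # [True, False, False, True, True, False, True]
```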
File input and output with arrays
Storing arrays on disk in binary format
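```python
arr = np.arange(10)
np.save('some_array', arr)  # the .npy extension is appended automatically
np.load('some_array.npy')
```
`np.savez` saves several arrays into one uncompressed archive, passing them as keyword arguments:
```python
np.savez('array_archive.npz', a=arr, b=arr)
arch = np.load('array_archive.npz')
arch['b']  # dict-like object; arrays are loaded lazily on access
```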
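This round-trip sketch writes into a temporary directory so it leaves no files behind. It also shows `np.savez_compressed`, the compressed counterpart of `np.savez`. The file names are arbitrary.

```python
import os
import tempfile
import numpy as np

arr = np.arange(10)

with tempfile.TemporaryDirectory() as tmp:
    # np.save appends .npy when the name has no extension
    path = os.path.join(tmp, 'some_array')
    np.save(path, arr)
    loaded = np.load(path + '.npy')

    # savez_compressed works like savez but compresses the archive
    archive_path = os.path.join(tmp, 'arrays_compressed.npz')
    np.savez_compressed(archive_path, a=arr, b=arr * 2)
    with np.load(archive_path) as arch:  # close the file before cleanup
        keys = sorted(arch.files)
        b = arch['b']

print(np.array_equal(loaded, arr))  # True
print(keys)                         # ['a', 'b']
print(np.array_equal(b, arr * 2))   # True
```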
Saving and loading text files
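In practice, pandas' `read_csv` and `read_table` are more commonly used for text files. NumPy's `np.loadtxt` handles simple delimited numeric files:
```python
arr = np.loadtxt('array_ex.txt', delimiter=',')
arr
```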
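Since `array_ex.txt` is not included here, this sketch feeds `np.loadtxt` an in-memory `io.StringIO` with made-up numbers. It then writes the array back out with `np.savetxt`.

```python
import io
import numpy as np

# Hypothetical stand-in for the contents of array_ex.txt
text = io.StringIO("0.58,0.19,0.50\n0.71,0.91,0.21\n")
arr = np.loadtxt(text, delimiter=',')
print(arr.shape)   # (2, 3)

# savetxt is the reverse: one row per line, joined by the delimiter
out = io.StringIO()
np.savetxt(out, arr, delimiter=',', fmt='%.2f')
print(out.getvalue())
```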
Linear algebra
Functions such as `inv` and `qr` come from the `numpy.linalg` module.
1. Matrix multiplication (`A %*% B` in R)
```python
x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[6., 23.], [-1, 7], [8, 9]])
x
y
x.dot(y) # equivalently np.dot(x, y)
```
2. Matrix inverse and QR decomposition
```python
from numpy.linalg import inv, qr
X = randn(5, 5)
mat = X.T.dot(X)
inv(mat)
mat.dot(inv(mat))  # approximately the identity matrix
q, r = qr(mat)
r
```
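The results can be checked numerically. This sketch uses NumPy's newer `np.random.default_rng` generator with an arbitrary seed in place of `randn`, so it is reproducible. It asserts the defining properties of the inverse and of the QR factors.

```python
import numpy as np
from numpy.linalg import inv, qr

rng = np.random.default_rng(0)          # arbitrary seed, for reproducibility
X = rng.standard_normal((5, 5))
mat = X.T.dot(X)                        # symmetric, almost surely invertible

q, r = qr(mat)

print(np.allclose(mat.dot(inv(mat)), np.eye(5)))  # inverse: mat @ inv(mat) ~ I
print(np.allclose(q.dot(r), mat))                 # Q R reconstructs mat
print(np.allclose(q.T.dot(q), np.eye(5)))         # Q is orthogonal
print(np.allclose(r, np.triu(r)))                 # R is upper triangular
```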
Random number generation
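```python
samples = np.random.normal(size=(4, 4))  # 4x4 array of standard normal samples
samples
```
`numpy.random` generates large samples much faster than Python's built-in `random` module:
```python
from random import normalvariate
N = 1000000
%timeit samples = [normalvariate(0, 1) for _ in range(N)]
%timeit np.random.normal(size=N)
```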
Example: Random Walks
Pure Python:
```python
import random
position = 0
walk = [position]
steps = 1000
for i in range(steps):
    step = 1 if random.randint(0, 1) else -1
    position += step
    walk.append(position)
```
NumPy:
```python
np.random.seed(12345)
nsteps = 1000
draws = np.random.randint(0, 2, size=nsteps)  # 0 or 1
steps = np.where(draws > 0, 1, -1)
walk = steps.cumsum()
```
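Some simple statistics on the walk:
```python
walk.min()
walk.max()
```
The first crossing time is the step at which the walk first reaches 10 or -10. `argmax` returns the index of the first maximum, which for a boolean array is the first `True`:
```python
(np.abs(walk) >= 10).argmax()
```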
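One caveat: `argmax` also returns 0 when the boolean array has no `True` at all, so a walk that never crosses looks like it crossed at step 0. This sketch guards against that. `first_crossing` is a hypothetical helper name, and the short walk is made-up data.

```python
import numpy as np

def first_crossing(walk, threshold):
    """Hypothetical helper: first index where |walk| >= threshold, else None."""
    hit = np.abs(walk) >= threshold
    # argmax gives the first True, but returns 0 if there is none
    return int(hit.argmax()) if hit.any() else None

walk = np.array([1, 2, 1, 2, 3, 4, 3, 2])
print(first_crossing(walk, 3))   # 4
print(first_crossing(walk, 10))  # None
```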
Simulating many random walks at once
```python
nwalks = 5000
nsteps = 1000
draws = np.random.randint(0, 2, size=(nwalks, nsteps))  # 0 or 1
steps = np.where(draws > 0, 1, -1)
walks = steps.cumsum(1)  # cumulative sum along each row: one walk per row
walks
```
Statistics over all walks:
```python
walks.max()
walks.min()
hits30 = (np.abs(walks) >= 30).any(1)  # which walks ever reach 30 or -30
hits30
hits30.sum()  # number of walks that hit 30 or -30
crossing_times = (np.abs(walks[hits30]) >= 30).argmax(1)
crossing_times.mean()
```
Random walks with normally distributed steps:
```python
steps = np.random.normal(loc=0, scale=0.25, size=(nwalks, nsteps))  # mean 0, std 0.25
```
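Completing the normal-step version: the sketch reuses the seed and sizes above and takes the cumulative sum per row, as before. It then checks a known property. The sum of `nsteps` independent N(0, 0.25²) steps has standard deviation `0.25 * sqrt(nsteps)`, so the spread of final positions should be close to that.

```python
import numpy as np

np.random.seed(12345)
nwalks, nsteps = 5000, 1000

# Each step is drawn from N(0, 0.25**2) instead of being +1/-1
steps = np.random.normal(loc=0, scale=0.25, size=(nwalks, nsteps))
walks = steps.cumsum(1)  # one walk per row

print(walks.shape)                                  # (5000, 1000)
# Empirical vs. theoretical std of the final positions
print(walks[:, -1].std(), 0.25 * np.sqrt(nsteps))
```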