8. NumPy Basics: Arrays and Vectorized Computation

Source: Internet · Published by: 狭义相对论知乎 · Editor: 程序博客网 · Time: 2024/05/22 15:07

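I. Introduction to NumPy

NumPy's most important features:
- The N-dimensional array object (ndarray), a fast and flexible container for large data sets
- Vectorized arithmetic and sophisticated broadcasting
- Standard mathematical functions that operate on entire arrays without writing loops
- Linear algebra, random number generation, and Fourier transform capabilities
An ndarray can have any number of dimensions, so it can represent any of the data types covered earlier: scalars, vectors, matrices, or tensors
NumPy has two basic kinds of object: ndarray & ufunc
import numpy as np  # import the numpy package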
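As a small illustration of the claim that one ndarray type covers scalars, vectors, matrices, and tensors, the sketch below builds one of each and prints its number of dimensions and shape. The variable names are chosen here for illustration only.

```python
import numpy as np

scalar = np.array(5)                    # 0 dimensions, shape ()
vector = np.array([1, 2, 3])            # 1 dimension, shape (3,)
matrix = np.array([[1, 2], [3, 4]])     # 2 dimensions, shape (2, 2)
tensor = np.zeros((2, 3, 4))            # 3 dimensions, shape (2, 3, 4)

for name, x in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("tensor", tensor)]:
    print(name, x.ndim, x.shape)
```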

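II. Creating, Accessing, and Inspecting ndarrays

1. Creating an ndarray

Create from existing data with np.array(collection):
- collection can be a list (or list of lists), a tuple, an array, or another sequence type
Create with built-in constructors:
- np.ones, np.ones_like; np.empty, np.empty_like; np.eye; np.zeros. For a 1-D array, pass the length; for a multi-dimensional array, pass a tuple giving the shape (np.eye instead takes the number of rows and, optionally, columns). Note that np.empty does not initialize its contents
- np.zeros_like(array) takes another array and creates an all-zero array with the same shape and dtype
Create with sequence functions:
- np.arange(start, stop, step): like Python's range (stop is excluded), but it returns an ndarray and also accepts non-integer steps
- np.linspace(start, stop, N): returns N evenly spaced values over [start, stop], including both start and stop
- np.logspace(start, stop, N): returns N values evenly spaced on a log scale, from base**start to base**stop; the base defaults to 10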
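The constructors listed so far can be compared side by side. The following is a minimal sketch; the sizes and ranges are arbitrary values picked for illustration.

```python
import numpy as np

# From a nested list: two rows, three columns
a = np.array([[1, 2, 3], [4, 5, 6]])

# Built-in constructors: an int gives a 1-D array, a tuple gives the full shape
z1 = np.zeros(3)          # shape (3,)
z2 = np.zeros((2, 3))     # shape (2, 3)
o = np.ones_like(a)       # same shape and dtype as a, filled with ones
i3 = np.eye(3)            # 3x3 identity matrix

# Sequence functions
r = np.arange(0, 10, 2)   # stop is excluded: 0, 2, 4, 6, 8
l = np.linspace(0, 1, 5)  # both ends included: 0, 0.25, 0.5, 0.75, 1
g = np.logspace(0, 3, 4)  # 10**0 .. 10**3: 1, 10, 100, 1000

print(a.shape, z1.shape, z2.shape, o.dtype == a.dtype)
print(r, l, g)
```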
Create with random number functions:
- np.random.uniform(low=0.0, high=1.0, size=None): by default draws samples of shape size from the uniform distribution over [0, 1)
- np.random.randint(low, high=None, size=None, dtype=int): returns random integers from low (inclusive) to high (exclusive); if high is None, the range is [0, low)
- np.random.normal(loc=0.0, scale=1.0, size=None): by default draws samples of shape size from the standard normal distribution, (μ, σ²) = (0, 1); scale is the standard deviation σ
- size: int or tuple of ints, the shape of the output
- np.random.permutation(x): if x is an integer, randomly permutes np.arange(x); if x is an array, makes a copy and shuffles the elements randomly. Returns the permuted sequence (ndarray)
Shuffling the contents of arr in place:
- np.random.shuffle(arr): modifies a sequence in place by shuffling its contents
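The difference between np.random.permutation (returns a shuffled copy) and np.random.shuffle (works in place and returns None) is easy to miss, so here is a short sketch of the random functions above. The seed and the sizes are arbitrary choices made only so that the run is repeatable.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, used only to make the run repeatable

u = np.random.uniform(low=-1.0, high=1.0, size=(2, 3))  # floats in [-1, 1)
k = np.random.randint(0, 10, size=5)                    # ints in [0, 10)
n = np.random.normal(loc=0.0, scale=1.0, size=1000)     # standard normal samples

original = np.arange(5)
p = np.random.permutation(original)  # shuffled copy; original is untouched

arr = original.copy()
result = np.random.shuffle(arr)      # shuffles arr itself and returns None

print(u.shape, k, n.shape)
print(original, p, arr, result)
```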
Shuffling several arrays (ndarrays) in the same order

    import numpy as np

    # Shuffle several arrays (ndarrays stored in lst) in the same order
    def shuffle_seqs(lst):
        # lst[1] holds the labels
        random_order = np.random.permutation(len(lst[1]))
        permuted_lst = []
        for arr in lst:
            permuted_lst.append(arr[random_order])  # index an array with an array
        return permuted_lst

    X = [[0, 0], [0, 1], [1, 0], [1, 1]]  # training data
    Y = [0, 1, 1, 0]                      # labels
    X = np.array(X, dtype='int32')
    Y = np.array(Y, dtype='int32')

    epoch = 1000
    for i in range(epoch):
        # shuffle per epoch
        X, Y = shuffle_seqs([X, Y])
        print(X)
        print(Y)
    # sample output from one epoch (the order is random)
    >>> [[1 1]
         [0 0]
         [0 1]
         [1 0]]
    >>> [0 0 1 1]

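2. Accessing an ndarray

Indexing
- Indexing a 1-D array works much like indexing a list (reversal with arr[::-1] and negative indices such as arr[-3] are both supported)
- Indexing a 2-D array: arr[i, j] == arr[i][j]
- In a multi-dimensional array, omitting the trailing indices returns an ndarray with fewer dimensions that contains all the data along the remaining axes
- Boolean (conditional) indexing: arr[condition]; conditions can be combined with & and |, with each comparison wrapped in parentheses
Slicing
- Slicing a 1-D array works much like slicing a list
- Slicing a 2-D array: arr[r1:r2, c1:c2:step] (a step can also be given)
- A slice of an array is a view into the same data, so modifying it will modify the original array (reference semantics); unlike a list slice, no copy is made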
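The indexing and slicing rules above, including the fact that a slice is a view, can be checked with a small array. This is a sketch with made-up values; copy() is the standard way to detach a slice from the original data.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

elem = arr[1, 2]              # same element as arr[1][2]
row = arr[1]                  # omitting the column index gives the whole row
evens_gt4 = arr[(arr % 2 == 0) & (arr > 4)]  # boolean mask; note the parentheses

block = arr[0:2, 1:4:2]       # rows 0-1, columns 1 and 3
block[0, 0] = 100             # a slice is a view, so arr[0, 1] changes too

safe = arr[0:2, 0:2].copy()   # copy() detaches the data from arr
safe[0, 0] = -1               # arr[0, 0] is left alone

print(elem, row, evens_gt4)
print(arr)
```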
Indexing an ndarray with an ndarray/list

    import numpy as np

    # Create a 1-D array, then pick out elements with an array or list of indices
    # (the original array is not modified)
    x = np.arange(10, 1, -1)
    x
    >>> array([10,  9,  8,  7,  6,  5,  4,  3,  2])
    x[np.array([3, 3, -3, 8])]         # index an array with an array
    >>> array([7, 7, 4, 2])
    x[[8, 7, 6, 5, 4, 3, 2, 1, 0]]     # index an array with a list
    >>> array([ 2,  3,  4,  5,  6,  7,  8,  9, 10])

    # Note: a plain list does not support this
    lst = [10,  9,  8,  7,  6,  5,  4,  3,  2]
    lst[[3, 3, -3, 8]]
    >>> TypeError: list indices must be integers or slices, not list

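3. Arithmetic between ndarrays and with scalars

Any arithmetic operation between arrays of the same shape is applied element by element
Operations between arrays of different shapes are handled by broadcasting: shapes are compared from the trailing dimension backwards, and two dimensions are compatible when they are equal or one of them is 1
Arithmetic between an array and a scalar propagates the scalar to every element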
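Broadcasting is easiest to understand from shapes. NumPy aligns shapes starting from the trailing dimension and treats two dimensions as compatible when they are equal or one of them is 1. The sketch below shows two compatible cases and one incompatible case; the arrays are invented for illustration.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)     # shape (2, 3): [[0, 1, 2], [3, 4, 5]]
b = np.array([10, 20, 30])         # shape (3,)
col = np.array([[100], [200]])     # shape (2, 1)

same_shape = a + a                 # element-wise
scaled = a * 2                     # the scalar is applied to every element
row_bcast = a + b                  # (2, 3) + (3,): b is reused for every row
col_bcast = a + col                # (2, 3) + (2, 1): col is reused for every column

error = None
try:
    a + np.array([1, 2])           # (2, 3) vs (2,): trailing sizes 3 and 2 differ
except ValueError as exc:
    error = str(exc)

print(row_bcast)
print(col_bcast)
print(error)
```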

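4. Transposing Arrays and Swapping Axes

Transposing a 1-D array has no effect (unlike in linear algebra); reshape it to 2-D first, then transpose
A 2-D array can be transposed with arr.T or np.transpose(arr). NumPy does not move any data in memory when transposing; it only changes how the original array is indexed, which makes the operation very efficient. It also means you must be careful when modifying the result, because both objects share the same data.
Higher-dimensional arrays: arr.transpose() takes a tuple of axis numbers (0, 1, 2, …) giving the new axis order, so the operation is essentially a permutation of axes. To swap just two axes, use arr.swapaxes(1, 2)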
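A short sketch of the three cases above: a 1-D array, a 2-D array whose transpose shares memory with the original, and a 3-D array with permuted axes. The shapes are arbitrary.

```python
import numpy as np

v = np.arange(3)                   # shape (3,)
v_t = v.T                          # still shape (3,): no effect on 1-D arrays
col = v.reshape(-1, 1)             # shape (3, 1): an explicit column vector

m = np.arange(6).reshape(2, 3)
m_t = m.T                          # shape (3, 2), a view on the same data
m_t[0, 1] = 99                     # writes through to m[1, 0]

t = np.arange(24).reshape(2, 3, 4)
t_perm = t.transpose((1, 0, 2))    # new axis order -> shape (3, 2, 4)
t_swap = t.swapaxes(1, 2)          # swap the last two axes -> shape (2, 4, 3)

print(v_t.shape, col.shape, m[1, 0])
print(t_perm.shape, t_swap.shape)
```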

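5. Common ndarray attributes and methods

`a. Common attributes`

ndim: the number of dimensions of the array
shape: the size of the array along each dimension
dtype: the data type of the array's elements; use the astype method to convert it
Note: all elements must have the same type (unlike a list)

`b. Common methods`

Methods that change an array's shape
- reshape(): changes the shape of the array (for example, turns a 1-D vector into a 2-D matrix); the total number of elements must stay the same
- flatten(): converts a multi-dimensional array to a 1-D copy; arr.reshape(-1) and np.reshape(arr, -1) give the same values, as a view when possible. Note that there is no np.flatten() function
- transpose(): transposes the array; arr.T, arr.transpose(), or np.transpose(arr)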
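The attributes and shape-changing methods above can be tried on a small array. The sketch also shows one practical difference worth knowing: flatten() always copies, while reshape(-1) returns a view when the memory layout allows it (as it does here).

```python
import numpy as np

a = np.arange(6)
print(a.ndim, a.shape, a.dtype)   # the dtype printed depends on the platform

m = a.reshape(2, 3)               # 2 rows, 3 columns; shares data with a
m2 = a.reshape(3, -1)             # -1 lets NumPy infer the size: shape (3, 2)

flat_copy = m.flatten()           # always a copy
flat_view = m.reshape(-1)         # a view in this case
flat_copy[0] = 100                # does not touch a
flat_view[1] = 200                # changes a[1] (and m[0, 1])

print(m.ndim, m.shape, m2.shape)
print(a)
```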
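Methods that convert an array (or its data type)
- tolist(): converts the array to a (nested) Python list
- tobytes(): returns the array's raw data as a bytes object whose length depends on the dtype (1 byte = 8 bits); tostring() is a deprecated alias of tobytes()
- astype('float32'): converts the array's data type and returns a new array, e.g. a = a.astype('float32')

    import numpy as np

    # The default integer dtype is platform dependent: 'int64' on most 64-bit
    # Linux/macOS builds, 'int32' on Windows with NumPy 1.x (as in this run)
    a = np.array([1, 2, 3])
    a.dtype
    >>> dtype('int32')

    # The default floating-point dtype is 'float64'
    b = np.array([2.2, 3.3, 5])
    b.dtype
    >>> dtype('float64')

    # Convert the dtype with astype (the fractional part is truncated)
    c = b.astype('int32')
    c
    >>> array([2, 3, 5])
    c.dtype
    >>> dtype('int32')

    # A dtype can be given as a quoted string ('float32') or as a NumPy type (np.float32)
    m = np.array([1, 2, 3], dtype='float32')
    m
    >>> array([1., 2., 3.], dtype=float32)
    m.dtype
    >>> dtype('float32')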

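Methods that combine arrays

Vertical stacking (row wise)
- Format: np.vstack(tup)
- Equivalent to np.concatenate(tup, axis=0) if tup contains arrays that are at least 2-dimensional
Horizontal stacking (column wise)
- Format: np.hstack(tup)
- Equivalent to np.concatenate(tup, axis=1), except for 1-D arrays, which are concatenated along their only axis
Depth stacking (depth wise/along third dimension)
- Format: np.dstack(tup)
- Equivalent to np.concatenate(tup, axis=2) after 2-D arrays of shape (M, N) have been reshaped to (M, N, 1) and 1-D arrays of shape (N,) to (1, N, 1)
- Purpose: stack 1-D/2-D arrays (e.g. images) into a single 3-D array; elements at matching positions in the input arrays are placed side by side along the new third axis
Other methods
- Column stacking: np.column_stack(tup)
- Row stacking: np.row_stack(tup), an alias of np.vstack (deprecated in NumPy 2.0)
- np.insert()
- np.c_[] and np.r_[]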

Code examples

    # Stacking 1-D arrays
    a = np.array([1, 2, 3])
    b = np.array([4, 5, 6])

    np.vstack((a,b))  # 1-D becomes 2-D
    array([[1, 2, 3],
           [4, 5, 6]])

    np.hstack((a,b))  # still 1-D
    array([1, 2, 3, 4, 5, 6])

    np.dstack((a,b))  # 1-D becomes 3-D
    array([[[1, 4],
            [2, 5],
            [3, 6]]])

    ##################################################
    # Stacking 2-D arrays
    c = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
    d = np.array([[ 2,  4,  6],
                  [ 8, 10, 12],
                  [14, 16, 18]])

    np.vstack((c, d))  # same as np.concatenate((c, d), axis=0)
    array([[ 1,  2,  3],
           [ 4,  5,  6],
           [ 7,  8,  9],
           [ 2,  4,  6],
           [ 8, 10, 12],
           [14, 16, 18]])

    np.hstack((c, d))  # same as np.concatenate((c, d), axis=1)
    array([[ 1,  2,  3,  2,  4,  6],
           [ 4,  5,  6,  8, 10, 12],
           [ 7,  8,  9, 14, 16, 18]])

    np.dstack((c, d))  # same as np.concatenate((c[..., None], d[..., None]), axis=2)
    array([[[ 1,  2],
            [ 2,  4],
            [ 3,  6]],
           [[ 4,  8],
            [ 5, 10],
            [ 6, 12]],
           [[ 7, 14],
            [ 8, 16],
            [ 9, 18]]])

Methods that split arrays
- Vertical splitting
  - Split an array into multiple sub-arrays vertically (row-wise)
  - Returns a list of sub-arrays
  - With a higher dimensional array the split is still along the first axis
  - Format: np.vsplit(ary, indices_or_sections)
    - ary : Array to be divided into sub-arrays.
    - indices_or_sections : int or 1-D array
      - If indices_or_sections is an integer, N, the array will be divided into N equal arrays along axis. If such a split is not possible, an error is raised.
      - If indices_or_sections is a 1-D array of sorted integers, the entries indicate where along axis the array is split. For example, [2, 3] would, for axis=0, result in [ary[:2], ary[2:3], ary[3:]], i.e. three pieces cut at those boundaries. If an index exceeds the dimension of the array along axis, an empty sub-array is returned correspondingly.
- Horizontal splitting
  - Split an array into multiple sub-arrays horizontally (column-wise)
  - Returns a list of sub-arrays
  - With a higher dimensional array the split is still along the second axis
  - Format: np.hsplit(ary, indices_or_sections)
- Depth-wise splitting
  - Split array into multiple sub-arrays along the 3rd axis (depth)
  - Returns a list of sub-arrays
  - The array is always split along the third axis provided the array dimension is greater than or equal to 3
  - Format: np.dsplit(ary, indices_or_sections)
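The splitting functions described above mirror the stacking functions. The following sketch uses a 4x4 array so that both the integer form and the index-list form of indices_or_sections can be shown; the array contents are arbitrary.

```python
import numpy as np

a = np.arange(16).reshape(4, 4)

top, bottom = np.vsplit(a, 2)        # two (2, 4) blocks, split by rows
left, right = np.hsplit(a, 2)        # two (4, 2) blocks, split by columns
parts = np.vsplit(a, [1, 3])         # a[:1], a[1:3], a[3:]

cube = np.arange(8).reshape(2, 2, 2)
front, back = np.dsplit(cube, 2)     # two (2, 2, 1) blocks, split along axis 2

uneven_split_failed = False
try:
    np.vsplit(a, 3)                  # 4 rows cannot be divided into 3 equal parts
except ValueError:
    uneven_split_failed = True

print(top.shape, left.shape, [p.shape for p in parts], front.shape)
```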

6. Adding multiple rows or columns to an ndarray

    import numpy as np

    a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    b = np.eye(3)

    # 1. Use np.c_[] to add columns and np.r_[] to add rows
    np.c_[a, b]  # when adding a column, b may have one dimension fewer (1-D)
    array([[ 1.,  2.,  3.,  1.,  0.,  0.],
           [ 4.,  5.,  6.,  0.,  1.,  0.],
           [ 7.,  8.,  9.,  0.,  0.,  1.]])

    np.r_[a, b]  # when adding rows, a and b must have the same number of dimensions
    array([[ 1.,  2.,  3.],
           [ 4.,  5.,  6.],
           [ 7.,  8.,  9.],
           [ 1.,  0.,  0.],
           [ 0.,  1.,  0.],
           [ 0.,  0.,  1.]])

    # 2. Use np.insert() to add rows and columns
    # 3 is the index before which b is inserted; axis selects the axis, here axis 1 (columns)
    # The inserted values are cast to a's dtype, so the result stays integer
    np.insert(a, 3, values=b, axis=1)
    array([[1, 2, 3, 1, 0, 0],
           [4, 5, 6, 0, 1, 0],
           [7, 8, 9, 0, 0, 1]])

    np.insert(a, 3, values=b, axis=0)
    array([[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9],
           [1, 0, 0],
           [0, 1, 0],
           [0, 0, 1]])

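III. Universal functions (ufunc): element-wise operations

1. Commonly used universal functions

np.ceil(): rounds up to the nearest integer
np.floor(): rounds down to the nearest integer
np.rint(): rounds to the nearest integer (exact halves go to the nearest even integer, so this is not classic round-half-up)
np.abs(): computes the absolute value of integers, floats, or complex numbers
np.square(): computes the square of each element
np.exp(): computes e**x for each element
np.sqrt(): computes the square root of each element
np.isnan(): tests whether each element is NaN (Not a Number)
np.add(): adds corresponding elements of two arrays
np.multiply(): multiplies corresponding elements of two arrays
np.where(condition, x, y): a vectorized version of the ternary expression x if condition else y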
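A few of these functions applied to the same input make the rounding behaviour concrete. In particular, np.rint sends exact halves to the nearest even integer. The input values below were picked to expose that case.

```python
import numpy as np

x = np.array([-1.5, -0.5, 0.5, 1.5, 2.5])

up = np.ceil(x)                    # -1, -0, 1, 2, 3
down = np.floor(x)                 # -2, -1, 0, 1, 2
near = np.rint(x)                  # halves go to the even neighbour: -2, -0, 0, 2, 2

magnitude = np.abs(x)
squares = np.square(x)
shifted = np.add(x, 1.0)           # same as x + 1.0
clipped = np.where(x > 0, x, 0.0)  # x if x > 0 else 0.0, element by element
has_nan = np.isnan(np.array([1.0, np.nan]))

print(up, down, near)
print(clipped, has_nan)
```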

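2. Commonly used statistical methods

np.mean(), np.sum()
np.std(), np.var(): standard deviation and variance
np.max(), np.min()
np.argmax(), np.argmin(): index of the largest and smallest element
np.argwhere(condition): indices of the elements that satisfy the condition
np.cumsum(): cumulative sum of the elements
np.cumprod(): cumulative product of the elements
Note: for multi-dimensional arrays, specify the axis to reduce over (e.g. arr.mean(axis=1)); otherwise the statistic is computed over all elements
np.all(): True if every element satisfies the condition
np.any(): True if at least one element satisfies the condition
np.unique(): finds the unique values and returns them sorted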
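The effect of the axis argument is the part of these methods that most often causes confusion, so here is a minimal sketch on a 2x3 array with invented values.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

total = arr.sum()                  # over all elements: 21
col_sums = arr.sum(axis=0)         # collapse the rows: [5, 7, 9]
row_means = arr.mean(axis=1)       # collapse the columns: [2., 5.]

flat_argmax = arr.argmax()         # index into the flattened array: 5
row_argmax = arr.argmax(axis=1)    # position of the max in each row: [2, 2]
running = arr.cumsum()             # [1, 3, 6, 10, 15, 21]

where_big = np.argwhere(arr > 4)   # (row, column) pairs: [[1, 1], [1, 2]]
any_big = (arr > 5).any()          # True: 6 is greater than 5
all_pos = (arr > 0).all()          # True: every element is positive
uniq = np.unique(np.array([3, 1, 2, 3, 1]))  # sorted unique values: [1, 2, 3]

print(total, col_sums, row_means)
print(flat_argmax, row_argmax, running)
```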

3. NumPy matrix multiplication/inner product

NumPy matrix multiplication
- Element-wise multiplication: use the np.multiply() function or the * operator
- Matrix product: use the np.matmul() function or the @ operator
NumPy's np.dot() function
- For 2-D arrays it is equivalent to matrix multiplication, just like np.matmul()
- For 1-D arrays it is the inner product of vectors (without complex conjugation)
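Element-wise multiplication and the matrix product give different results on the same inputs, which the sketch below makes visible. The matrices and vectors are small made-up examples.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

elementwise = a * b            # same as np.multiply(a, b): [[5, 12], [21, 32]]
matrix = np.matmul(a, b)       # rows times columns: [[19, 22], [43, 50]]
matrix_op = a @ b              # operator form of np.matmul
matrix_dot = np.dot(a, b)      # for 2-D inputs, the same matrix product

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])
inner = np.dot(u, v)           # 1*4 + 2*5 + 3*6 = 32

print(elementwise)
print(matrix)
print(inner)
```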

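4. Working with arrays and text files

NumPy can read and write both text data and binary data on disk
np.load & np.save are the two main functions for reading and writing array data on disk
Saving an array to disk in binary format

    # np.save
    # By default, arrays are saved in an uncompressed raw binary format in files with the .npy extension
    # If the file path does not end in .npy, the extension is appended automatically
    arr = np.arange(10)
    np.save('some_array', arr)

    # np.load
    np.load('some_array.npy')              # the extension is required when loading
    array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])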

Reading and writing text files

    # loadtxt
    filename = './presidential_polls.csv'
    data_array = np.loadtxt(filename,        # file name
                            delimiter=',',   # field delimiter
                            dtype=str,       # data type
                            usecols=(0,2,3)) # indices of the columns to read
    print(data_array, data_array.shape)
    [['cycle' 'type' 'matchup']
     ['2016' '"polls-plus"' '"Clinton vs. Trump vs. Johnson"']
     ['2016' '"polls-plus"' '"Clinton vs. Trump vs. Johnson"']
     ...,
     ['2016' '"polls-only"' '"Clinton vs. Trump vs. Johnson"']
     ['2016' '"polls-only"' '"Clinton vs. Trump vs. Johnson"']
     ['2016' '"polls-only"' '"Clinton vs. Trump vs. Johnson"']] (10237, 3)

    # loadtxt, with the type of each column given explicitly
    # ('U15'/'U50' are Unicode strings, so the values print as str under Python 3)
    filename = './presidential_polls.csv'
    data_array = np.loadtxt(filename,      # file name
                            delimiter=',', # field delimiter
                            skiprows=1,    # skip the header row
                            dtype={'names':('cycle', 'type', 'matchup'),
                                   'formats':('i4', 'U15', 'U50')},     # data types
                            usecols=(0,2,3)) # indices of the columns to read
    print(data_array, data_array.shape)  # the result is a 1-D array; each element is a tuple-like record
    [(2016, '"polls-plus"', '"Clinton vs. Trump vs. Johnson"')
     (2016, '"polls-plus"', '"Clinton vs. Trump vs. Johnson"')
     (2016, '"polls-plus"', '"Clinton vs. Trump vs. Johnson"')
     ...,
     (2016, '"polls-only"', '"Clinton vs. Trump vs. Johnson"')
     (2016, '"polls-only"', '"Clinton vs. Trump vs. Johnson"')
     (2016, '"polls-only"', '"Clinton vs. Trump vs. Johnson"')] (10236,)

    # Saving text files: np.savetxt
    # Text I/O is mostly done with pandas, so it is not covered here

0 0