Deep Learning Study Notes: Row Normalization and Broadcasting


Background:

Normalizing the data improves the convergence speed of gradient descent.
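Why this helps can be seen with a toy experiment. The sketch below is my own addition (synthetic data, arbitrary learning rates and tolerance, not part of the original assignment): plain gradient descent on a least-squares problem stalls when one feature column is on a much larger scale, and converges quickly once the columns are rescaled to unit norm.

```python
# A minimal sketch (my addition, not from the assignment): gradient descent
# on least squares with badly scaled vs. normalized feature columns.
import numpy as np

def steps_to_converge(X, y, lr, tol=1e-6, max_steps=100000):
    """Count gradient-descent steps until the gradient norm drops below tol."""
    w = np.zeros(X.shape[1])
    for step in range(max_steps):
        grad = X.T @ (X @ w - y) / len(y)
        if np.linalg.norm(grad) < tol:
            return step
        w -= lr * grad
    return max_steps  # did not converge within the budget

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) * np.array([1.0, 100.0])  # second feature 100x larger
y = X @ np.array([2.0, -3.0])

X_scaled = X / np.linalg.norm(X, axis=0, keepdims=True)  # unit-norm columns
print("raw:   ", steps_to_converge(X, y, lr=1e-5))       # hits the step budget
print("scaled:", steps_to_converge(X_scaled, y, lr=1.0)) # converges in far fewer steps
```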

Normalization:

Normalization rule: $\frac{x}{\|x\|}$, i.e. divide each row by the norm of that row vector.
For example:

$$x = \begin{bmatrix} 0 & 3 & 4 \\ 2 & 6 & 4 \end{bmatrix} \tag{1}$$

Computing the norms:

$$\|x\| = \texttt{np.linalg.norm(x, axis=1, keepdims=True)} = \begin{bmatrix} 5 \\ \sqrt{56} \end{bmatrix} \tag{2}$$

The normalized result:

$$x\_normalized = \frac{x}{\|x\|} = \begin{bmatrix} 0 & \frac{3}{5} & \frac{4}{5} \\[4pt] \frac{2}{\sqrt{56}} & \frac{6}{\sqrt{56}} & \frac{4}{\sqrt{56}} \end{bmatrix} \tag{3}$$

We are able to divide two matrices of different sizes because of the broadcasting mechanism in Python (more precisely, NumPy).

Python implementation:

```python
# GRADED FUNCTION: normalizeRows

import numpy as np

def normalizeRows(x):
    """
    Implement a function that normalizes each row of the matrix x (to have unit length).

    Argument:
    x -- A numpy matrix of shape (n, m)

    Returns:
    x -- The normalized (by row) numpy matrix. You are allowed to modify x.
    """
    ### START CODE HERE ### (≈ 2 lines of code)
    # Compute x_norm as the norm 2 of x. Use np.linalg.norm(..., ord = 2, axis = ..., keepdims = True)
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)
    print("size of x=", np.shape(x))
    print("size of x_norm=", np.shape(x_norm))
    # Divide x by its norm.
    x = x / x_norm
    ### END CODE HERE ###
    return x

x = np.array([
    [0, 3, 4],
    [1, 6, 4]])
print("normalizeRows(x) = " + str(normalizeRows(x)))
```

Output:

```
size of x= (2, 3)
size of x_norm= (2, 1)
normalizeRows(x) = [[ 0.          0.6         0.8       ]
 [ 0.13736056  0.82416338  0.54944226]]
```
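A quick sanity check (my addition, not in the original notes): after normalizeRows, every row should have L2 norm 1 up to floating-point error.

```python
# Each row of the normalized matrix should have unit L2 norm.
print(np.linalg.norm(normalizeRows(x), axis=1))  # [ 1.  1.]
```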

Broadcasting

From the shapes of x and x_norm printed above, the two arrays clearly have different sizes, so how is the operation between them carried out?
When operating on two arrays, NumPy compares their shapes element-wise, working from the trailing dimension forward. Two dimensions are compatible when:
1. they are equal, or
2. one of them is 1.
If neither condition holds, a "frames are not aligned" exception is thrown (newer NumPy versions report "operands could not be broadcast together"), indicating that the two array shapes are incompatible. Each dimension of the result array equals the maximum of the input arrays' sizes along that dimension.
The following examples cannot be broadcast:

```
A      (1d array):  3
B      (1d array):  4          # trailing dimensions do not match (3 vs. 4)

A      (2d array):      2 x 1
B      (3d array):  8 x 4 x 3  # second-from-last dimensions mismatched (2 vs. 4)
```
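A short sketch of these rules in action (my addition; the shapes are arbitrary):

```python
import numpy as np

a = np.ones((8, 1, 3))  # trailing dims:      1 x 3
b = np.ones((7, 3))     # treated as:     1 x 7 x 3
print((a + b).shape)    # (8, 7, 3): each result dim is the max of the inputs

try:
    np.ones(3) + np.ones(4)  # trailing dimensions 3 and 4 are incompatible
except ValueError as e:
    print(e)  # recent NumPy: "operands could not be broadcast together ..."
```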

Example:

The softmax function below illustrates this further:

  • For $x \in \mathbb{R}^{1 \times n}$:

    $$\mathrm{softmax}(x) = \mathrm{softmax}([x_1 \ \ x_2 \ \ \dots \ \ x_n]) = \left[ \frac{e^{x_1}}{\sum_j e^{x_j}} \ \ \frac{e^{x_2}}{\sum_j e^{x_j}} \ \ \dots \ \ \frac{e^{x_n}}{\sum_j e^{x_j}} \right]$$

  • For a matrix $x \in \mathbb{R}^{m \times n}$, $x_{ij}$ maps to the element in the $i$-th row and $j$-th column of $x$, thus we have:

    $$\mathrm{softmax}(x) = \mathrm{softmax}\begin{bmatrix} x_{11} & x_{12} & \dots & x_{1n} \\ x_{21} & x_{22} & \dots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \dots & x_{mn} \end{bmatrix} = \begin{bmatrix} \frac{e^{x_{11}}}{\sum_j e^{x_{1j}}} & \frac{e^{x_{12}}}{\sum_j e^{x_{1j}}} & \dots & \frac{e^{x_{1n}}}{\sum_j e^{x_{1j}}} \\[4pt] \frac{e^{x_{21}}}{\sum_j e^{x_{2j}}} & \frac{e^{x_{22}}}{\sum_j e^{x_{2j}}} & \dots & \frac{e^{x_{2n}}}{\sum_j e^{x_{2j}}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{e^{x_{m1}}}{\sum_j e^{x_{mj}}} & \frac{e^{x_{m2}}}{\sum_j e^{x_{mj}}} & \dots & \frac{e^{x_{mn}}}{\sum_j e^{x_{mj}}} \end{bmatrix} = \begin{bmatrix} \mathrm{softmax}(\text{first row of } x) \\ \mathrm{softmax}(\text{second row of } x) \\ \vdots \\ \mathrm{softmax}(\text{last row of } x) \end{bmatrix}$$

Python implementation:

```python
# GRADED FUNCTION: softmax

def softmax(x):
    """Calculates the softmax for each row of the input x.

    Your code should work for a row vector and also for matrices of shape (n, m).

    Argument:
    x -- A numpy matrix of shape (n, m)

    Returns:
    s -- A numpy matrix equal to the softmax of x, of shape (n, m)
    """
    ### START CODE HERE ### (≈ 3 lines of code)
    # Apply exp() element-wise to x. Use np.exp(...).
    x_exp = np.exp(x)
    # Create a vector x_sum that sums each row of x_exp. Use np.sum(..., axis = 1, keepdims = True).
    x_sum = np.sum(x_exp, axis=1, keepdims=True)
    # Compute softmax(x) by dividing x_exp by x_sum. It should automatically use numpy broadcasting.
    s = x_exp / x_sum
    ### END CODE HERE ###
    return s

x = np.array([
    [9, 2, 5, 0, 0],
    [7, 5, 0, 0, 0]])
print("softmax(x) = " + str(softmax(x)))
```

Output:

```
softmax(x) = [[  9.80897665e-01   8.94462891e-04   1.79657674e-02   1.21052389e-04
    1.21052389e-04]
 [  8.78679856e-01   1.18916387e-01   8.01252314e-04   8.01252314e-04
    8.01252314e-04]]
```
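A quick sanity check (my addition): each row of the result is a probability distribution, so every row should sum to 1.

```python
# Rows of softmax output should each sum to 1.
print(np.sum(softmax(x), axis=1))  # [ 1.  1.]
```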

Note: broadcasting applies when matrices are combined with the element-wise operators +, -, *, and /, but it does not apply to matrix-product operations such as np.dot, as the sketch below shows.
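A small illustration of the difference (my addition; the arrays are arbitrary):

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)
b = np.array([[10],
              [20]])        # shape (2, 1)

print(a * b)       # element-wise: b broadcasts across the columns, result (2, 3)

try:
    np.dot(a, b)   # matrix product: inner dimensions 3 and 2 must match
except ValueError as e:
    print(e)       # broadcasting does not rescue np.dot
```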

Example 2:

```python
# a.shape = (3,4)
# b.shape = (4,1)
for i in range(3):
    for j in range(4):
        c[i][j] = a[i][j] + b[j]
```

Which of the following expressions corresponds to the loop above?

  • c = a + b
  • c = a.T + b.T
  • c = a + b.T
  • c = a.T + b

Applying the broadcasting rules, the answer is the third one: c = a + b.T, since b.T has shape (1, 4), which broadcasts against a's shape (3, 4). (Option 4 also broadcasts without error, but yields a (4, 3) result that does not match the loop.) A quick check follows.
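A verification with concrete arrays (my addition; the values are arbitrary):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # shape (3, 4)
b = np.arange(4).reshape(4, 1)    # shape (4, 1)

# The explicit double loop from the quiz.
c_loop = np.empty((3, 4))
for i in range(3):
    for j in range(4):
        c_loop[i][j] = a[i][j] + b[j]

c_broadcast = a + b.T             # b.T has shape (1, 4)
print(np.array_equal(c_loop, c_broadcast))  # True
```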
