Deep Learning Study Notes: Row Normalization and Broadcasting


Background:

Normalizing the data improves the convergence speed of gradient descent.
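Why this helps can be seen with a toy experiment. The sketch below is my own addition (synthetic data, arbitrary learning rates and tolerance, not part of the original assignment): plain gradient descent on a least-squares problem stalls when one feature column is on a much larger scale, and converges quickly once the columns are rescaled to unit norm.

```python
# A minimal sketch (my addition, not from the assignment): gradient descent
# on least squares with badly scaled vs. normalized feature columns.
import numpy as np

def steps_to_converge(X, y, lr, tol=1e-6, max_steps=100000):
    """Count gradient-descent steps until the gradient norm drops below tol."""
    w = np.zeros(X.shape[1])
    for step in range(max_steps):
        grad = X.T @ (X @ w - y) / len(y)
        if np.linalg.norm(grad) < tol:
            return step
        w -= lr * grad
    return max_steps  # did not converge within the budget

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) * np.array([1.0, 100.0])  # second feature 100x larger
y = X @ np.array([2.0, -3.0])

X_scaled = X / np.linalg.norm(X, axis=0, keepdims=True)  # unit-norm columns
print("raw:   ", steps_to_converge(X, y, lr=1e-5))       # hits the step budget
print("scaled:", steps_to_converge(X_scaled, y, lr=1.0)) # converges in far fewer steps
```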

Normalization:

Normalization rule: $\frac{x}{\|x\|}$, i.e. divide each row by the norm of that row vector.
For example:

$$x = \begin{bmatrix} 0 & 3 & 4 \\ 2 & 6 & 4 \end{bmatrix} \tag{1}$$

Computing the norms:

$$\|x\| = \texttt{np.linalg.norm(x, axis=1, keepdims=True)} = \begin{bmatrix} 5 \\ \sqrt{56} \end{bmatrix} \tag{2}$$

The normalized result:

$$x\_normalized = \frac{x}{\|x\|} = \begin{bmatrix} 0 & \frac{3}{5} & \frac{4}{5} \\[4pt] \frac{2}{\sqrt{56}} & \frac{6}{\sqrt{56}} & \frac{4}{\sqrt{56}} \end{bmatrix} \tag{3}$$

We are able to divide two matrices of different sizes because of the broadcasting mechanism in Python (more precisely, NumPy).

Python implementation:

```python
# GRADED FUNCTION: normalizeRows

import numpy as np

def normalizeRows(x):
    """
    Implement a function that normalizes each row of the matrix x (to have unit length).

    Argument:
    x -- A numpy matrix of shape (n, m)

    Returns:
    x -- The normalized (by row) numpy matrix. You are allowed to modify x.
    """
    ### START CODE HERE ### (≈ 2 lines of code)
    # Compute x_norm as the norm 2 of x. Use np.linalg.norm(..., ord = 2, axis = ..., keepdims = True)
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)
    print("size of x=", np.shape(x))
    print("size of x_norm=", np.shape(x_norm))
    # Divide x by its norm.
    x = x / x_norm
    ### END CODE HERE ###
    return x

x = np.array([
    [0, 3, 4],
    [1, 6, 4]])
print("normalizeRows(x) = " + str(normalizeRows(x)))
```

Output:

```
size of x= (2, 3)
size of x_norm= (2, 1)
normalizeRows(x) = [[ 0.          0.6         0.8       ]
 [ 0.13736056  0.82416338  0.54944226]]
```
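A quick sanity check (my addition, not in the original notes): after normalizeRows, every row should have L2 norm 1 up to floating-point error.

```python
# Each row of the normalized matrix should have unit L2 norm.
print(np.linalg.norm(normalizeRows(x), axis=1))  # [ 1.  1.]
```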

Broadcasting

From the shapes of x and x_norm printed above, the two arrays clearly have different sizes, so how is the operation between them carried out?
When operating on two arrays, NumPy compares their shapes element-wise, working from the trailing dimension forward. Two dimensions are compatible when:
1. they are equal, or
2. one of them is 1.
If neither condition holds, a "frames are not aligned" exception is thrown (newer NumPy versions report "operands could not be broadcast together"), indicating that the two array shapes are incompatible. Each dimension of the result array equals the maximum of the input arrays' sizes along that dimension.
The following examples cannot be broadcast:

```
A      (1d array):  3
B      (1d array):  4          # trailing dimensions do not match (3 vs. 4)

A      (2d array):      2 x 1
B      (3d array):  8 x 4 x 3  # second-from-last dimensions mismatched (2 vs. 4)
```
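A short sketch of these rules in action (my addition; the shapes are arbitrary):

```python
import numpy as np

a = np.ones((8, 1, 3))  # trailing dims:      1 x 3
b = np.ones((7, 3))     # treated as:     1 x 7 x 3
print((a + b).shape)    # (8, 7, 3): each result dim is the max of the inputs

try:
    np.ones(3) + np.ones(4)  # trailing dimensions 3 and 4 are incompatible
except ValueError as e:
    print(e)  # recent NumPy: "operands could not be broadcast together ..."
```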

Example:

The softmax function below illustrates this further:

  • For $x \in \mathbb{R}^{1 \times n}$:

    $$\mathrm{softmax}(x) = \mathrm{softmax}([x_1 \ \ x_2 \ \ \dots \ \ x_n]) = \left[ \frac{e^{x_1}}{\sum_j e^{x_j}} \ \ \frac{e^{x_2}}{\sum_j e^{x_j}} \ \ \dots \ \ \frac{e^{x_n}}{\sum_j e^{x_j}} \right]$$

  • For a matrix $x \in \mathbb{R}^{m \times n}$, $x_{ij}$ maps to the element in the $i$-th row and $j$-th column of $x$, thus we have:

    $$\mathrm{softmax}(x) = \mathrm{softmax}\begin{bmatrix} x_{11} & x_{12} & \dots & x_{1n} \\ x_{21} & x_{22} & \dots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \dots & x_{mn} \end{bmatrix} = \begin{bmatrix} \frac{e^{x_{11}}}{\sum_j e^{x_{1j}}} & \frac{e^{x_{12}}}{\sum_j e^{x_{1j}}} & \dots & \frac{e^{x_{1n}}}{\sum_j e^{x_{1j}}} \\[4pt] \frac{e^{x_{21}}}{\sum_j e^{x_{2j}}} & \frac{e^{x_{22}}}{\sum_j e^{x_{2j}}} & \dots & \frac{e^{x_{2n}}}{\sum_j e^{x_{2j}}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{e^{x_{m1}}}{\sum_j e^{x_{mj}}} & \frac{e^{x_{m2}}}{\sum_j e^{x_{mj}}} & \dots & \frac{e^{x_{mn}}}{\sum_j e^{x_{mj}}} \end{bmatrix} = \begin{bmatrix} \mathrm{softmax}(\text{first row of } x) \\ \mathrm{softmax}(\text{second row of } x) \\ \vdots \\ \mathrm{softmax}(\text{last row of } x) \end{bmatrix}$$

Python implementation:

```python
# GRADED FUNCTION: softmax

def softmax(x):
    """Calculates the softmax for each row of the input x.

    Your code should work for a row vector and also for matrices of shape (n, m).

    Argument:
    x -- A numpy matrix of shape (n, m)

    Returns:
    s -- A numpy matrix equal to the softmax of x, of shape (n, m)
    """
    ### START CODE HERE ### (≈ 3 lines of code)
    # Apply exp() element-wise to x. Use np.exp(...).
    x_exp = np.exp(x)
    # Create a vector x_sum that sums each row of x_exp. Use np.sum(..., axis = 1, keepdims = True).
    x_sum = np.sum(x_exp, axis=1, keepdims=True)
    # Compute softmax(x) by dividing x_exp by x_sum. It should automatically use numpy broadcasting.
    s = x_exp / x_sum
    ### END CODE HERE ###
    return s

x = np.array([
    [9, 2, 5, 0, 0],
    [7, 5, 0, 0, 0]])
print("softmax(x) = " + str(softmax(x)))
```

Output:

```
softmax(x) = [[  9.80897665e-01   8.94462891e-04   1.79657674e-02   1.21052389e-04
    1.21052389e-04]
 [  8.78679856e-01   1.18916387e-01   8.01252314e-04   8.01252314e-04
    8.01252314e-04]]
```
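A quick sanity check (my addition): each row of the result is a probability distribution, so every row should sum to 1.

```python
# Rows of softmax output should each sum to 1.
print(np.sum(softmax(x), axis=1))  # [ 1.  1.]
```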

Note: broadcasting applies when matrices are combined with the element-wise operators +, -, *, and /, but it does not apply to matrix-product operations such as np.dot, as the sketch below shows.
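A small illustration of the difference (my addition; the arrays are arbitrary):

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)
b = np.array([[10],
              [20]])        # shape (2, 1)

print(a * b)       # element-wise: b broadcasts across the columns, result (2, 3)

try:
    np.dot(a, b)   # matrix product: inner dimensions 3 and 2 must match
except ValueError as e:
    print(e)       # broadcasting does not rescue np.dot
```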

Example 2:

```python
# a.shape = (3,4)
# b.shape = (4,1)
for i in range(3):
    for j in range(4):
        c[i][j] = a[i][j] + b[j]
```

Which of the following expressions corresponds to the loop above?

  • c = a + b
  • c = a.T + b.T
  • c = a + b.T
  • c = a.T + b

Applying the broadcasting rules, the answer is the third one: c = a + b.T, since b.T has shape (1, 4), which broadcasts against a's shape (3, 4). (Option 4 also broadcasts without error, but yields a (4, 3) result that does not match the loop.) A quick check follows.
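A verification with concrete arrays (my addition; the values are arbitrary):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # shape (3, 4)
b = np.arange(4).reshape(4, 1)    # shape (4, 1)

# The explicit double loop from the quiz.
c_loop = np.empty((3, 4))
for i in range(3):
    for j in range(4):
        c_loop[i][j] = a[i][j] + b[j]

c_broadcast = a + b.T             # b.T has shape (1, 4)
print(np.array_equal(c_loop, c_broadcast))  # True
```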
