Deep Learning (DeepLearning.ai) Course Notes: 4. Logistic Regression in Code

Source: Internet | Published by: 淘宝购物的金钱过程 | Editor: 程序博客网 | Time: 2024/06/17 09:09

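When this post was reproduced, images were lost and the code may display incorrectly.
For a better reading experience, please visit the original version:
http://www.missshi.cn/api/view/blog/59aa08fee519f50d04000170
P.S. The site's JavaScript files are large, so the first visit may take a while to load (about 8 s).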

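In this lesson we learn how to implement logistic regression in Python.

This is the first lesson that involves Python code, so we start with some basic Python programming.

Building basic functions with numpy

numpy is the most widely used Python library for scientific computing. We first look at some of its commonly used functions.

Exercise 1: Implement the sigmoid function with np.exp()

Before using np.exp(), we first implement the sigmoid function with math.exp() and compare the two, to highlight the advantages of np.exp().

Here the sigmoid function is defined as $\text{sigmoid}(x) = \frac{1}{1 + e^{-x}}$.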


import math

def basic_sigmod(x):
    """
    Compute the sigmoid of a single scalar.
    """
    s = 1.0 / (1 + 1 / math.exp(x))
    return s

print(basic_sigmod(3))
# 0.9525741268224334

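The code above applies the sigmoid function to a single scalar. In deep learning, however, we usually apply it to vectors or matrices.

If we pass a vector or matrix to this function, an exception is raised: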


print(basic_sigmod([3, 2, 1]))
# raises TypeError: math.exp() expects a single number, not a list

With np.exp(), by contrast, a vector or matrix input produces a vector or matrix output: the exponential is computed for each element.


import numpy as np

x = np.array([1, 2, 3])
print(np.exp(x))
# [ 2.71828183  7.3890561  20.08553692]

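In addition, the arithmetic operators (addition, subtraction, multiplication, division) are redefined for numpy arrays so that they work element-wise.

For example: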


x = np.array([1, 2, 3])
print(x + 3)
# [4 5 6]

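Next, we implement a proper sigmoid function that works on vectors and matrices.

It should apply the sigmoid to every element of its input: for $x = (x_1, \dots, x_n)$, $\text{sigmoid}(x) = \left(\frac{1}{1 + e^{-x_1}}, \dots, \frac{1}{1 + e^{-x_n}}\right)$.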


import numpy as np

def sigmod(x):
    """
    Sigmoid function that also works on vectors and matrices.
    """
    s = 1.0 / (1 + 1 / np.exp(x))
    return s

x = np.array([1, 2, 3])
print(sigmod(x))  # call the function itself, not np.exp, to get the values below
# [0.73105858 0.88079708 0.95257413]
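
The expression 1.0 / (1 + 1 / np.exp(x)) gives correct values for moderate inputs, but for inputs of very large magnitude np.exp overflows or the inner division hits zero, which produces runtime warnings. The following is a minimal sketch of one common workaround, not part of the original course code. The name stable_sigmoid is hypothetical, and the function always returns an array that is at least one-dimensional.

```python
import numpy as np

def stable_sigmoid(x):
    """Element-wise sigmoid that never exponentiates a large positive number."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.empty_like(x)
    pos = x >= 0
    # For x >= 0, exp(-x) <= 1, so this form cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # For x < 0, exp(x) < 1, so use the algebraically equivalent form.
    exp_x = np.exp(x[~pos])
    out[~pos] = exp_x / (1.0 + exp_x)
    return out

values = stable_sigmoid(np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0]))
print(values)
```

Both branches compute the same mathematical function; the split only chooses the form whose exponent is non-positive.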

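Exercise 2: Compute the derivative of the sigmoid function

In the earlier theory lessons we saw that the derivative of the sigmoid function is:

$\text{sigmoid}'(x) = \text{sigmoid}(x)\,(1 - \text{sigmoid}(x))$

We now implement it in Python: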


def sigmoid_derivative(x):
    """
    Compute the gradient (also called the slope or derivative) of the sigmoid function with respect to its input x.
    You can store the output of the sigmoid function into variables and then use it to calculate the gradient.

    Arguments:
    x -- A scalar or numpy array

    Return:
    ds -- Your computed gradient.
    """
    s = 1.0 / (1 + 1 / np.exp(x))
    ds = s * (1 - s)
    return ds

x = np.array([1, 2, 3])
print("sigmoid_derivative(x) = " + str(sigmoid_derivative(x)))
# sigmoid_derivative(x) = [0.19661193 0.10499359 0.04517666]
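
One way to gain confidence in a derivative formula is to compare it with a finite-difference approximation. The sketch below is an illustration added here, not part of the course code; the step size eps is an arbitrary assumption. It checks s * (1 - s) against a central difference.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

x = np.array([1.0, 2.0, 3.0])
eps = 1e-6  # small step for the central difference

# Central difference: (f(x + eps) - f(x - eps)) / (2 * eps)
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
analytic = sigmoid_derivative(x)

max_abs_diff = np.max(np.abs(numeric - analytic))
print(max_abs_diff)
```

If the analytic formula were wrong, the difference would be of the same order as the derivative itself rather than close to zero.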

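Exercise 3: Convert an image into a vector

numpy offers two commonly used tools here: shape and reshape().

X.shape returns the dimensions of the array X.

X.reshape() changes the dimensions (shape) of the array.

For example, a color image is usually stored as a three-dimensional array (with three RGB channels). In deep learning applications, however, we usually need to convert it into a vector of length length*height*3.

In other words, we need to turn a three-dimensional array into a one-dimensional vector.

Next we implement an image2vector function whose input is a three-dimensional array of shape (length, height, 3) and whose output is a vector.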


def image2vector(image):
    """
    Argument:
    image -- a numpy array of shape (length, height, depth)

    Returns:
    v -- a vector of shape (length*height*depth, 1)
    """
    v = image.reshape((image.shape[0] * image.shape[1] * image.shape[2], 1))
    return v

image = np.array([[[0.67826139, 0.29380381],
                   [0.90714982, 0.52835647],
                   [0.4215251, 0.45017551]],
                  [[0.92814219, 0.96677647],
                   [0.85304703, 0.52351845],
                   [0.19981397, 0.27417313]],
                  [[0.60659855, 0.00533165],
                   [0.10820313, 0.49978937],
                   [0.34144279, 0.94630077]]])
print("image2vector(image) = " + str(image2vector(image)))
# image2vector(image) = [[0.67826139] [0.29380381] [0.90714982] [0.52835647] [0.4215251] [0.45017551]
#  [0.92814219] [0.96677647] [0.85304703] [0.52351845] [0.19981397] [0.27417313]
#  [0.60659855] [0.00533165] [0.10820313] [0.49978937] [0.34144279] [0.94630077]]
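
As a small complement, reshape also accepts -1 for one dimension and infers its size, so the product of the three dimensions does not have to be written out. The sketch below uses a made-up 4 x 5 x 3 array as a stand-in for an image; it is an illustration, not the course's solution.

```python
import numpy as np

# Hypothetical stand-in for an RGB image: 4 x 5 pixels, 3 channels.
image = np.arange(4 * 5 * 3).reshape(4, 5, 3)

# -1 lets numpy infer the length (here 4 * 5 * 3 = 60).
v = image.reshape(-1, 1)
print(v.shape)  # (60, 1)

# reshape keeps the element order, so the original shape can be restored.
restored = v.reshape(image.shape)
print(np.array_equal(restored, image))  # True
```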

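Exercise 4: Row normalization

A common technique in deep learning is to normalize the data.

Gradient descent usually converges noticeably faster on normalized data.

Next we normalize a matrix row by row, so that every row of the result has length 1.

For example: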


def normalizeRows(x):
    """
    Implement a function that normalizes each row of the matrix x (to have unit length).

    Argument:
    x -- A numpy matrix of shape (n, m)

    Returns:
    x -- The normalized (by row) numpy matrix. You are allowed to modify x.
    """
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)  # length of each row, as a column vector
    x = x / x_norm  # numpy broadcasting: divide the matrix by the column vector
    return x

x = np.array([
    [0, 3, 4],
    [1, 6, 4]])
print("normalizeRows(x) = " + str(normalizeRows(x)))
# normalizeRows(x) = [[0. 0.6 0.8] [0.13736056 0.82416338 0.54944226]]
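
A quick sanity check for row normalization is to recompute the row lengths afterwards. The sketch below also guards against all-zero rows, which would otherwise cause a division by zero. The name normalize_rows_safe and the threshold eps are assumptions made for this illustration; they are not part of the course code.

```python
import numpy as np

def normalize_rows_safe(x, eps=1e-12):
    """Row-normalize x; rows whose norm is below eps are left unchanged."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)  # shape (n, 1)
    # Replace (near-)zero norms by 1 so the division is a no-op for those rows.
    safe_norms = np.where(norms < eps, 1.0, norms)
    return x / safe_norms  # (n, m) / (n, 1) via broadcasting

x = np.array([[0, 3, 4],
              [1, 6, 4],
              [0, 0, 0]])
normalized = normalize_rows_safe(x)
row_lengths = np.linalg.norm(normalized, axis=1)
print(row_lengths)
```

The first two rows end up with length 1, while the zero row stays zero instead of turning into NaN values.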

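The code above relies on broadcasting, so we look at broadcasting in more detail next.

Exercise 5: Broadcasting and the softmax function

Broadcasting is a very powerful numpy feature that lets us compute quickly with matrices, vectors and scalars of different shapes.

Next we implement a softmax function. For a row vector $x = (x_1, \dots, x_m)$ it is defined as

$\text{softmax}(x)_j = \frac{e^{x_j}}{\sum_{k=1}^{m} e^{x_k}}$

and for a matrix of shape (n, m) it is applied to each row separately.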


def softmax(x):
    """Calculates the softmax for each row of the input x.
    Your code should work for a row vector and also for matrices of shape (n, m).

    Argument:
    x -- A numpy matrix of shape (n, m)

    Returns:
    s -- A numpy matrix equal to the softmax of x, of shape (n, m)
    """
    x_exp = np.exp(x)  # (n, m)
    x_sum = np.sum(x_exp, axis=1, keepdims=True)  # (n, 1)
    s = x_exp / x_sum  # (n, m), thanks to broadcasting
    return s

x = np.array([
    [9, 2, 5, 0, 0],
    [7, 5, 0, 0, 0]])
print("softmax(x) = " + str(softmax(x)))
# softmax(x) = [[9.80897665e-01 8.94462891e-04 1.79657674e-02 1.21052389e-04 1.21052389e-04]
#  [8.78679856e-01 1.18916387e-01 8.01252314e-04 8.01252314e-04 8.01252314e-04]]
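
The softmax above exponentiates the raw inputs, which overflows once the entries become large. A widely used variant subtracts the row maximum first; this leaves the result unchanged because the common factor cancels in the ratio. The sketch below is an added illustration under that assumption; stable_softmax is a hypothetical name, and the second input row is made up to show the large-value case.

```python
import numpy as np

def stable_softmax(x):
    """Row-wise softmax that subtracts each row's maximum before exponentiating."""
    x = np.asarray(x, dtype=float)
    shifted = x - np.max(x, axis=1, keepdims=True)  # (n, m) - (n, 1), broadcast
    x_exp = np.exp(shifted)                         # largest exponent is now 0
    return x_exp / np.sum(x_exp, axis=1, keepdims=True)

x = np.array([[9, 2, 5, 0, 0],
              [1000, 1001, 1002, 1000, 1000]])
s = stable_softmax(x)
print(s.sum(axis=1))
```

Each row of the result sums to 1, and the first row matches the output of the softmax function defined earlier.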

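Vectorization

In deep learning we usually work with large datasets.

As a result, computation speed can become the bottleneck of the whole training process.

To keep the computation efficient, we need to vectorize it.

Next we compare the running time of the dot product, the outer product, element-wise multiplication and a general matrix-vector product, with and without vectorization.

First, the implementation with plain Python loops: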


import time
import numpy as np

x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]

### CLASSIC DOT PRODUCT OF VECTORS IMPLEMENTATION ###
tic = time.process_time()
dot = 0
for i in range(len(x1)):
    dot += x1[i] * x2[i]
toc = time.process_time()
print("dot ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

### CLASSIC OUTER PRODUCT IMPLEMENTATION ###
tic = time.process_time()
outer = np.zeros((len(x1), len(x2)))  # we create a len(x1)*len(x2) matrix with only zeros
for i in range(len(x1)):
    for j in range(len(x2)):
        outer[i, j] = x1[i] * x2[j]
toc = time.process_time()
print("outer ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

### CLASSIC ELEMENTWISE IMPLEMENTATION ###
tic = time.process_time()
mul = np.zeros(len(x1))
for i in range(len(x1)):
    mul[i] = x1[i] * x2[i]
toc = time.process_time()
print("elementwise multiplication ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

### CLASSIC GENERAL DOT PRODUCT IMPLEMENTATION ###
W = np.random.rand(3, len(x1))  # Random 3*len(x1) numpy array
tic = time.process_time()
gdot = np.zeros(W.shape[0])
for i in range(W.shape[0]):
    for j in range(len(x1)):
        gdot[i] += W[i, j] * x1[j]
toc = time.process_time()
print("gdot ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

# dot ----- Computation time = 0.17002099999974263ms
# outer ----- Computation time = 0.34057500000006513ms
# elementwise multiplication ----- Computation time = 0.1940779999998199ms
# gdot ----- Computation time = 0.2362039999999066ms

Next, the vectorized implementation:

# Uses time, np and W from the previous block.
x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]

### VECTORIZED DOT PRODUCT OF VECTORS ###
tic = time.process_time()
dot = np.dot(x1, x2)
toc = time.process_time()
print("dot ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

### VECTORIZED OUTER PRODUCT ###
tic = time.process_time()
outer = np.outer(x1, x2)
toc = time.process_time()
print("outer ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

### VECTORIZED ELEMENTWISE MULTIPLICATION ###
tic = time.process_time()
mul = np.multiply(x1, x2)
toc = time.process_time()
print("elementwise multiplication ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

### VECTORIZED GENERAL DOT PRODUCT ###
tic = time.process_time()
gdot = np.dot(W, x1)
toc = time.process_time()
print("gdot ----- Computation time = " + str(1000 * (toc - tic)) + "ms")

# dot ----- Computation time = 0.16546899999991815ms
# outer ----- Computation time = 0.14168100000011563ms
# elementwise multiplication ----- Computation time = 0.10738799999998605ms
# gdot ----- Computation time = 0.38393900000022185ms

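The results show that the vectorized code is clearly much simpler.

The running time also drops for most of the operations (in this run, gdot is the exception). The gain is small mainly because the data is tiny; as the amount of data grows, the reduction becomes more and more pronounced.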
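
To see the effect on larger inputs, the following sketch times a Python loop against np.dot on two random vectors. The vector length of 200000, the fixed seed and the use of time.perf_counter are choices made for this illustration; the timings depend on the machine, so no expected timings are shown.

```python
import time
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
n = 200_000
a = rng.random(n)
b = rng.random(n)

# Plain Python loop.
tic = time.perf_counter()
loop_dot = 0.0
for i in range(n):
    loop_dot += a[i] * b[i]
loop_ms = 1000 * (time.perf_counter() - tic)

# Vectorized version.
tic = time.perf_counter()
vec_dot = np.dot(a, b)
vec_ms = 1000 * (time.perf_counter() - tic)

print("loop: %.2f ms, np.dot: %.2f ms" % (loop_ms, vec_ms))
print(np.isclose(loop_dot, vec_dot))  # both compute the same value
```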

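Exercise 1: Implementing the L1 loss function

We use numpy functions to implement the L1 loss.

The L1 loss is defined as:

$L_1(\hat{y}, y) = \sum_{i=1}^{m} |y^{(i)} - \hat{y}^{(i)}|$

where $\hat{y}$ denotes the predicted value and $y$ the true value.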


import numpy as np

def L1(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)

    Returns:
    loss -- the value of the L1 loss function defined above
    """
    loss = np.sum(np.abs(y - yhat))
    return loss

yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L1 = " + str(L1(yhat, y)))
# L1 = 1.1

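Exercise 2: Implementing the L2 loss function

The L2 loss is defined as:

$L_2(\hat{y}, y) = \sum_{i=1}^{m} (y^{(i)} - \hat{y}^{(i)})^2$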


import numpy as np

def L2(yhat, y):
    """
    Arguments:
    yhat -- vector of size m (predicted labels)
    y -- vector of size m (true labels)

    Returns:
    loss -- the value of the L2 loss function defined above
    """
    loss = np.sum(np.power((y - yhat), 2))
    return loss

yhat = np.array([.9, 0.2, 0.1, .4, .9])
y = np.array([1, 0, 0, 1, 1])
print("L2 = " + str(L2(yhat, y)))
# L2 = 0.43
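
Both losses can also be written with other standard numpy calls. As a sketch using the same example vectors: the sum of squares is the dot product of the difference vector with itself, and the sum of absolute values is its 1-norm.

```python
import numpy as np

yhat = np.array([0.9, 0.2, 0.1, 0.4, 0.9])
y = np.array([1, 0, 0, 1, 1])

diff = y - yhat
l2_dot = np.dot(diff, diff)            # sum of squared differences
l1_norm = np.linalg.norm(diff, ord=1)  # sum of absolute differences

print(l1_norm, l2_dot)
```

Up to floating-point rounding, these agree with the values 1.1 and 0.43 obtained above.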

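Implementing logistic regression

In the rest of this lesson we implement a complete logistic regression model: initializing the parameters, computing the cost function and its gradient, optimizing with gradient descent, and finally combining all of these into a single function.

The goal of this experiment is to train a model that decides whether an image shows a cat.

We use the following libraries:

numpy: the most important library for scientific computing in Python

h5py: a library for working with H5 files from Python

matplotlib: a plotting library for Python

PIL: an image library for Python

scipy: a scientific computing library for Python

At the top of the program we first import these libraries: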


import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage

# Jupyter/IPython magic: display matplotlib figures inline
%matplotlib inline

Before training we need to load the data. The loading code is as follows:


def load_dataset():
    """
    Load the dataset.
    """
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")  # read the H5 file
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # your train set labels

    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # your test set labels

    classes = np.array(test_dataset["list_classes"][:])  # the list of classes

    # reshape the training and test labels into row vectors of shape (1, m)
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()

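Notes on the data:

In the training-set labels, a cat is marked as 1 and anything else as 0.

Every image has shape (num_px, num_px, 3): height and width are equal, and 3 stands for the RGB channels.

The names train_set_x_orig and test_set_x_orig carry the suffix _orig because we will preprocess the images shortly; the preprocessed variables will be called train_set_x and test_set_x.

Each element of train_set_x_orig corresponds to one image, which we can display with the following code: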


index = 25
plt.imshow(train_set_x_orig[index])
print("y = " + str(train_set_y[:, index]) + ", it's a '" + classes[np.squeeze(train_set_y[:, index])].decode("utf-8") + "' picture.")
# y = [1], it's a 'cat' picture.

Next we determine the size of the training set, the size of the test set and the size of the images from the image arrays:


m_train = train_set_x_orig.shape[0]
m_test = test_set_x_orig.shape[0]
num_px = train_set_x_orig.shape[1]
print (m_train, m_test, num_px)
# 209, 50, 64

Next we convert each image into a vector, that is, one column of a matrix.

In the end the whole training set becomes a matrix with num_px*num_px*3 rows and m_train columns.


train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T

P.S. X_flatten = X.reshape(X.shape[0], -1).T turns an array of shape (a, b, c, d) into a matrix of shape (b*c*d, a).
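
The reshape trick can be checked on a tiny made-up array. The sketch below is an illustration with hypothetical data, not the cat dataset. It confirms that column i of the flattened matrix contains exactly the values of example i, and shows that a superficially similar reshape with the same output shape mixes examples together.

```python
import numpy as np

# Hypothetical batch: a = 2 "images", each of shape (b, c, d) = (2, 2, 3).
X = np.arange(2 * 2 * 2 * 3).reshape(2, 2, 2, 3)

X_flatten = X.reshape(X.shape[0], -1).T
print(X_flatten.shape)  # (12, 2)

# Column i of X_flatten holds all values of example i, in order.
print(np.array_equal(X_flatten[:, 1], X[1].ravel()))  # True

# Same shape, but values from different examples end up in one column.
wrong = X.reshape(-1, X.shape[0])
print(np.array_equal(wrong[:, 1], X[1].ravel()))  # False
```

The transpose is what keeps each example in its own column, which is why the shortcut without .T is not equivalent.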

Next we normalize the pixel values.

Since the raw pixel values lie between 0 and 255, the simplest approach is to divide by 255.


train_set_x = train_set_x_flatten/255.
test_set_x = test_set_x_flatten/255.

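Next, let us look at the structure of logistic regression.

For each training example $x^{(i)}$, the loss is computed as follows:

$z^{(i)} = w^T x^{(i)} + b$

$\hat{y}^{(i)} = a^{(i)} = \text{sigmoid}(z^{(i)})$

$L(a^{(i)}, y^{(i)}) = -y^{(i)} \log(a^{(i)}) - (1 - y^{(i)}) \log(1 - a^{(i)})$

The overall cost function is the average over all m training examples:

$J = \frac{1}{m} \sum_{i=1}^{m} L(a^{(i)}, y^{(i)})$

We implement logistic regression in the following steps:

1. Define the model structure

2. Initialize the model parameters

3. Loop

3.1 Forward propagation

3.2 Backward propagation

3.3 Update the parameters

4. Combine everything into a complete model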

Step 1: Implement the sigmoid function


def sigmoid(z):
    """
    Compute the sigmoid of z
    Arguments:
    z -- A scalar or numpy array of any size.
    Return:
    s -- sigmoid(z)
    """
    s = 1.0 / (1 + 1 / np.exp(z))
    return s

Step 2: Initialize the parameters


def initialize_with_zeros(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.
    
    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)
    
    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
    """
    w = np.zeros((dim, 1))
    b = 0
    return w, b

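Step 3: Forward and backward propagation

P.S. The formulas are as follows (see the earlier theory lessons for their derivation):

$A = \text{sigmoid}(w^T X + b) = (a^{(1)}, a^{(2)}, \dots, a^{(m)})$

$J = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(a^{(i)}) + (1 - y^{(i)}) \log(1 - a^{(i)}) \right]$

$\frac{\partial J}{\partial w} = \frac{1}{m} X (A - Y)^T$

$\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (a^{(i)} - y^{(i)})$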


def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above
    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)
    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b
    
    Tips:
    - Write your code step by step for the propagation. np.log(), np.dot()
    """
    
    m = X.shape[1]
    
    # FORWARD PROPAGATION (FROM X TO COST)
    A = sigmoid(np.dot(w.T, X) + b)                                     # compute activation
    cost = -1.0 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))   # compute cost

    # BACKWARD PROPAGATION (TO FIND GRAD)
    dw = 1.0 / m * np.dot(X, (A - Y).T)   # 1.0 / m keeps this a float division under Python 2 as well
    db = 1.0 / m * np.sum(A - Y)
    cost = np.squeeze(cost)
    
    grads = {"dw": dw,
             "db": db}
    
    return grads, cost
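
A common way to test gradient code of this kind is a numerical gradient check. The sketch below re-implements the cost and gradients in a self-contained helper and compares them with central differences. The helper name cost_and_grads, the toy values of w, b, X and Y, and the step size eps are all assumptions made for this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grads(w, b, X, Y):
    """Cost and analytic gradients of logistic regression (same formulas as propagate)."""
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    dw = np.dot(X, (A - Y).T) / m
    db = np.sum(A - Y) / m
    return cost, dw, db

# Small made-up problem: 2 features, 3 examples.
w = np.array([[0.5], [-0.3]])
b = 0.1
X = np.array([[1.0, 2.0, -1.0],
              [3.0, 4.0, -3.2]])
Y = np.array([[1, 0, 1]])

cost, dw, db = cost_and_grads(w, b, X, Y)

# Central differences, one parameter at a time.
eps = 1e-6
dw_numeric = np.zeros_like(w)
for k in range(w.shape[0]):
    step = np.zeros_like(w)
    step[k, 0] = eps
    cost_plus = cost_and_grads(w + step, b, X, Y)[0]
    cost_minus = cost_and_grads(w - step, b, X, Y)[0]
    dw_numeric[k, 0] = (cost_plus - cost_minus) / (2 * eps)
db_numeric = (cost_and_grads(w, b + eps, X, Y)[0]
              - cost_and_grads(w, b - eps, X, Y)[0]) / (2 * eps)

dw_error = np.max(np.abs(dw - dw_numeric))
db_error = abs(db - db_numeric)
print(dw_error, db_error)
```

Both errors should be tiny compared with the gradients themselves; a sign error or a missing 1/m factor would show up immediately.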

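Step 4: Update the parameters

The update rule is as follows, where $\alpha$ is the learning rate:

$w := w - \alpha \frac{\partial J}{\partial w}$, $b := b - \alpha \frac{\partial J}{\partial b}$

The complete code: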


def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    """
    This function optimizes w and b by running a gradient descent algorithm
    
    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps
    
    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.
    
    Tips:
    You basically need to write down two steps and iterate through them:
        1) Calculate the cost and the gradient for the current parameters. Use propagate().
        2) Update the parameters using gradient descent rule for w and b.
    """
    
    costs = []
    
    for i in range(num_iterations):  # one pass per iteration; num_iterations is the number of iterations
        
        
        # Cost and gradient calculation 
        grads, cost = propagate(w, b, X, Y)
        
        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]
        
        # update rule 
        w = w - learning_rate * dw
        b = b - learning_rate * db
        
        # Record the costs
        if i % 100 == 0:
            costs.append(cost)
        
        # Print the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))
    params = {"w": w,
              "b": b}
    grads = {"dw": dw,
             "db": db}
    return params, grads, costs
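
To watch the update rule at work without the cat dataset, the loop can be run on a tiny made-up problem. The sketch below inlines the forward pass, the gradients and the update; the data, the learning rate of 0.1 and the 500 iterations are arbitrary assumptions for this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up, linearly separable toy data: 1 feature, 6 examples.
X = np.array([[-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]])
Y = np.array([[0, 0, 0, 1, 1, 1]])
m = X.shape[1]

w = np.zeros((1, 1))
b = 0.0
learning_rate = 0.1
costs = []

for i in range(500):
    # Forward pass and cost.
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    # Gradients.
    dw = np.dot(X, (A - Y).T) / m
    db = np.sum(A - Y) / m
    # Gradient descent update.
    w = w - learning_rate * dw
    b = b - learning_rate * db
    # Record the cost every 100 iterations, as optimize() does.
    if i % 100 == 0:
        costs.append(cost)

print(costs)
```

With zero initialization every prediction starts at 0.5, so the first recorded cost equals log(2); the later entries decrease as w grows.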

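Step 5: Use the trained model to make predictions on the test set:

The predictions are computed as:

$\hat{Y} = A = \text{sigmoid}(w^T X + b)$

When an output in A is greater than 0.5 we predict a cat; otherwise we predict not a cat.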


def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)
    
    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    
    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''
    
    m = X.shape[1]
    Y_prediction = np.zeros((1,m))
    w = w.reshape(X.shape[0], 1)
    
    # Compute vector "A" predicting the probabilities of a cat being present in the picture
    A = sigmoid(np.dot(w.T, X) + b)
    
    for i in range(A.shape[1]):
        # Convert probabilities A[0,i] to actual predictions p[0,i]
        if A[0][i] > 0.5:
            Y_prediction[0][i] = 1
        else:
            Y_prediction[0][i] = 0
    
    return Y_prediction
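
The thresholding loop in predict() can also be written without a loop. The sketch below compares both versions on made-up probabilities; it is an alternative formulation shown for illustration, not the code used in the course.

```python
import numpy as np

A = np.array([[0.9, 0.5, 0.12, 0.51]])  # made-up probabilities, shape (1, m)

# Loop version, as in predict().
loop_pred = np.zeros(A.shape)
for i in range(A.shape[1]):
    loop_pred[0, i] = 1 if A[0, i] > 0.5 else 0

# Vectorized version: compare every entry at once, then cast booleans to floats.
vec_pred = (A > 0.5).astype(float)

print(vec_pred)  # [[1. 0. 0. 1.]]
print(np.array_equal(loop_pred, vec_pred))  # True
```

Note that a probability of exactly 0.5 maps to 0 in both versions, because the comparison is strict.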

Step 6: Combine everything into a single model:


def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
    """
    Builds the logistic regression model by calling the function you've implemented previously
    
    Arguments:
    X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to true to print the cost every 100 iterations
    
    Returns:
    d -- dictionary containing information about the model.
    """
    # initialize parameters with zeros 
    w, b = initialize_with_zeros(X_train.shape[0])
    # Gradient descent
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
    
    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]
    
    # Predict test/train set examples 
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)
    # Print train/test Errors
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))
    
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test, 
         "Y_prediction_train" : Y_prediction_train, 
         "w" : w, 
         "b" : b,
         "learning_rate" : learning_rate,
         "num_iterations": num_iterations}
    
    return d

Let's test the model:


d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_cost = True)

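Looking at the printed output, we see that the test accuracy reaches 70.0%.

On the training set the accuracy reaches 99%. This indicates that the model overfits to some degree, but do not worry: we will address this problem in later lessons.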
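
The accuracy printed by model() is computed as 100 minus the mean absolute error in percent. For 0/1 predictions and labels this is the same as the percentage of matching entries, as the following sketch with made-up values illustrates.

```python
import numpy as np

# Made-up predictions and labels, both of shape (1, m) with entries 0 or 1.
Y_prediction = np.array([[1., 0., 1., 1., 0.]])
Y_true = np.array([[1, 0, 0, 1, 1]])

# Formula used in model(): 100 minus the mean absolute error in percent.
acc_from_error = 100 - np.mean(np.abs(Y_prediction - Y_true)) * 100

# Equivalent, more direct formula: percentage of matching entries.
acc_from_matches = np.mean(Y_prediction == Y_true) * 100

print(acc_from_error, acc_from_matches)
```

Here 3 of the 5 entries match, so both formulas give an accuracy of about 60 percent.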

With the following code we can pick individual images and look at the prediction:


# Example of a picture that was wrongly classified.
index = 14
plt.imshow(test_set_x[:,index].reshape((num_px, num_px, 3)))
# The prediction is stored as a float, so cast it to int before using it as an index.
print("y = " + str(test_set_y[0, index]) + ", you predicted that it is a \"" + classes[int(d["Y_prediction_test"][0, index])].decode("utf-8") + "\" picture.")

We can also plot how the cost function evolves:


# Plot learning curve (with costs)
costs = np.squeeze(d['costs'])
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate =" + str(d["learning_rate"]))
plt.show()

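In the earlier theory lessons we mentioned that the learning rate has a strong influence on the final result. We now run an experiment to get an intuitive feel for it.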


learning_rates = [0.01, 0.001, 0.0001]
models = {}
for i in learning_rates:
    print ("learning rate is: " + str(i))
    models[str(i)] = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 1500, learning_rate = i, print_cost = False)
    print ('\n' + "-------------------------------------------------------" + '\n')
for i in learning_rates:
    plt.plot(np.squeeze(models[str(i)]["costs"]), label= str(models[str(i)]["learning_rate"]))
plt.ylabel('cost')
plt.xlabel('iterations')
legend = plt.legend(loc='upper center', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()

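Analysis: different learning rates lead to different results. A small learning rate converges slowly, while a learning rate that is too large can cause oscillation or prevent convergence.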
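
The same behaviour can be reproduced on a toy problem. The sketch below minimizes the one-dimensional function f(x) = x**2 instead of the logistic regression cost; the function, the three learning rates and the 20 steps are assumptions chosen only to make the effect easy to see.

```python
def gradient_descent(learning_rate, steps=20, x0=1.0):
    """Minimize f(x) = x**2 (gradient 2*x) and return the final |x|."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * 2 * x
    return abs(x)

small = gradient_descent(0.01)   # slow: x shrinks by a factor 0.98 per step
medium = gradient_descent(0.4)   # fast: x shrinks by a factor 0.2 per step
large = gradient_descent(1.1)    # diverges: x is multiplied by -1.2 per step

print(small, medium, large)
```

After 20 steps the small rate has barely moved towards the minimum at 0, the medium rate is essentially there, and the large rate has moved away from it while flipping sign at every step.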

What if you want to use one of your own images instead of one from the training or test set?


## START CODE HERE ## (PUT YOUR IMAGE NAME)
my_image = "my_image.jpg"   # change this to the name of your image file
## END CODE HERE ##

# We preprocess the image to fit your algorithm.
# Note: ndimage.imread and scipy.misc.imresize, used in the original version of this code,
# are no longer available in recent SciPy releases, so PIL (imported above) is used instead.
fname = "images/" + my_image
image = Image.open(fname).convert("RGB")                        # read the image
resized = np.array(image.resize((num_px, num_px)))              # rescale to (num_px, num_px, 3)
my_image = resized.reshape((1, num_px * num_px * 3)).T / 255.   # flatten; scale to [0, 1] like the training data
my_predicted_image = predict(d["w"], d["b"], my_image)          # predict

plt.imshow(image)
print("y = " + str(np.squeeze(my_predicted_image)) + ", your algorithm predicts a \"" + classes[int(np.squeeze(my_predicted_image))].decode("utf-8") + "\" picture.")
