Deep Learning: Initialization


Initialization

This exercise introduces several ways of initializing the weights of a neural network.

Importing the libraries

import numpy as np
import matplotlib.pyplot as plt
import sklearn
import sklearn.datasets
from init_utils import sigmoid, relu, compute_loss, forward_propagation, backward_propagation
from init_utils import update_parameters, predict, load_dataset, plot_decision_boundary, predict_dec

plt.rcParams['figure.figsize'] = (7.0, 4.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# load image dataset: blue/red dots in circles
train_X, train_Y, test_X, test_Y = load_dataset()

[Figure: the training set of blue/red dots in circles]

The model

def model(X, Y, learning_rate = 0.01, num_iterations = 15000, print_cost = True, initialization = "he"):
    """
    Implements a three-layer neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SIGMOID.

    Arguments:
    X -- input data, of shape (2, number of examples)
    Y -- true "label" vector (containing 0 for red dots; 1 for blue dots), of shape (1, number of examples)
    learning_rate -- learning rate for gradient descent
    num_iterations -- number of iterations to run gradient descent
    print_cost -- if True, print the cost every 1000 iterations
    initialization -- flag to choose which initialization to use ("zeros", "random" or "he")

    Returns:
    parameters -- parameters learnt by the model
    """
    grads = {}
    costs = [] # to keep track of the loss
    m = X.shape[1] # number of examples
    layers_dims = [X.shape[0], 10, 5, 1]

    # Initialize parameters dictionary.
    if initialization == "zeros":
        parameters = initialize_parameters_zeros(layers_dims)
    elif initialization == "random":
        parameters = initialize_parameters_random(layers_dims)
    elif initialization == "he":
        parameters = initialize_parameters_he(layers_dims)

    # Loop (gradient descent)
    for i in range(0, num_iterations):

        # Forward propagation: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID.
        a3, cache = forward_propagation(X, parameters)

        # Loss
        cost = compute_loss(a3, Y)

        # Backward propagation.
        grads = backward_propagation(X, Y, cache)

        # Update parameters.
        parameters = update_parameters(parameters, grads, learning_rate)

        # Print the loss every 1000 iterations
        if print_cost and i % 1000 == 0:
            print("Cost after iteration {}: {}".format(i, cost))
            costs.append(cost)

    # plot the loss
    plt.plot(costs)
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()

    return parameters

Zero initialization

There are two types of parameters to initialize in a neural network:
- the weight matrices (W[1], W[2], W[3], ..., W[L-1], W[L])
- the bias vectors (b[1], b[2], b[3], ..., b[L-1], b[L])

Exercise: Implement the following function to initialize all parameters to zeros. You’ll see later that this does not work well since it fails to “break symmetry”, but let’s try it anyway and see what happens. Use np.zeros((..,..)) with the correct shapes.

# GRADED FUNCTION: initialize_parameters_zeros

def initialize_parameters_zeros(layers_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                    b1 -- bias vector of shape (layers_dims[1], 1)
                    ...
                    WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                    bL -- bias vector of shape (layers_dims[L], 1)
    """
    parameters = {}
    L = len(layers_dims)            # number of layers in the network

    for l in range(1, L):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters['W' + str(l)] = np.zeros((layers_dims[l], layers_dims[l-1]))
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
        ### END CODE HERE ###

    return parameters

Results with zero initialization

parameters = model(train_X, train_Y, initialization = "zeros")
print ("On the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print ("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)

[Figure: cost curve with zeros initialization]
On the train set:
Accuracy: 0.5
On the test set:
Accuracy: 0.5
[Figure: decision boundary with zeros initialization]
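
Why exactly 50%? With all weights and biases set to zero, every pre-activation is zero, so the network outputs sigmoid(0) = 0.5 for every example and predicts the same class everywhere; accuracy is then just the fraction of that class in the data. The sketch below is not part of the assignment; it re-creates the same 2 -> 10 -> 5 -> 1 architecture directly in NumPy instead of calling init_utils, just to make this visible:

import numpy as np

def forward_all_zeros(X):
    # Same 2 -> 10 -> 5 -> 1 architecture as the model above, with every parameter set to 0
    W1, b1 = np.zeros((10, 2)), np.zeros((10, 1))
    W2, b2 = np.zeros((5, 10)), np.zeros((5, 1))
    W3, b3 = np.zeros((1, 5)), np.zeros((1, 1))
    a1 = np.maximum(0, np.dot(W1, X) + b1)      # ReLU
    a2 = np.maximum(0, np.dot(W2, a1) + b2)     # ReLU
    z3 = np.dot(W3, a2) + b3
    return 1. / (1. + np.exp(-z3))              # sigmoid

X_demo = np.random.randn(2, 5)                  # 5 arbitrary input points
print(forward_all_zeros(X_demo))                # [[0.5 0.5 0.5 0.5 0.5]], regardless of the input

Because every neuron in a layer computes the same function and receives the same gradient, the layers never differentiate their units, which is what "failing to break symmetry" means.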

Random initialization

# GRADED FUNCTION: initialize_parameters_random

def initialize_parameters_random(layers_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                    b1 -- bias vector of shape (layers_dims[1], 1)
                    ...
                    WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                    bL -- bias vector of shape (layers_dims[L], 1)
    """
    np.random.seed(3)               # This seed makes sure your "random" numbers will be the same as ours
    parameters = {}
    L = len(layers_dims)            # integer representing the number of layers

    for l in range(1, L):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * 10
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
        ### END CODE HERE ###

    return parameters

Results with random initialization:

parameters = model(train_X, train_Y, initialization = "random")
print ("On the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print ("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)

[Figure: cost curve with random initialization]
On the train set:
Accuracy: 0.83
On the test set:
Accuracy: 0.86
[Figure: decision boundary with random initialization]
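
The cost starts out very large with this initialization: multiplying the weights by 10 makes the output-layer pre-activations large in magnitude, so the sigmoid saturates at values numerically close to 0 or 1, and the cross-entropy log term blows up for misclassified examples. A small standalone illustration of the effect (a NumPy sketch, not part of init_utils or the assignment):

import numpy as np
np.random.seed(1)

z = np.random.randn(1, 5) * 10                  # pre-activations produced by weights scaled by 10
a = 1. / (1. + np.exp(-z))                      # the sigmoid saturates
print(np.round(a, 6))                            # values numerically very close to 0 or 1
# For a saturated, wrongly classified example, -[y*log(a) + (1-y)*log(1-a)]
# becomes huge (or inf once a rounds to exactly 0 or 1), which is why the
# first costs printed with the "random" initialization are so large.

The model still learns eventually, but poor initial values slow optimization and can hurt the final decision boundary.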

He initialization

Finally, try “He Initialization”; this is named for the first author of He et al., 2015. (If you have heard of “Xavier initialization”, this is similar except Xavier initialization uses a scaling factor for the weights W[l] of sqrt(1./layers_dims[l-1]) where He initialization would use sqrt(2./layers_dims[l-1]).)
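
The factor sqrt(2./layers_dims[l-1]) is chosen so that, with ReLU activations, the variance of the pre-activations stays roughly constant from layer to layer: ReLU zeroes out half of its input, and the extra factor of 2 compensates for that. A quick empirical check of this, under the assumption of a wide layer fed standard-normal pre-activations (not part of the assignment code):

import numpy as np
np.random.seed(0)

n_prev, n, m = 500, 500, 2000
z_prev = np.random.randn(n_prev, m)                              # previous pre-activations, variance ~1
a_prev = np.maximum(0, z_prev)                                   # ReLU keeps only the positive half

W_he     = np.random.randn(n, n_prev) * np.sqrt(2. / n_prev)     # He scaling
W_xavier = np.random.randn(n, n_prev) * np.sqrt(1. / n_prev)     # Xavier scaling

print(np.var(np.dot(W_he, a_prev)))       # ~1.0: variance of the next pre-activations is preserved
print(np.var(np.dot(W_xavier, a_prev)))   # ~0.5: variance shrinks by half at every ReLU layer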

Exercise: Implement the following function to initialize your parameters with He initialization.

Hint: This function is similar to the previous initialize_parameters_random(...). The only difference is that instead of multiplying np.random.randn(..,..) by 10, you will multiply it by sqrt(2./(dimension of the previous layer)), which is what He initialization recommends for layers with a ReLU activation.

# GRADED FUNCTION: initialize_parameters_he

def initialize_parameters_he(layers_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                    b1 -- bias vector of shape (layers_dims[1], 1)
                    ...
                    WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                    bL -- bias vector of shape (layers_dims[L], 1)
    """
    np.random.seed(3)
    parameters = {}
    L = len(layers_dims) - 1        # integer representing the number of layers

    for l in range(1, L + 1):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * np.sqrt(2. / layers_dims[l-1])
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
        ### END CODE HERE ###

    return parameters

Results with He initialization:


parameters = model(train_X, train_Y, initialization = "he")
print ("On the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print ("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)
[Figure: cost curve with He initialization]
On the train set:
Accuracy: 0.993333333333
On the test set:
Accuracy: 0.96
[Figure: decision boundary with He initialization]

Conclusions

You have seen three different types of initializations. For the same number of iterations and the same hyperparameters, the comparison is:

| **Model** | **Train accuracy** | **Problem/Comment** |
| --- | --- | --- |
| 3-layer NN with zeros initialization | 50% | fails to break symmetry |
| 3-layer NN with large random initialization | 83% | too large weights |
| 3-layer NN with He initialization | 99% | recommended method |