deeplearning_Initialization
Initialization
This exercise walks through several ways of initializing the weights of a neural network and compares the results.
Import the libraries
import numpy as np
import matplotlib.pyplot as plt
import sklearn
import sklearn.datasets
from init_utils import sigmoid, relu, compute_loss, forward_propagation, backward_propagation
from init_utils import update_parameters, predict, load_dataset, plot_decision_boundary, predict_dec

plt.rcParams['figure.figsize'] = (7.0, 4.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# load image dataset: blue/red dots in circles
train_X, train_Y, test_X, test_Y = load_dataset()
Model
def model(X, Y, learning_rate = 0.01, num_iterations = 15000, print_cost = True, initialization = "he"):
    """
    Implements a three-layer neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SIGMOID.

    Arguments:
    X -- input data, of shape (2, number of examples)
    Y -- true "label" vector (containing 0 for red dots; 1 for blue dots), of shape (1, number of examples)
    learning_rate -- learning rate for gradient descent
    num_iterations -- number of iterations to run gradient descent
    print_cost -- if True, print the cost every 1000 iterations
    initialization -- flag to choose which initialization to use ("zeros", "random" or "he")

    Returns:
    parameters -- parameters learnt by the model
    """

    grads = {}
    costs = []                            # to keep track of the loss
    m = X.shape[1]                        # number of examples
    layers_dims = [X.shape[0], 10, 5, 1]

    # Initialize parameters dictionary.
    if initialization == "zeros":
        parameters = initialize_parameters_zeros(layers_dims)
    elif initialization == "random":
        parameters = initialize_parameters_random(layers_dims)
    elif initialization == "he":
        parameters = initialize_parameters_he(layers_dims)

    # Loop (gradient descent)
    for i in range(0, num_iterations):

        # Forward propagation: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID.
        a3, cache = forward_propagation(X, parameters)

        # Loss
        cost = compute_loss(a3, Y)

        # Backward propagation.
        grads = backward_propagation(X, Y, cache)

        # Update parameters.
        parameters = update_parameters(parameters, grads, learning_rate)

        # Print the loss every 1000 iterations
        if print_cost and i % 1000 == 0:
            print("Cost after iteration {}: {}".format(i, cost))
            costs.append(cost)

    # plot the loss (one point per 1000 iterations)
    plt.plot(costs)
    plt.ylabel('cost')
    plt.xlabel('iterations (per thousands)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()

    return parameters
Zero initialization
There are two types of parameters to initialize in a neural network:
- the weight matrices
- the bias vectors
Exercise: Implement the following function to initialize all parameters to zeros. You'll see later that this does not work well since it fails to "break symmetry", but let's try it anyway and see what happens. Use np.zeros((..,..)) with the correct shapes.
# GRADED FUNCTION: initialize_parameters_zeros

def initialize_parameters_zeros(layers_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                  W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                  b1 -- bias vector of shape (layers_dims[1], 1)
                  ...
                  WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                  bL -- bias vector of shape (layers_dims[L], 1)
    """

    parameters = {}
    L = len(layers_dims)         # number of layers in the network

    for l in range(1, L):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters['W' + str(l)] = np.zeros((layers_dims[l], layers_dims[l-1]))
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
        ### END CODE HERE ###
    return parameters
Results with zero initialization
parameters = model(train_X, train_Y, initialization = "zeros")print ("On the train set:")predictions_train = predict(train_X, train_Y, parameters)print ("On the test set:")predictions_test = predict(test_X, test_Y, parameters)
On the train set:
Accuracy: 0.5
On the test set:
Accuracy: 0.5
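The 50% accuracy is no better than chance: the network predicts the same class for every example. Here is a minimal sketch (my own, not part of the assignment) of why zeros fail to break symmetry: with W = 0, every hidden unit computes the same activation and receives the same gradient, so the rows of each weight matrix update identically forever.

import numpy as np

np.random.seed(0)
X = np.random.randn(2, 5)             # 2 features, 5 examples (made-up data)
W1 = np.zeros((3, 2))                 # zero-initialized layer of 3 units
b1 = np.zeros((3, 1))

Z1 = np.dot(W1, X) + b1               # every row of Z1 is identical (all zeros)
A1 = np.maximum(0, Z1)                # ReLU keeps the rows identical

# Pretend a uniform upstream gradient arrives; because the rows of Z1 are
# equal, the gradient w.r.t. each row of W1 is also equal:
dZ1 = np.ones_like(Z1) * (Z1 >= 0)    # ReLU subgradient at 0 taken as 1
dW1 = np.dot(dZ1, X.T) / X.shape[1]
print(np.allclose(dW1[0], dW1[1]))    # True -- all rows get the same update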
Random initialization
# GRADED FUNCTION: initialize_parameters_random

def initialize_parameters_random(layers_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                  W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                  b1 -- bias vector of shape (layers_dims[1], 1)
                  ...
                  WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                  bL -- bias vector of shape (layers_dims[L], 1)
    """

    np.random.seed(3)            # This seed makes sure your "random" numbers will be the same as ours
    parameters = {}
    L = len(layers_dims)         # integer representing the number of layers

    for l in range(1, L):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * 10
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
        ### END CODE HERE ###
    return parameters
Results with random initialization:
parameters = model(train_X, train_Y, initialization = "random")print ("On the train set:")predictions_train = predict(train_X, train_Y, parameters)print ("On the test set:")predictions_test = predict(test_X, test_Y, parameters)
On the train set:
Accuracy: 0.83
On the test set:
Accuracy: 0.86
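Training works here, but notice that the cost starts very high. A quick numeric illustration (my own, with made-up pre-activations) of the problem with weights scaled by 10: large pre-activations push the sigmoid output into saturation, so any example the network gets confidently wrong incurs an enormous log-loss, and the early iterations are spent undoing that.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z_small = np.array([0.5, -0.5])       # typical pre-activations with small weights
z_large = z_small * 10                # same direction, scaled up 10x

print(sigmoid(z_small))               # ~[0.62, 0.38] -- soft, trainable outputs
print(sigmoid(z_large))               # ~[0.99, 0.01] -- near-saturated

# Log-loss when a saturated output is wrong (true label 0, prediction ~0.99):
a = sigmoid(z_large[0])
print(-np.log(1 - a))                 # ~5.0, versus ~0.97 in the small-weight case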
He initialization
Finally, try "He initialization"; this is named for the first author of He et al., 2015. (If you have heard of "Xavier initialization", this is similar, except Xavier initialization uses a scaling factor of sqrt(1./layers_dims[l-1]) for the weights, where He initialization uses sqrt(2./layers_dims[l-1]).)
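A small numeric check (my own sketch, with arbitrary layer sizes) of what the two factors do: with unit-variance inputs, Xavier scaling keeps the pre-activations at roughly unit standard deviation, while He scaling makes it sqrt(2) larger to compensate for the roughly half of the variance that ReLU zeroes out.

import numpy as np

np.random.seed(1)
n_prev, n, m = 500, 400, 1000
X = np.random.randn(n_prev, m)        # unit-variance inputs

W_xavier = np.random.randn(n, n_prev) * np.sqrt(1. / n_prev)
W_he     = np.random.randn(n, n_prev) * np.sqrt(2. / n_prev)

print(np.std(np.dot(W_xavier, X)))    # ~1.0
print(np.std(np.dot(W_he, X)))        # ~1.41, i.e. sqrt(2)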
Exercise: Implement the following function to initialize your parameters with He initialization.
Hint: This function is similar to the previous initialize_parameters_random(...). The only difference is that instead of multiplying np.random.randn(..,..) by 10, you will multiply it by sqrt(2./layers_dims[l-1]).
# GRADED FUNCTION: initialize_parameters_he

def initialize_parameters_he(layers_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                  W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                  b1 -- bias vector of shape (layers_dims[1], 1)
                  ...
                  WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                  bL -- bias vector of shape (layers_dims[L], 1)
    """

    np.random.seed(3)
    parameters = {}
    L = len(layers_dims) - 1     # integer representing the number of layers

    for l in range(1, L + 1):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * np.sqrt(2. / layers_dims[l-1])
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
        ### END CODE HERE ###
    return parameters
Results with He initialization:
parameters = model(train_X, train_Y, initialization = "he")
print ("On the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print ("On the test set:")
predictions_test = predict(test_X, test_Y, parameters)
On the train set:
Accuracy: 0.993333333333
On the test set:
Accuracy: 0.96
Conclusions
You have seen three different types of initializations. For the same number of iterations and the same hyperparameters, the comparison is:

| Model | Train accuracy | Problem/Comment |
| --- | --- | --- |
| 3-layer NN with zeros initialization | 50% | fails to break symmetry |
| 3-layer NN with large random initialization | 83% | too large weights |
| 3-layer NN with He initialization | 99% | recommended method |