《深度学习——Andrew Ng》第二课第一周编程作业
来源:互联网 发布:mysql官网下载旧版本 编辑:程序博客网 时间:2024/06/05 17:49
Initialization
作业通过三种不同的初始化参数的方式(zero、random、he),对神经网络进行参数初始化,通过对比,得出每种初始化方式的特征。最后结论为he初始化是最好的方式。
程序
原始数据集:
import numpy as npimport matplotlib.pyplot as pltimport sklearnimport sklearn.datasetsfrom init_utils import sigmoid, relu, compute_loss, forward_propagation, backward_propagationfrom init_utils import update_parameters, predict, load_dataset, plot_decision_boundary, predict_dec#%matplotlib inlineplt.rcParams['figure.figsize'] = (7.0, 4.0) # set default size of plotsplt.rcParams['image.interpolation'] = 'nearest'plt.rcParams['image.cmap'] = 'gray'# load image dataset: blue/red dots in circlestrain_X, train_Y, test_X, test_Y = load_dataset()def model(X, Y, learning_rate=0.01, num_iterations=15000, print_cost=True, initialization="he"): """ Implements a three-layer neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SIGMOID. Arguments: X -- input data, of shape (2, number of examples) Y -- true "label" vector (containing 0 for red dots; 1 for blue dots), of shape (1, number of examples) learning_rate -- learning rate for gradient descent num_iterations -- number of iterations to run gradient descent print_cost -- if True, print the cost every 1000 iterations initialization -- flag to choose which initialization to use ("zeros","random" or "he") Returns: parameters -- parameters learnt by the model """ grads = {} costs = [] # to keep track of the loss m = X.shape[1] # number of examples layers_dims = [X.shape[0], 10, 5, 1] # Initialize parameters dictionary. if initialization == "zeros": parameters = initialize_parameters_zeros(layers_dims) elif initialization == "random": parameters = initialize_parameters_random(layers_dims) elif initialization == "he": parameters = initialize_parameters_he(layers_dims) # Loop (gradient descent) for i in range(0, num_iterations): # Forward propagation: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID. a3, cache = forward_propagation(X, parameters) # Loss cost = compute_loss(a3, Y) # Backward propagation. grads = backward_propagation(X, Y, cache) # Update parameters. parameters = update_parameters(parameters, grads, learning_rate) # Print the loss every 1000 iterations if print_cost and i % 1000 == 0: print("Cost after iteration {}: {}".format(i, cost)) costs.append(cost) # plot the loss plt.plot(costs) plt.ylabel('cost') plt.xlabel('iterations (per hundreds)') plt.title("Learning rate =" + str(learning_rate)) plt.show() return parameters# GRADED FUNCTION: initialize_parameters_zerosdef initialize_parameters_zeros(layers_dims): """ Arguments: layer_dims -- python array (list) containing the size of each layer. Returns: parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL": W1 -- weight matrix of shape (layers_dims[1], layers_dims[0]) b1 -- bias vector of shape (layers_dims[1], 1) ... WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1]) bL -- bias vector of shape (layers_dims[L], 1) """ parameters = {} L = len(layers_dims) # number of layers in the network for l in range(1, L): ### START CODE HERE ### (≈ 2 lines of code) parameters['W' + str(l)] = np.zeros((layers_dims[l], layers_dims[l - 1])) parameters['b' + str(l)] = np.zeros((layers_dims[l], 1)) ### END CODE HERE ### return parametersprint("******************** zero initialization ****************")parameters = model(train_X, train_Y, initialization = "zeros")print ("On the train set:")predictions_train = predict(train_X, train_Y, parameters)print ("On the test set:")predictions_test = predict(test_X, test_Y, parameters)# GRADED FUNCTION: initialize_parameters_randomdef initialize_parameters_random(layers_dims): """ Arguments: layer_dims -- python array (list) containing the size of each layer. Returns: parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL": W1 -- weight matrix of shape (layers_dims[1], layers_dims[0]) b1 -- bias vector of shape (layers_dims[1], 1) ... WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1]) bL -- bias vector of shape (layers_dims[L], 1) """ np.random.seed(3) # This seed makes sure your "random" numbers will be the as ours parameters = {} L = len(layers_dims) # integer representing the number of layers for l in range(1, L): ### START CODE HERE ### (≈ 2 lines of code) parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l - 1]) * 10 parameters['b' + str(l)] = np.zeros((layers_dims[l], 1)) ### END CODE HERE ### return parametersprint("******************** random initialization ****************")parameters = model(train_X, train_Y, initialization = "random")print ("On the train set:")predictions_train = predict(train_X, train_Y, parameters)print ("On the test set:")predictions_test = predict(test_X, test_Y, parameters)# GRADED FUNCTION: initialize_parameters_hedef initialize_parameters_he(layers_dims): """ Arguments: layer_dims -- python array (list) containing the size of each layer. Returns: parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL": W1 -- weight matrix of shape (layers_dims[1], layers_dims[0]) b1 -- bias vector of shape (layers_dims[1], 1) ... WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1]) bL -- bias vector of shape (layers_dims[L], 1) """ np.random.seed(3) parameters = {} L = len(layers_dims) - 1 # integer representing the number of layers for l in range(1, L + 1): ### START CODE HERE ### (≈ 2 lines of code) parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l - 1]) * np.sqrt( 2. / layers_dims[l - 1]) parameters['b' + str(l)] = np.zeros((layers_dims[l], 1)) ### END CODE HERE ### return parametersprint("******************** He initialization ****************")parameters = model(train_X, train_Y, initialization = "he")print ("On the train set:")predictions_train = predict(train_X, train_Y, parameters)print ("On the test set:")predictions_test = predict(test_X, test_Y, parameters)
运行结果
********** zero initialization ******
Cost after iteration 0: 0.6931471805599453
Cost after iteration 1000: 0.6931471805599453
Cost after iteration 2000: 0.6931471805599453
Cost after iteration 3000: 0.6931471805599453
Cost after iteration 4000: 0.6931471805599453
Cost after iteration 5000: 0.6931471805599453
Cost after iteration 6000: 0.6931471805599453
Cost after iteration 7000: 0.6931471805599453
Cost after iteration 8000: 0.6931471805599453
Cost after iteration 9000: 0.6931471805599453
Cost after iteration 10000: 0.6931471805599455
Cost after iteration 11000: 0.6931471805599453
Cost after iteration 12000: 0.6931471805599453
Cost after iteration 13000: 0.6931471805599453
Cost after iteration 14000: 0.6931471805599453
On the train set:
Accuracy: 0.5
On the test set:
Accuracy: 0.5
********** random initialization ******
Cost after iteration 0: inf
Cost after iteration 1000: 0.6239560077799974
Cost after iteration 2000: 0.5981988756495555
Cost after iteration 3000: 0.5639165098349239
Cost after iteration 4000: 0.5501730606234159
Cost after iteration 5000: 0.5444478976702423
Cost after iteration 6000: 0.5374387172653514
Cost after iteration 7000: 0.47472803691077003
Cost after iteration 8000: 0.397783817035777
Cost after iteration 9000: 0.39347128330744535
Cost after iteration 10000: 0.39202801461972386
Cost after iteration 11000: 0.389225947340669
Cost after iteration 12000: 0.38615256867920933
Cost after iteration 13000: 0.3849845104125972
Cost after iteration 14000: 0.3827782795015039
On the train set:
Accuracy: 0.83
On the test set:
Accuracy: 0.86
********** He initialization ******
Cost after iteration 0: 0.8830537463419761
Cost after iteration 1000: 0.6879825919728063
Cost after iteration 2000: 0.6751286264523371
Cost after iteration 3000: 0.6526117768893807
Cost after iteration 4000: 0.6082958970572938
Cost after iteration 5000: 0.5304944491717495
Cost after iteration 6000: 0.4138645817071794
Cost after iteration 7000: 0.31178034648444414
Cost after iteration 8000: 0.23696215330322562
Cost after iteration 9000: 0.1859728720920683
Cost after iteration 10000: 0.1501555628037181
Cost after iteration 11000: 0.12325079292273544
Cost after iteration 12000: 0.09917746546525931
Cost after iteration 13000: 0.08457055954024274
Cost after iteration 14000: 0.07357895962677363
On the train set:
Accuracy: 0.993333333333
On the test set:
Accuracy: 0.96
Process finished with exit code 0
结论
You have seen three different types of initializations. For the same number of iterations and same hyperparameters the comparison is:
**Model** **Train accuracy** **Problem/Comment** 3-layer NN with zeros initialization 50% fails to break symmetry 3-layer NN with large random initialization 83% too large weights 3-layer NN with He initialization 99% recommended method
What you should remember from this notebook:
- Different initializations lead to different results
- Random initialization is used to break symmetry and make sure different hidden units can learn different things
- Don’t intialize to values that are too large
- He initialization works well for networks with ReLU activations.
- 《深度学习——Andrew Ng》第二课第一周编程作业
- 《深度学习——Andrew Ng》第二课第二周编程作业
- 《深度学习——Andrew Ng》第一课第二周编程作业
- 《深度学习——Andrew Ng》第一课第四周编程作业
- 《深度学习——Andrew Ng》第一课第三周编程作业
- Coursera—machine learning(Andrew Ng)第二周编程作业
- Coursera—machine learning(Andrew Ng)第四周编程作业
- Andrew Ng 深度学习课程deeplearning.ai 编程作业——shallow network for datesets classification (1-3)
- Andrew Ng 深度学习课程Deeplearning.ai 编程作业——forward and backward propagation(1-4.1)
- Andrew Ng 深度学习课程Deeplearning.ai 编程作业——deep Neural network for image classification(1-4.2)
- Andrew ng机器学习课程第一周
- 网易云深度学习第一课第一周编程作业
- Coursera—machine learning(Andrew Ng)第三周编程作业
- Coursera—machine learning(Andrew Ng)第五周编程作业
- Coursera—machine learning(Andrew Ng)第六周编程作业
- Coursera—machine learning(Andrew Ng)第七周编程作业
- Coursera—machine learning(Andrew Ng)第八周编程作业
- CS229 Andrew Ng 机器学习公开课作业1——监督学习
- 设计模式【模板模式Template Pattern】
- 恢复Excel批注框到默认位置
- Jacobian矩阵和Hessian矩阵
- linux的命令操作
- [OS] Shell脚本练习
- 《深度学习——Andrew Ng》第二课第一周编程作业
- LeetCode--Subsets II
- Pat 1009. 说反话 (20)
- 虚拟机下centos6.5系统redis集群安装
- CentOS 7.0 使用 yum 安装 MariaDB 与 MariaDB 的简单配置
- 我对C语言的印象,象雾象雨又象风。
- Hive小文件合并汇总
- Spring框架学习之高级依赖关系配置(一)
- 使用appendChild(),insertBefore()的一个小问题