[Andrew Ng DL] Class1 Week2 Neural Network Basics + Logistic Regression Code Implementation
This week's material centers on binary classification with logistic regression, covering the definition of logistic regression, the loss function, gradient descent optimization, and vectorization. It is presented in two parts: course notes and code implementation.
I. Course Notes

1. Structure

Given an input $x$, we want the prediction

$$\hat{y} = \sigma(z) = \sigma(w^T x + b)$$

where

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

From this we can see (a small numpy sketch follows):

- when $z$ is a large positive number, $\sigma(z)$ approaches 1
- when $z$ is a large negative number, $\sigma(z)$ approaches 0
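A minimal numpy sketch of this saturation behavior; it also serves as the `sigmoid` helper that the code in Part II assumes is already defined:

```python
import numpy as np

def sigmoid(z):
    """Compute sigma(z) = 1 / (1 + e^(-z)), elementwise for arrays."""
    return 1. / (1. + np.exp(-z))

# sigma(z) saturates at the tails and is 0.5 at z = 0
print(sigmoid(np.array([-10., 0., 10.])))  # roughly [4.5e-05, 0.5, 1 - 4.5e-05]
```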
2. Loss Function

2.1 Loss function

For a single example, the loss function is:

$$L(\hat{y}, y) = -\big(y \log \hat{y} + (1 - y) \log(1 - \hat{y})\big)$$

where $\hat{y} = \sigma(w^T x + b)$ is the model's prediction for that example.

This formula can be understood as follows (illustrated numerically after the list):

- If $y = 1$, then $L(\hat{y}, y) = -\log(\hat{y})$, so we want $\hat{y}$ to be as large as possible (approaching 1)
- If $y = 0$, then $L(\hat{y}, y) = -\log(1 - \hat{y})$, so we want $\hat{y}$ to be as small as possible (approaching 0)
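A quick numerical illustration of the two cases (the probe values 0.9 and 0.1 are arbitrary):

```python
import numpy as np

y_hat = np.array([0.9, 0.1])  # two hypothetical predictions

# If y = 1, L = -log(y_hat): small when y_hat is near 1, large when it is near 0
print(-np.log(y_hat))      # -> [0.105..., 2.302...]

# If y = 0, L = -log(1 - y_hat): small when y_hat is near 0
print(-np.log(1 - y_hat))  # -> [2.302..., 0.105...]
```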
2.2 Cost function

The cost function is the average of the loss function over all $m$ training examples:

$$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)}) = -\frac{1}{m} \sum_{i=1}^{m} \big[ y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \big]$$

During training, we want to make the cost function as small as possible.
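A vectorized sketch of this formula (the helper name `compute_cost` and the example arrays are made up for illustration):

```python
import numpy as np

def compute_cost(A, Y):
    """Average cross-entropy cost; A (predictions) and Y (labels) both have shape (1, m)."""
    m = Y.shape[1]
    return -1. / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

A = np.array([[0.9, 0.2, 0.7]])
Y = np.array([[1, 0, 1]])
print(compute_cost(A, Y))  # small, since the predictions mostly agree with the labels
```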
3. Gradient Descent for Logistic Regression

Gradient descent for logistic regression can be understood as follows (a runnable sketch on toy data follows the list):

- First, initialize the parameters $(w, b)$ of the logistic regression model
- Compute the predictions with $\hat{y} = \sigma(z) = \sigma(w^T x + b)$, compute the loss with the cost function, and obtain the gradients $dw$ and $db$
- Update the parameters: $w = w - \alpha \, dw$; $b = b - \alpha \, db$
- Repeat steps 2 and 3 until a suitable $(w, b)$ is found
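As a sketch of the full loop on made-up data (the data, learning rate, and iteration count are all illustrative assumptions, not values from the course):

```python
import numpy as np

# Toy data: 2 features, 4 examples stored as columns; labels follow the first feature
X = np.array([[0., 1., 2., 3.],
              [1., 0., 1., 0.]])
Y = np.array([[0, 0, 1, 1]])
m = X.shape[1]

w, b = np.zeros((2, 1)), 0.                        # step 1: initialize (w, b)
for _ in range(1000):
    A = 1. / (1. + np.exp(-(np.dot(w.T, X) + b)))  # step 2: y_hat = sigma(w^T x + b)
    dw = 1. / m * np.dot(X, (A - Y).T)             # step 2: gradients of the cost
    db = 1. / m * np.sum(A - Y)
    w, b = w - 0.1 * dw, b - 0.1 * db              # step 3: update with alpha = 0.1

# After repeating steps 2-3, the predictions should match Y on this separable toy set
print((A > 0.5).astype(int))
```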
II. Code Implementation

The overall computation proceeds as follows:

- Load train_set and test_set and inspect the data format (e.g., train_set_x has shape (m, num_px, num_px, 3))
- Preprocess the data:
  (1) Flatten each image: (num_px, num_px, 3) → (num_px * num_px * 3, 1)
  (2) Standardize the dataset:
  train_set_x = train_set_x_flatten / 255.
  test_set_x = test_set_x_flatten / 255.
- Initialize the parameters (w, b)
- Forward propagation, cost computation, backward propagation:
  (1) Forward propagation: $A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, \ldots, a^{(m)})$
  (2) Compute the cost: $J = -\frac{1}{m} \sum_{i=1}^{m} \big[ y^{(i)} \log a^{(i)} + (1 - y^{(i)}) \log(1 - a^{(i)}) \big]$
  (3) Backward propagation (gradient computation): $dw = \frac{\partial J}{\partial w} = \frac{1}{m} X (A - Y)^T$, $db = \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (a^{(i)} - y^{(i)})$
- Update the parameters: w = w − α·dw; b = b − α·db
- Predict: input (X, w, b), output (Y_prediction)
- Assemble the model: input (X_train, Y_train, X_test, Y_test, num_iterations, learning_rate, print_cost), output (a dictionary collecting the quantities above)

The concrete code follows.
1. Import packages
```python
import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
from lr_utils import load_dataset

%matplotlib inline
```
2. Load the data
```python
def load_dataset():
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # your train set labels

    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # your test set labels

    classes = np.array(test_dataset["list_classes"][:])  # the list of classes

    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes
```
```python
# Loading the data (cat/non-cat)
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
# train_set_x ---------- (209, 64, 64, 3)  209 images
# train_set_y ---------- (1, 209)
# test_set_x  ---------- (50, 64, 64, 3)   50 images
# test_set_y  ---------- (1, 50)

# Example of a picture
index = 25
plt.imshow(train_set_x_orig[index])  # show the (index+1)-th image
print("y = " + str(train_set_y[:, index]) + ", it's a '"
      + classes[np.squeeze(train_set_y[:, index])].decode("utf-8") + "' picture.")
# np.squeeze turns [1] into 1
```
3. Preprocess the data
```python
# Extract the number of examples and the image width/height
m_train = train_set_x_orig.shape[0]
m_test = test_set_x_orig.shape[0]
num_px = train_set_x_orig.shape[1]

# Reshape the training and test examples
train_set_x_flatten = train_set_x_orig.reshape(m_train, num_px * num_px * 3).T
test_set_x_flatten = test_set_x_orig.reshape(m_test, num_px * num_px * 3).T

# Standardize our dataset
train_set_x = train_set_x_flatten / 255.
test_set_x = test_set_x_flatten / 255.
```
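As a quick sanity check of the preprocessing (for this dataset num_px = 64, so each flattened image has 64 * 64 * 3 = 12288 entries):

```python
# Each column is one flattened, standardized image
assert train_set_x.shape == (num_px * num_px * 3, m_train)  # (12288, 209)
assert test_set_x.shape == (num_px * num_px * 3, m_test)    # (12288, 50)
```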
4. Initialize the parameters (to zeros)
```python
def initialize_with_zeros(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.

    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)

    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
    """
    ### START CODE HERE ### (≈ 1 line of code)
    w = np.zeros((dim, 1))
    b = 0
    ### END CODE HERE ###

    # Assert the expected shape and type
    assert(w.shape == (dim, 1))
    assert(isinstance(b, float) or isinstance(b, int))

    return w, b
```
5. Propagate: forward pass, cost, backward pass
```python
def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b

    Tips:
    - Write your code step by step for the propagation. np.log(), np.dot()
    """
    m = X.shape[1]

    # FORWARD PROPAGATION (FROM X TO COST)
    ### START CODE HERE ### (≈ 2 lines of code)
    A = sigmoid(np.dot(w.T, X) + b)                                   # compute activation
    cost = -1. / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))  # compute cost
    ### END CODE HERE ###

    # BACKWARD PROPAGATION (TO FIND GRAD)
    ### START CODE HERE ### (≈ 2 lines of code)
    dw = 1. / m * np.dot(X, (A - Y).T)
    db = 1. / m * np.sum(A - Y, axis=1, keepdims=True)
    ### END CODE HERE ###

    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())

    grads = {"dw": dw,
             "db": db}

    return grads, cost
```
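A quick smoke test with made-up values (these are not the assignment's official test inputs) to confirm the gradient shapes and a finite cost:

```python
w = np.array([[1.], [2.]])
b = 2.
X = np.array([[1., 2.],
              [3., 4.]])
Y = np.array([[1, 0]])

grads, cost = propagate(w, b, X, Y)
print(grads["dw"].shape, grads["db"].shape, cost)  # (2, 1) (1, 1) and a finite scalar
```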
6. Optimize: update the parameters using dw and db
```python
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    """
    This function optimizes w and b by running a gradient descent algorithm

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps

    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.

    Tips:
    You basically need to write down two steps and iterate through them:
        1) Calculate the cost and the gradient for the current parameters. Use propagate().
        2) Update the parameters using gradient descent rule for w and b.
    """
    costs = []

    for i in range(num_iterations):

        # Cost and gradient calculation (≈ 1-4 lines of code)
        ### START CODE HERE ###
        grads, cost = propagate(w, b, X, Y)
        ### END CODE HERE ###

        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]

        # update rule (≈ 2 lines of code)
        ### START CODE HERE ###
        w = w - learning_rate * dw
        b = b - learning_rate * db
        ### END CODE HERE ###

        # Record the costs
        if i % 100 == 0:
            costs.append(cost)

        # Print the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))

    params = {"w": w,
              "b": b}

    grads = {"dw": dw,
             "db": db}

    return params, grads, costs
```
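Continuing the smoke test above, a short optimization run (the iteration count and learning rate here are arbitrary):

```python
params, grads, costs = optimize(w, b, X, Y, num_iterations=100, learning_rate=0.009, print_cost=False)
print(params["w"].ravel(), params["b"])  # parameters after 100 gradient descent steps
```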
7. Predict: map the input X and the optimized w, b to Y_prediction
```python
def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)

    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)  # why do we need to declare the shape again?

    # Compute vector "A" predicting the probabilities of a cat being present in the picture
    ### START CODE HERE ### (≈ 1 line of code)
    A = sigmoid(np.dot(w.T, X) + b)  # (1, m)
    ### END CODE HERE ###

    # Convert the probabilities A[0, i] to actual 0/1 predictions,
    # thresholding the whole vector at 0.5 in one vectorized step
    ### START CODE HERE ### (≈ 1 line of code)
    Y_prediction = np.floor(A + 0.5)
    ### END CODE HERE ###

    assert(Y_prediction.shape == (1, m))

    return Y_prediction
```
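A note on the thresholding step: `np.floor(A + 0.5)` rounds every probability to 0 or 1 in a single vectorized operation (values above 0.5 become 1), so the per-element loop suggested by the assignment scaffold is unnecessary; `(A > 0.5).astype(float)` would be an essentially equivalent choice. Using the toy parameters from the run above:

```python
print(predict(params["w"], params["b"], X))  # a (1, m) array of 0/1 predictions
```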
8. Merge everything into model()
```python
def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
    """
    Builds the logistic regression model by calling the function you've implemented previously

    Arguments:
    X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to true to print the cost every 100 iterations

    Returns:
    d -- dictionary containing information about the model.
    """
    ### START CODE HERE ###

    # initialize parameters with zeros (≈ 1 line of code)
    w, b = initialize_with_zeros(X_train.shape[0])

    # Gradient descent (≈ 1 line of code)
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)

    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]

    # Predict test/train set examples (≈ 2 lines of code)
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    ### END CODE HERE ###

    # Print train/test Errors
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": w,
         "b": b,
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}

    return d
```
Therefore, once you have a training set and a test set, you only need to run:
```python
d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_cost = True)
```
To make a prediction on a new image, run:
```python
## START CODE HERE ## (PUT YOUR IMAGE NAME)
my_image = "my_image.jpg"  # change this to the name of your image file
## END CODE HERE ##

# We preprocess the image to fit your algorithm.
fname = "images/" + my_image  # path + file name
image = np.array(ndimage.imread(fname, flatten=False))
my_image = scipy.misc.imresize(image, size=(num_px, num_px)).reshape((1, num_px * num_px * 3)).T
my_predicted_image = predict(d["w"], d["b"], my_image)

plt.imshow(image)
print("y = " + str(np.squeeze(my_predicted_image)) + ", your algorithm predicts a \""
      + classes[int(np.squeeze(my_predicted_image)),].decode("utf-8") + "\" picture.")
```
9. Plot the learning curve (cost versus iteration)
```python
# Plot learning curve (with costs)
costs = np.squeeze(d['costs'])
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate = " + str(d["learning_rate"]))
plt.show()
```