Momentum & Adam


Momentum

For each layer l, the momentum update keeps an exponentially weighted moving average of the gradients (the "velocity") and moves the parameters along it:

    v_dW(l) = β · v_dW(l) + (1 − β) · dW(l)
    W(l)    = W(l) − α · v_dW(l)

with the analogous update for b(l); β is the momentum hyperparameter and α is the learning rate.

def update_parameters_with_momentum(parameters, grads, v, beta, learning_rate):
    """
    Update parameters using Momentum

    Arguments:
    parameters -- python dictionary containing your parameters:
                    parameters['W' + str(l)] = Wl
                    parameters['b' + str(l)] = bl
    grads -- python dictionary containing your gradients for each parameter:
                    grads['dW' + str(l)] = dWl
                    grads['db' + str(l)] = dbl
    v -- python dictionary containing the current velocity:
                    v['dW' + str(l)] = ...
                    v['db' + str(l)] = ...
    beta -- the momentum hyperparameter, scalar
    learning_rate -- the learning rate, scalar

    Returns:
    parameters -- python dictionary containing your updated parameters
    v -- python dictionary containing your updated velocities
    """
    L = len(parameters) // 2  # number of layers in the neural network

    # Momentum update for each parameter
    for l in range(L):
        ### START CODE HERE ### (approx. 4 lines)
        # compute velocities
        v["dW" + str(l+1)] = beta * v["dW" + str(l+1)] + (1 - beta) * grads["dW" + str(l+1)]
        v["db" + str(l+1)] = beta * v["db" + str(l+1)] + (1 - beta) * grads["db" + str(l+1)]
        # update parameters
        parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - learning_rate * v["dW" + str(l+1)]
        parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - learning_rate * v["db" + str(l+1)]
        ### END CODE HERE ###

    return parameters, v
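A minimal sketch of how this function might be driven. The one-layer parameters/grads dictionaries and their shapes below are illustrative assumptions, not part of the original assignment; the only essential point is that the velocities start at zero with the same shapes as the corresponding gradients.

    import numpy as np

    # Illustrative one-layer network; shapes are arbitrary, chosen only for the demo.
    parameters = {"W1": np.random.randn(2, 3) * 0.01, "b1": np.zeros((2, 1))}
    grads      = {"dW1": np.random.randn(2, 3),       "db1": np.random.randn(2, 1)}

    # Velocities start at zero, matching the shapes of the corresponding gradients.
    v = {"dW1": np.zeros_like(parameters["W1"]), "db1": np.zeros_like(parameters["b1"])}

    # One momentum step; in training this call sits inside the mini-batch loop.
    parameters, v = update_parameters_with_momentum(parameters, grads, v,
                                                    beta=0.9, learning_rate=0.01)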

Adam

Adam combines momentum with RMSprop: for each layer l and update count t it keeps moving averages of both the gradients and their element-wise squares, corrects both for their initialization bias, and then scales the step by the corrected second moment:

    v_dW(l)           = β₁ · v_dW(l) + (1 − β₁) · dW(l)
    v_corrected_dW(l) = v_dW(l) / (1 − β₁^t)
    s_dW(l)           = β₂ · s_dW(l) + (1 − β₂) · dW(l)²
    s_corrected_dW(l) = s_dW(l) / (1 − β₂^t)
    W(l)              = W(l) − α · v_corrected_dW(l) / (√s_corrected_dW(l) + ε)

with the analogous updates for b(l).

import numpy as np


def update_parameters_with_adam(parameters, grads, v, s, t, learning_rate=0.01,
                                beta1=0.9, beta2=0.999, epsilon=1e-8):
    """
    Update parameters using Adam

    Arguments:
    parameters -- python dictionary containing your parameters:
                    parameters['W' + str(l)] = Wl
                    parameters['b' + str(l)] = bl
    grads -- python dictionary containing your gradients for each parameter:
                    grads['dW' + str(l)] = dWl
                    grads['db' + str(l)] = dbl
    v -- Adam variable, moving average of the first gradient, python dictionary
    s -- Adam variable, moving average of the squared gradient, python dictionary
    t -- Adam counter of the number of updates taken, used for bias correction
    learning_rate -- the learning rate, scalar
    beta1 -- exponential decay hyperparameter for the first moment estimates
    beta2 -- exponential decay hyperparameter for the second moment estimates
    epsilon -- hyperparameter preventing division by zero in Adam updates

    Returns:
    parameters -- python dictionary containing your updated parameters
    v -- Adam variable, moving average of the first gradient, python dictionary
    s -- Adam variable, moving average of the squared gradient, python dictionary
    """
    L = len(parameters) // 2    # number of layers in the neural network
    v_corrected = {}            # bias-corrected first moment estimate, python dictionary
    s_corrected = {}            # bias-corrected second moment estimate, python dictionary

    # Perform Adam update on all parameters
    for l in range(L):
        # Moving average of the gradients. Inputs: "v, grads, beta1". Output: "v".
        ### START CODE HERE ### (approx. 2 lines)
        v["dW" + str(l+1)] = beta1 * v["dW" + str(l+1)] + (1 - beta1) * grads["dW" + str(l+1)]
        v["db" + str(l+1)] = beta1 * v["db" + str(l+1)] + (1 - beta1) * grads["db" + str(l+1)]
        ### END CODE HERE ###

        # Compute bias-corrected first moment estimate. Inputs: "v, beta1, t". Output: "v_corrected".
        ### START CODE HERE ### (approx. 2 lines)
        v_corrected["dW" + str(l+1)] = v["dW" + str(l+1)] / (1 - beta1**t)
        v_corrected["db" + str(l+1)] = v["db" + str(l+1)] / (1 - beta1**t)
        ### END CODE HERE ###

        # Moving average of the squared gradients. Inputs: "s, grads, beta2". Output: "s".
        ### START CODE HERE ### (approx. 2 lines)
        s["dW" + str(l+1)] = beta2 * s["dW" + str(l+1)] + (1 - beta2) * (grads["dW" + str(l+1)] ** 2)
        s["db" + str(l+1)] = beta2 * s["db" + str(l+1)] + (1 - beta2) * (grads["db" + str(l+1)] ** 2)
        ### END CODE HERE ###

        # Compute bias-corrected second raw moment estimate. Inputs: "s, beta2, t". Output: "s_corrected".
        ### START CODE HERE ### (approx. 2 lines)
        s_corrected["dW" + str(l+1)] = s["dW" + str(l+1)] / (1 - beta2**t)
        s_corrected["db" + str(l+1)] = s["db" + str(l+1)] / (1 - beta2**t)
        ### END CODE HERE ###

        # Update parameters. Inputs: "parameters, learning_rate, v_corrected, s_corrected, epsilon". Output: "parameters".
        ### START CODE HERE ### (approx. 2 lines)
        parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - learning_rate * (v_corrected["dW" + str(l+1)] / (np.sqrt(s_corrected["dW" + str(l+1)]) + epsilon))
        parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - learning_rate * (v_corrected["db" + str(l+1)] / (np.sqrt(s_corrected["db" + str(l+1)]) + epsilon))
        ### END CODE HERE ###

    return parameters, v, s
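A minimal sketch of how Adam might be wired into a training loop. The helper name initialize_adam, the one-layer dictionaries, and the fake random gradients are illustrative assumptions rather than the original assignment's code; the points it shows are that both moment dictionaries start at zero and that the time step t is incremented before every call, starting from 1.

    import numpy as np

    def initialize_adam(parameters):
        # Illustrative helper (name assumed): both Adam moment dictionaries
        # start at zero with the same shapes as the corresponding gradients.
        L = len(parameters) // 2
        v, s = {}, {}
        for l in range(L):
            v["dW" + str(l+1)] = np.zeros_like(parameters["W" + str(l+1)])
            v["db" + str(l+1)] = np.zeros_like(parameters["b" + str(l+1)])
            s["dW" + str(l+1)] = np.zeros_like(parameters["W" + str(l+1)])
            s["db" + str(l+1)] = np.zeros_like(parameters["b" + str(l+1)])
        return v, s

    # Illustrative one-layer network and fake gradients, just to exercise the update.
    parameters = {"W1": np.random.randn(2, 3) * 0.01, "b1": np.zeros((2, 1))}
    v, s = initialize_adam(parameters)

    t = 0
    for _ in range(3):                      # stands in for the mini-batch loop
        grads = {"dW1": np.random.randn(2, 3), "db1": np.random.randn(2, 1)}
        t = t + 1                           # bias correction requires t >= 1
        parameters, v, s = update_parameters_with_adam(parameters, grads, v, s, t,
                                                       learning_rate=0.01)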