cetrolchen 2018-10-07
Building a neural network involves several steps, and two of the most important are implementing forward and backward propagation. In this tutorial we will focus on backpropagation and the intuition behind each of its steps.
Backpropagation is a straightforward technique that lets us compute the gradients of the network's parameters so that we can perform gradient descent and minimize our cost function. By the end of this tutorial you will understand every part of backpropagation.
Assume a simple two-layer neural network - one hidden layer and one output layer. We can carry out backpropagation as follows:
Initialize the weights and biases of the neural network: this involves randomly initializing the network's weights and biases. The gradients of these parameters will be obtained from backpropagation and used for the gradient descent updates.
The Python code is as follows:
#Import Numpy library
import numpy as np

#set seed for reproducibility
np.random.seed(100)

#We will first initialize the weights and biases needed and store them in a dictionary called W_B
def initialize(num_f, num_h, num_out):
    '''
    Description: This function randomly initializes the weights and biases of each layer of the neural network

    Input Arguments:
    num_f - the number of training features
    num_h - the number of nodes in the hidden layer
    num_out - the number of nodes in the output layer

    Output:
    W_B - A dictionary of the initialized parameters.
    '''
    #randomly initialize weights and biases, and proceed to store them in a dictionary
    W_B = {
        'W1': np.random.randn(num_h, num_f),
        'b1': np.zeros((num_h, 1)),
        'W2': np.random.randn(num_out, num_h),
        'b2': np.zeros((num_out, 1))
    }
    return W_B
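As a quick check (the layer sizes here are arbitrary, chosen only for illustration), the function can be called like this. Note that the weights are drawn from a standard normal distribution while the biases start at zero; the weights must be random so that the hidden units do not all compute the same thing.

#Illustrative call with hypothetical layer sizes
W_B = initialize(num_f=4, num_h=5, num_out=1)
print(W_B['W1'].shape, W_B['b1'].shape)   # (5, 4) (5, 1)
print(W_B['W2'].shape, W_B['b2'].shape)   # (1, 5) (1, 1)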
Perform forward propagation: this involves calculating the linear and activation outputs of the hidden layer and the output layer.
For the hidden layer:
We will use the relu activation function; the Python code is shown below:
#We will now proceed to create functions for each of our activation functions
def relu(Z):
    '''
    Description: This function performs the relu activation function on a given number or matrix.

    Input Arguments:
    Z - matrix or integer

    Output:
    relu_Z - matrix or integer with relu performed on it
    '''
    relu_Z = np.maximum(Z, 0)
    return relu_Z
For the output layer:
We will use the sigmoid activation function; the Python implementation is shown below:
def sigmoid(Z):
    '''
    Description: This function performs the sigmoid activation function on a given number or matrix.

    Input Arguments:
    Z - matrix or integer

    Output:
    sigmoid_Z - matrix or integer with sigmoid performed on it
    '''
    sigmoid_Z = 1 / (1 + (np.exp(-Z)))
    return sigmoid_Z
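As a quick illustration (the input values are arbitrary), relu zeroes out negative entries while sigmoid squashes its input into the range (0, 1):

print(relu(np.array([-2.0, 0.0, 3.0])))     # [0. 0. 3.]
print(sigmoid(np.array([-2.0, 0.0, 3.0])))  # approximately [0.119 0.5 0.953]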
The forward propagation itself is implemented in Python as follows:
#We will now proceed to perform forward propagation
def forward_propagation(X, W_B):
    '''
    Description: This function performs the forward propagation in a vectorized form

    Input Arguments:
    X - input training examples, stored as columns: shape (number of features, number of examples)
    W_B - initialized weights and biases

    Output:
    forward_results - A dictionary containing the linear and activation outputs
    '''
    #Calculate the linear Z for the hidden layer
    Z1 = np.dot(W_B['W1'], X) + W_B['b1']

    #Calculate the activation output for the hidden layer
    A = relu(Z1)

    #Calculate the linear Z for the output layer
    Z2 = np.dot(W_B['W2'], A) + W_B['b2']

    #Calculate the activation output for the output layer
    Y_pred = sigmoid(Z2)

    #Save all the results in a dictionary
    forward_results = {"Z1": Z1,
                       "A": A,
                       "Z2": Z2,
                       "Y_pred": Y_pred}
    return forward_results
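A minimal usage sketch, assuming the column-per-example layout used throughout (X has shape (number of features, number of examples)); the sizes are made up for illustration:

X = np.random.randn(4, 10)                  # hypothetical: 10 examples with 4 features each, stored as columns
W_B = initialize(num_f=4, num_h=5, num_out=1)
forward_results = forward_propagation(X, W_B)
print(forward_results['A'].shape)           # (5, 10) - hidden layer activations
print(forward_results['Y_pred'].shape)      # (1, 10) - one prediction per example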
Perform backward propagation:
Compute the gradients of the cost with respect to the parameters involved in gradient descent - in this case dLdZ2, dLdW2, dLdb2, dLdZ1, dLdW1 and dLdb1. These gradients will be combined with the learning rate to perform the gradient descent updates.
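For reference, these gradients are just the chain rule written in matrix form, using the same variable names as the code. Under the binary cross-entropy loss and sigmoid output used here, differentiating the loss through the sigmoid gives the compact term dLdZ2 = Y_pred - Y_true, and the remaining gradients follow by pushing this term back through each layer (the 1/no_examples factor comes from averaging the loss over the training examples):

dLdZ2 = Y_pred - Y_true
dLdW2 = (1/no_examples) * dLdZ2 . A^T
dLdb2 = (1/no_examples) * sum of dLdZ2 over the examples
dLdZ1 = (W2^T . dLdZ2) * relu'(Z1), where relu'(Z1) is 1 where Z1 > 0 and 0 elsewhere
dLdW1 = (1/no_examples) * dLdZ1 . X^T
dLdb1 = (1/no_examples) * sum of dLdZ1 over the examples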
The step-by-step guide is as follows:
First, run forward propagation and unpack its results:

forward_results = forward_propagation(X, W_B)
Z1 = forward_results['Z1']
A = forward_results['A']
Z2 = forward_results['Z2']
Y_pred = forward_results['Y_pred']

Next, obtain the number of training examples:

no_examples = X.shape[1]

Then calculate the loss:

L = (1/no_examples) * np.sum(-Y_true * np.log(Y_pred) - (1 - Y_true) * np.log(1 - Y_pred))

Calculate the gradients of each parameter needed for gradient descent (since the hidden layer uses relu, its derivative is 1 where Z1 > 0 and 0 elsewhere):

dLdZ2 = Y_pred - Y_true
dLdW2 = (1/no_examples) * np.dot(dLdZ2, A.T)
dLdb2 = (1/no_examples) * np.sum(dLdZ2, axis=1, keepdims=True)
dLdZ1 = np.multiply(np.dot(W_B['W2'].T, dLdZ2), (Z1 > 0))
dLdW1 = (1/no_examples) * np.dot(dLdZ1, X.T)
dLdb1 = (1/no_examples) * np.sum(dLdZ1, axis=1, keepdims=True)

Finally, store the gradients in a dictionary and return them together with the loss:

gradients = {"dLdW1": dLdW1,
             "dLdb1": dLdb1,
             "dLdW2": dLdW2,
             "dLdb2": dLdb2}

return gradients, L
The complete Python code is as follows:
def backward_propagation(X, W_B, Y_true):
    '''
    Description: This function performs the backward propagation in a vectorized form

    Input Arguments:
    X - input training examples
    W_B - initialized weights and biases
    Y_true - the true target values of the training examples

    Output:
    gradients - the calculated gradients of each parameter
    L - the loss
    '''
    #Obtain the forward results from the forward propagation
    forward_results = forward_propagation(X, W_B)
    Z1 = forward_results['Z1']
    A = forward_results['A']
    Z2 = forward_results['Z2']
    Y_pred = forward_results['Y_pred']

    #Obtain the number of training samples
    no_examples = X.shape[1]

    #Calculate the loss
    L = (1/no_examples) * np.sum(-Y_true * np.log(Y_pred) - (1 - Y_true) * np.log(1 - Y_pred))

    #Calculate the gradients of each parameter needed for gradient descent
    dLdZ2 = Y_pred - Y_true
    dLdW2 = (1/no_examples) * np.dot(dLdZ2, A.T)
    dLdb2 = (1/no_examples) * np.sum(dLdZ2, axis=1, keepdims=True)
    #relu derivative: 1 where Z1 > 0, 0 elsewhere
    dLdZ1 = np.multiply(np.dot(W_B['W2'].T, dLdZ2), (Z1 > 0))
    dLdW1 = (1/no_examples) * np.dot(dLdZ1, X.T)
    dLdb1 = (1/no_examples) * np.sum(dLdZ1, axis=1, keepdims=True)

    #Store gradients for gradient descent in a dictionary
    gradients = {"dLdW1": dLdW1,
                 "dLdb1": dLdb1,
                 "dLdW2": dLdW2,
                 "dLdb2": dLdb2}

    return gradients, L
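The gradients returned above are meant to be combined with a learning rate for the gradient descent update, which the tutorial itself does not show. Below is a minimal sketch of that last step, building on the functions defined above; the update_parameters helper, the learning rate of 0.1 and the synthetic data are illustrative choices, not part of the original tutorial:

def update_parameters(W_B, gradients, learning_rate):
    '''
    Performs one gradient descent step on every parameter:
    parameter = parameter - learning_rate * gradient
    '''
    W_B['W1'] = W_B['W1'] - learning_rate * gradients['dLdW1']
    W_B['b1'] = W_B['b1'] - learning_rate * gradients['dLdb1']
    W_B['W2'] = W_B['W2'] - learning_rate * gradients['dLdW2']
    W_B['b2'] = W_B['b2'] - learning_rate * gradients['dLdb2']
    return W_B

#Illustrative training loop on synthetic data (4 features, 10 examples, binary targets)
X = np.random.randn(4, 10)
Y_true = (np.random.rand(1, 10) > 0.5).astype(float)
W_B = initialize(num_f=4, num_h=5, num_out=1)
for i in range(1000):
    gradients, L = backward_propagation(X, W_B, Y_true)
    W_B = update_parameters(W_B, gradients, learning_rate=0.1)
    if i % 100 == 0:
        #monitor the loss to check that training is making progress
        print("iteration", i, "loss", L)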
Many people assume that backpropagation is difficult, but as you have seen in this tutorial, that is not the case. Understanding each step is essential to mastering the technique as a whole. It also helps to have a grasp of the underlying mathematics - linear algebra and calculus - so that you understand how the individual gradients of each function are derived. In practice, backpropagation is usually handled for you by the deep learning framework you are using. However, understanding its inner workings is worthwhile, because it can sometimes help you see why your neural network may not be training well.