Implementing Backpropagation in a Neural Network

cetrolchen 2018-10-07

Building a neural network involves several steps, and two of the most important are implementing forward and backward propagation. In this tutorial we will focus on backpropagation and the intuition behind each of its steps.

What is backpropagation?

Backpropagation is the technique used to compute the gradients of a neural network's parameters; those gradients are then used by gradient descent to minimize the cost function. By the end of this tutorial, you will understand every part of backpropagation.


Implementing backpropagation

Assume a simple two-layer neural network: one hidden layer and one output layer. We can carry out backpropagation as follows.

Initialize the weights and biases of the neural network: this involves randomly initializing the network's weights and biases. The gradients of these parameters will be obtained from backpropagation and used to update the parameters during gradient descent.

The Python code is shown below:

#Import the Numpy library
import numpy as np

#Set a seed for reproducibility
np.random.seed(100)

#We will first initialize the weights and biases needed and store them in a dictionary called W_B
def initialize(num_f, num_h, num_out):
    '''
    Description: This function randomly initializes the weights and biases of each layer of the neural network

    Input Arguments:
    num_f - the number of input features
    num_h - the number of nodes in the hidden layer
    num_out - the number of nodes in the output layer

    Output:
    W_B - A dictionary of the initialized parameters.
    '''

    #Randomly initialize the weights, set the biases to zero, and store them in a dictionary
    W_B = {
        'W1': np.random.randn(num_h, num_f),
        'b1': np.zeros((num_h, 1)),
        'W2': np.random.randn(num_out, num_h),
        'b2': np.zeros((num_out, 1))
    }
    return W_B
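
As a quick sanity check, here is a minimal, hypothetical call to initialize; the layer sizes below are made up purely for illustration:

#Hypothetical example: 4 input features, 5 hidden nodes, 1 output node
W_B = initialize(num_f=4, num_h=5, num_out=1)
print(W_B['W1'].shape)   #(5, 4)
print(W_B['b1'].shape)   #(5, 1)
print(W_B['W2'].shape)   #(1, 5)
print(W_B['b2'].shape)   #(1, 1)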


Perform forward propagation: this involves computing the linear and activation outputs of the hidden layer and the output layer.

For the hidden layer:

We will use the relu activation function. The Python code is shown below:

#We will now proceed to create functions for each of our activation functions
def relu(Z):
    '''
    Description: This function performs the relu activation function on a given number or matrix.

    Input Arguments:
    Z - matrix or integer

    Output:
    relu_Z - matrix or integer with relu performed on it
    '''
    relu_Z = np.maximum(Z, 0)

    return relu_Z
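
As an aside, backpropagation will also need the derivative of relu, which is 1 where the input is positive and 0 elsewhere. A minimal sketch of such a helper follows; the name relu_derivative is my own and is not part of the original code (the backward pass later applies the same idea inline as (Z1 > 0)):

def relu_derivative(Z):
    #Derivative of relu: 1 where Z is positive, 0 elsewhere
    return (Z > 0).astype(float)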


For the output layer:

We will use the sigmoid activation function. The Python implementation is shown below:

def sigmoid(Z):
    '''
    Description: This function performs the sigmoid activation function on a given number or matrix.

    Input Arguments:
    Z - matrix or integer

    Output:
    sigmoid_Z - matrix or integer with sigmoid performed on it
    '''
    sigmoid_Z = 1 / (1 + np.exp(-Z))

    return sigmoid_Z


Perform forward propagation. The Python implementation is shown below:

#We will now proceed to perform forward propagation
def forward_propagation(X, W_B):
    '''
    Description: This function performs the forward propagation in a vectorized form

    Input Arguments:
    X - input training examples, of shape (number of features, number of examples)
    W_B - initialized weights and biases

    Output:
    forward_results - A dictionary containing the linear and activation outputs
    '''

    #Calculate the linear output Z for the hidden layer
    Z1 = np.dot(W_B['W1'], X) + W_B['b1']

    #Calculate the activation output for the hidden layer
    A = relu(Z1)

    #Calculate the linear output Z for the output layer
    Z2 = np.dot(W_B['W2'], A) + W_B['b2']

    #Calculate the activation output for the output layer
    Y_pred = sigmoid(Z2)

    #Save everything in a dictionary
    forward_results = {"Z1": Z1,
                       "A": A,
                       "Z2": Z2,
                       "Y_pred": Y_pred}

    return forward_results
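
To see the forward pass in action, here is a minimal, hypothetical call that chains it with the initialization step above; the shapes and random data are made up for illustration:

#Hypothetical usage: 4 features, 10 training examples
X = np.random.randn(4, 10)
W_B = initialize(num_f=4, num_h=5, num_out=1)
forward_results = forward_propagation(X, W_B)
print(forward_results['Y_pred'].shape)   #(1, 10)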


Perform backward propagation:

Compute the gradients of the cost with respect to the parameters involved in gradient descent; in this case dLdZ2, dLdW2, dLdb2, dLdZ1, dLdW1 and dLdb1. These gradients are combined with a learning rate to perform the gradient descent updates (a sketch of the update step follows the full function below).

A step-by-step guide follows:

  • Obtain the results from forward propagation, as shown below:
forward_results = forward_propagation(X, W_B)
Z1 = forward_results['Z1']
A = forward_results['A']
Z2 = forward_results['Z2']
Y_pred = forward_results['Y_pred']
  • Obtain the number of training examples, as shown below:
no_examples = X.shape[1]
  • Compute the loss:
L = (1/no_examples) * np.sum(-Y_true * np.log(Y_pred) - (1 - Y_true) * np.log(1 - Y_pred))
  • Compute the gradient of each parameter, as shown below. Note that for a sigmoid output with cross-entropy loss, the derivative of the loss with respect to Z2 simplifies to Y_pred - Y_true, and the relu derivative of the hidden layer is expressed as (Z1 > 0):
dLdZ2= Y_pred - Y_true
dLdW2 = (1/no_examples) * np.dot(dLdZ2, A.T)
dLdb2 = (1/no_examples) * np.sum(dLdZ2, axis=1, keepdims=True)
dLdZ1 = np.multiply(np.dot(W_B['W2'].T, dLdZ2), (Z1 > 0))
dLdW1 = (1/no_examples) * np.dot(dLdZ1, X.T)
dLdb1 = (1/no_examples) * np.sum(dLdZ1, axis=1, keepdims=True)
  • Store the computed gradients needed for gradient descent in a dictionary:
gradients = {"dLdW1": dLdW1,
 "dLdb1": dLdb1,
 "dLdW2": dLdW2,
 "dLdb2": dLdb2}
  • Return the loss and the stored gradients:
return gradients, L

Here is the complete backpropagation function:

The Python code is shown below:

def backward_propagation(X, W_B, Y_true):
    '''
    Description: This function performs the backward propagation in a vectorized form

    Input Arguments:
    X - input training examples, of shape (number of features, number of examples)
    W_B - initialized weights and biases
    Y_true - the true target values of the training examples

    Output:
    gradients - the calculated gradients of each parameter
    L - the value of the loss function
    '''

    #Obtain the results from forward propagation
    forward_results = forward_propagation(X, W_B)
    Z1 = forward_results['Z1']
    A = forward_results['A']
    Z2 = forward_results['Z2']
    Y_pred = forward_results['Y_pred']

    #Obtain the number of training examples
    no_examples = X.shape[1]

    #Calculate the cross-entropy loss, averaged over the examples
    L = (1/no_examples) * np.sum(-Y_true * np.log(Y_pred) - (1 - Y_true) * np.log(1 - Y_pred))

    #Calculate the gradients of each parameter needed for gradient descent
    dLdZ2 = Y_pred - Y_true
    dLdW2 = (1/no_examples) * np.dot(dLdZ2, A.T)
    dLdb2 = (1/no_examples) * np.sum(dLdZ2, axis=1, keepdims=True)
    dLdZ1 = np.multiply(np.dot(W_B['W2'].T, dLdZ2), (Z1 > 0))   #relu derivative: gradient passes only where Z1 > 0
    dLdW1 = (1/no_examples) * np.dot(dLdZ1, X.T)
    dLdb1 = (1/no_examples) * np.sum(dLdZ1, axis=1, keepdims=True)

    #Store the gradients needed for gradient descent in a dictionary
    gradients = {"dLdW1": dLdW1,
                 "dLdb1": dLdb1,
                 "dLdW2": dLdW2,
                 "dLdb2": dLdb2}

    return gradients, L
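
The tutorial stops at the gradients, but to close the loop, here is a minimal sketch of how they could be combined with a learning rate to update the parameters. The function update_parameters and the learning rate value are my own illustration, not part of the original article:

def update_parameters(W_B, gradients, learning_rate=0.01):
    #Gradient descent step: move each parameter against its gradient
    W_B['W1'] = W_B['W1'] - learning_rate * gradients['dLdW1']
    W_B['b1'] = W_B['b1'] - learning_rate * gradients['dLdb1']
    W_B['W2'] = W_B['W2'] - learning_rate * gradients['dLdW2']
    W_B['b2'] = W_B['b2'] - learning_rate * gradients['dLdb2']
    return W_B

#Hypothetical training loop: repeat backward propagation and the update step
#for i in range(1000):
#    gradients, L = backward_propagation(X, W_B, Y_true)
#    W_B = update_parameters(W_B, gradients)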


Many people assume backpropagation is difficult, but as you have seen in this tutorial, it is not. Understanding each step is essential to mastering the technique as a whole. You also need a grasp of the underlying math, linear algebra and calculus, to understand how the gradients of each function are computed. In practice, backpropagation is usually handled for you by the deep learning framework you are using, but understanding its inner workings is still worthwhile, because it can sometimes help you figure out why your neural network is not training well.
