Optimizing Machine Learning with Genetic Algorithms

bamboocqh 2018-10-07


In machine learning we are constantly optimizing features or parameters, which is why optimization algorithms have become so common across machine learning techniques. Genetic algorithms in particular have found applications in many areas of machine learning. In this tutorial I will explain what a genetic algorithm is, implement a simple version of one while explaining the intuition behind it, and then look at some use cases of genetic algorithms in machine learning.

What Is a Genetic Algorithm?

A genetic algorithm is an optimization algorithm based on the theory of evolution. At its core, it selects the best parameters or components according to a fitness function that measures how well they satisfy the optimization problem at hand. Those parameters are then mutated in the hope that they produce fitter offspring. That is the basic intuition behind genetic algorithms.

Implementing a Genetic Algorithm from Scratch

A basic genetic algorithm has the following steps:

  1. Initialize the population
  2. Create a fitness calculation function
  3. Define the mating pool
  4. Select the parents
  5. Perform crossover
  6. Mutate
  7. Take the offspring as the new population for step 1
  8. Return to step 2 and repeat until the stopping condition is met

A simple flowchart of the process is shown below.

[Flowchart: the genetic algorithm loop described in the steps above]

We will now implement a genetic algorithm from scratch. We will optimize the simple linear equation F(X) = A1X1 + A2X2 + A3X3 + A4X4, where X1 = 4, X2 = -2, X3 = 3.5, and X4 = -4.2, all chosen at random.

The genetic algorithm will help us find the best multipliers (the A values) that produce the maximum F(X) within a given number of generations. We will create functions that perform the fitness calculation, parent selection, crossover, and mutation.

We will run the GA for 10 generations, with 6 solutions in each generation.

Our basic genetic algorithm proceeds as follows:

Initialize the population: This involves initializing your candidate solutions. Our population of solutions will be initialized between -5 and +5 (chosen at random), as shown below:

#Import numpy library
import numpy as np

#Set seed for reproducibility
np.random.seed(100)
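As a concrete sketch of this initialization step (using the 6-solutions-by-4-weights shape that the rest of the example assumes), the population can be drawn uniformly from the interval [-5, 5):

```python
import numpy as np

np.random.seed(100)

# 6 candidate solutions, each holding the 4 weights A1..A4,
# drawn uniformly between -5 and +5
population = np.random.uniform(low=-5.0, high=5.0, size=(6, 4))
print(population.shape)  # (6, 4)
```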

Create the fitness calculation function: We will now loop over the generations, computing the fitness of each generation's solutions, starting from our initial ones. The fitness is computed with the function below, which helps us maximize the linear equation F(X). The Python code is as follows:

#Calculate fitness of population
def calculate_fitness(inputs, population):

    '''
    Description: This function calculates the fitness of our solutions

    Input Arguments: input values and a population of weights

    Output: fitness - one fitness value per solution
    '''

    fitness = np.sum(population*inputs, axis=1)
    return fitness
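As a quick check of calculate_fitness (restated here so the snippet is self-contained), the fitness of each row with our X inputs is simply its weighted sum:

```python
import numpy as np

def calculate_fitness(inputs, population):
    # Fitness of each solution: row-wise sum of weight * input
    return np.sum(population * inputs, axis=1)

inputs = [4, -2, 3.5, -4.2]
population = np.array([[1.0, 1.0, 1.0, 1.0],   # 4 - 2 + 3.5 - 4.2 = 1.3
                       [2.0, 0.0, 0.0, 0.0]])  # 2 * 4 = 8
print(calculate_fitness(inputs, population))
```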


Select the parents: Next we select the parent pool used to produce the next generation, based on the fitness function. That is, in each generation we select the fittest parents and use them to create the next generation's offspring. We do this with the choose_mating function, selecting the three fittest parents from each generation. The Python code is shown below:

#Choose parents to mate
def choose_mating(population, fitness, number_of_parents):

    '''
    Description: This function chooses the most fit solutions to mate in order to yield the next generation

    Input Arguments: a population, the fitness and the number of parents required to mate

    Output: the parents to mate
    '''

    #initialize empty numpy array to hold parents
    parents = np.empty((number_of_parents, population.shape[1]))

    #loop to fill the parents array with the fittest solutions, stopping at the required number of parents
    for number in range(number_of_parents):
        maximum_fitness_index = np.where(fitness == np.max(fitness))
        maximum_fitness_index = maximum_fitness_index[0][0]
        parents[number, :] = population[maximum_fitness_index, :]
        #knock out the chosen solution so it is not selected again
        fitness[maximum_fitness_index] = -99999999999

    return parents
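A quick usage sketch of choose_mating (restated so the snippet runs on its own). Note that the function overwrites the fitness of each chosen solution in place, so pass a fitness array you no longer need afterwards:

```python
import numpy as np

def choose_mating(population, fitness, number_of_parents):
    # Pick the fittest rows, knocking each one out of the fitness array in place
    parents = np.empty((number_of_parents, population.shape[1]))
    for number in range(number_of_parents):
        maximum_fitness_index = np.where(fitness == np.max(fitness))[0][0]
        parents[number, :] = population[maximum_fitness_index, :]
        fitness[maximum_fitness_index] = -99999999999
    return parents

population = np.array([[1., 2., 3., 4.],
                       [5., 6., 7., 8.],
                       [0., 0., 0., 0.]])
fitness = np.array([10., 30., 20.])
parents = choose_mating(population, fitness, 2)
print(parents)  # fittest rows first: [5, 6, 7, 8] then [0, 0, 0, 0]
```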


Crossover: Next, we perform crossover on the parents selected in the previous step to create the next generation. We use one-point crossover with the crossover point at the center of the parents. This means the first half of the first parent's genes becomes the first half of the offspring's genes, and likewise the second half comes from the second mating parent. This is done with the following Python code:

#Perform crossover
def crossingover(parents, size_of_offspring):
    '''
    Description: This function performs a one point crossover operation on the parents.

    Input Arguments:
    parents - the parent pool to be crossed over
    size_of_offspring - the size of offspring required

    Output: offspring - the crossed over offspring
    '''
    #initialize an empty numpy array to hold the offspring
    offspring = np.empty(size_of_offspring)

    #specify the point of crossover - at the center in our case
    point_of_crossover = np.uint8(size_of_offspring[1]/2)

    #loop over the number of offspring required
    for k in range(size_of_offspring[0]):
        #get index of first parent to be mated
        parent1_index = k%parents.shape[0]
        #get index of second parent to be mated
        parent2_index = (k+1)%parents.shape[0]
        #assign the first half of the first parent's genes to the first half of the offspring's genes
        offspring[k, 0:point_of_crossover] = parents[parent1_index, 0:point_of_crossover]
        #assign the second half of the second parent's genes to the second half of the offspring's genes
        offspring[k, point_of_crossover:] = parents[parent2_index, point_of_crossover:]

    return offspring
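To see the one-point crossover in action on two tiny parents (function restated for a self-contained run), the child takes its first two genes from parent 0 and its last two from parent 1:

```python
import numpy as np

def crossingover(parents, size_of_offspring):
    # One-point crossover at the center of the gene vector
    offspring = np.empty(size_of_offspring)
    point_of_crossover = np.uint8(size_of_offspring[1] / 2)
    for k in range(size_of_offspring[0]):
        parent1_index = k % parents.shape[0]
        parent2_index = (k + 1) % parents.shape[0]
        offspring[k, 0:point_of_crossover] = parents[parent1_index, 0:point_of_crossover]
        offspring[k, point_of_crossover:] = parents[parent2_index, point_of_crossover:]
    return offspring

parents = np.array([[1., 1., 1., 1.],
                    [2., 2., 2., 2.]])
child = crossingover(parents, (1, 4))
print(child)  # [[1. 1. 2. 2.]]
```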


Mutate: Next, we introduce some random variation into the offspring's genes to reduce excessive similarity between parents and offspring. The code is as follows:

#Perform mutation
def mutation(cross_over_offspring):

    '''
    Description: This function applies some random variation to the genes of the
    solutions by adding a random bias number.

    Input Arguments: cross_over_offspring - offspring obtained from the crossover function

    Output: mutated offspring
    '''

    #loop over the offspring and add a random bias number to one gene of each
    for number in range(cross_over_offspring.shape[0]):
        #Create the random bias number used for mutation; the arguments are the limits (low and high)
        random_bias_number = np.random.uniform(-1.5, 1.5)

        #Mutate by adding the bias number to the last gene
        cross_over_offspring[number, 3] = cross_over_offspring[number, 3] + random_bias_number

    return cross_over_offspring
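A small demonstration of the mutation step (function restated for a self-contained run). In this version only the last gene of each offspring is perturbed, by a uniform random value between -1.5 and 1.5:

```python
import numpy as np

np.random.seed(0)

def mutation(cross_over_offspring):
    # Add a random bias in [-1.5, 1.5) to the last gene of each offspring
    for number in range(cross_over_offspring.shape[0]):
        random_bias_number = np.random.uniform(-1.5, 1.5)
        cross_over_offspring[number, 3] = cross_over_offspring[number, 3] + random_bias_number
    return cross_over_offspring

offspring = np.array([[1., 1., 1., 1.]])
mutated = mutation(offspring)
print(mutated[0, :3])  # first three genes are unchanged
print(abs(mutated[0, 3] - 1.0) <= 1.5)  # True: last gene moved by at most 1.5
```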


Get the best solution: After looping through every generation, we take the fittest solution from the final generation. This is the fittest solution found by the genetic algorithm. We will now run the genetic algorithm on a simple example function with the goal of maximizing it.

The equation is F(X) = A1X1 + A2X2 + A3X3 + A4X4, where X1 = 4, X2 = -2, X3 = 3.5, and X4 = -4.2, all chosen at random.

We want to optimize the values A1, A2, A3, and A4 to maximize F(X). We start with a set of X inputs and specify the number of weights, then specify the number of generations to evolve through, and pick the best solution at the end.

The functions created earlier are now combined into the genetic algorithm, which generates the maximum value over the specified generations. The Python implementation is as follows:

'''
Implementing the GA
'''
# Specify inputs of the equation (X values)
inputs = [4, -2, 3.5, -4.2]
# Specify the number of weights or multipliers (A values, in our case) for each input which we are looking to optimize
number_of_weights = 4
# Specify the number of solutions in each generation. Each solution holds one set of weights
solutions = 6
# Specify the number of parents mating to yield the specified number of solutions
number_of_parents_mating = 3
# Define the population size based on the number of solutions and the number of weights
population_size = (solutions, number_of_weights)
# Randomly create the initial population according to the specified population size
new_population = np.random.uniform(low=-5.0, high=5.0, size=population_size)
# Print out the initialized population
print(new_population)
# Specify the number of generations to evolve through
number_of_generations = 10
# Loop over the specified number of generations
for generation in range(number_of_generations):

    # Print the current generation to keep track
    print("Generation : ", generation)

    # Calculate the fitness of the population in the current generation
    fitness = calculate_fitness(inputs, new_population)
    # Select the most fit parents to mate
    parents = choose_mating(new_population, fitness, number_of_parents_mating)
    # Create offspring via the one point crossover
    crossover = crossingover(parents, size_of_offspring=(population_size[0]-parents.shape[0], number_of_weights))
    # Add a random bias to the genes of the created offspring via the mutation function
    mutated = mutation(crossover)
    # Form the new population from the parents and the mutated offspring
    new_population[0:parents.shape[0], :] = parents
    new_population[parents.shape[0]:, :] = mutated
    # Print the best or most fit result in the current generation
    print("Best result : ", np.max(np.sum(new_population*inputs, axis=1)))

# Calculate fitness of the final generation
fitness = calculate_fitness(inputs, new_population)
# Get the most fit solution of the final generation
most_fit_index = np.where(fitness == np.max(fitness))[0][0]
# Print out the best solution
print("Best solution : ", new_population[most_fit_index, :])
print("Best solution fitness : ", fitness[most_fit_index])


Applications of Genetic Algorithms in Machine Learning

  • Feature selection: Genetic algorithms (GAs) can perform feature selection when training machine learning algorithms. The goal of the GA is to optimize a performance metric of the machine learning model by selecting the subset of features that best produces it.
  • Hyperparameter tuning: GAs can tune the hyperparameters of machine learning algorithms and neural networks to optimize model performance. This involves using a GA to search for the fittest hyperparameters, maximizing model performance and reducing the cost function.
  • Reinforcement learning: Reinforcement learning is well suited to genetic algorithms, since both aim to optimize decisions based on previous outcomes. Genetic algorithms have many applications in reinforcement learning.
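To make the feature-selection idea concrete, here is a minimal sketch of a GA searching over binary feature masks. The toy dataset, the least-squares scoring function, and the per-feature penalty are my own illustrative assumptions, not from the article; the GA loop itself mirrors the select-mutate cycle described above:

```python
import numpy as np

np.random.seed(0)

# Toy data: y depends only on features 0 and 2; features 1 and 3 are noise
X = np.random.randn(200, 4)
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * np.random.randn(200)

def fitness_of_mask(mask):
    # Fitness of a feature subset: negative mean squared error of a
    # least-squares fit on the selected columns, minus a small penalty
    # per feature used (to discourage keeping noise features)
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return -np.inf
    coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    residual = np.mean((X[:, cols] @ coef - y) ** 2)
    return -residual - 0.01 * cols.size

# Tiny GA over binary masks: elitist selection plus one-bit-flip mutation
population = (np.random.rand(8, 4) > 0.5).astype(int)
for generation in range(30):
    scores = np.array([fitness_of_mask(m) for m in population])
    best = population[np.argsort(scores)[-4:]]           # keep the 4 fittest
    children = best.copy()
    flips = np.random.randint(0, 4, size=len(children))  # flip one random bit each
    children[np.arange(len(children)), flips] ^= 1
    population = np.vstack([best, children])

final_scores = np.array([fitness_of_mask(m) for m in population])
best_mask = population[np.argmax(final_scores)]
print(best_mask)  # the mask should include features 0 and 2
```

The strong fitness gap between masks containing the informative features and those missing them is what drives the search; the small penalty term is what pushes the noise features back out.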

Genetic algorithms are among the most popular optimization algorithms, and as you have seen, they have a wide range of applications in machine learning, limited only by your imagination. Now that you understand how genetic algorithms work, you can use them effectively to optimize your own machine learning models.
